Notebook test slow tests

dabacon commented 3 years ago

Probably worth trying to speed up that notebook test as it is long poll in the CI. Maybe there is a pattern to add something that can "fix" the notebooks to smaller parameters to speed them up?

63.27s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/cirq/contrib/quimb/Cirq-to-Tensor-Networks.ipynb]
42.70s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/quantum_walks.ipynb]
33.26s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/educators/qaoa_ising.ipynb]
28.50s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/hidden_linear_function.ipynb]
27.31s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/educators/chemistry.ipynb]
24.82s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/qaoa.ipynb]
21.05s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/variational_algorithm.ipynb]
14.91s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/cirq/contrib/quimb/Contract-a-Grid-Circuit.ipynb]
11.20s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/rabi_oscillations.ipynb]
10.77s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/interop.ipynb]
9.48s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/educators/intro.ipynb]
9.44s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/educators/textbook_algorithms.ipynb]
7.76s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/circuits.ipynb]
7.73s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/educators/ion_device.ipynb]
7.45s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/simulation.ipynb]
7.29s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/basics.ipynb]
6.34s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/qubits.ipynb]
6.30s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/transform.ipynb]
6.26s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/educators/neutral_atom.ipynb]
6.21s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/qudits.ipynb]
6.19s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/protocols.ipynb]
6.13s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/custom_gates.ipynb]
6.04s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/noise.ipynb]
6.03s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/tutorials/shor.ipynb]
5.93s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/cirq/contrib/svg/example.ipynb]
5.84s call     dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/docs/gates.ipynb]
0.11s setup    dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/cirq/contrib/quimb/Contract-a-Grid-Circuit.ipynb]
0.06s setup    dev_tools/notebook_test.py::test_notebooks[/Users/bacon/code/github/Cirq/cirq/contrib/svg/example.ipynb]

balopat commented 3 years ago

Haha, I was just about to open this issue!

There is a lot of room for improvement.

A bit of context: notebooks are expected to ran in two ways: 1) in colab 2) in jupyter notebook from the Cirq repo In 1) the path currently requires the latest stable version of cirq to be installed (currently 0.9.1). This means that we need to test the notebooks as they change against the latest release. Currently the notebook test does this. That's why they are slow: each test creates a separate virtual env and runs the pip install cirq command, and then executes the notebook. This is also how the devsite pipeline tests them, so it is useful to have this test on the PRs. However, we don't have to run these "expensive", "isolated" tests on every PR for every notebook, as the code does not change (we are testing against the 0.9.1 currently!). Action item: we can make the "isolated" tests only for changed notebooks.

For 2) we do not have tests currently. But those do not need to be in an isolated virtual env. They can share a one-time setup and we can just run through them. Action item: create a full notebook test to test against master. Note: While this is technically possible, it will ensure backwards compatible changes between consecutive releases (as the same notebook have to work for master as well as the last stable release).

Finally, these will be always on the slower side. We can and should resort to xdist based execution within a job and sharding of the workloads to multiple jobs. E.g. here's a PoC that uses treebeard for the sharded 1) case

One thing you can do currently locally:

check/pytest -n auto -m slow dev_tools/notebook_test.py - which will run them in parallel.

One last thought: even though a marginal win, but if we could add cirq to the colab runtime (with a regular process around it, tied to our release process), then we could speed up 1) even more and use a shared virtual env there too in case multiple notebooks change...though it might still be useful to think about 1) as "notebook environments" where cirq is not installed instead of just supporting colab.

dabacon commented 3 years ago

Is the variation above really due to the virtual env and pip install? It seems to me that this is more likely that these notebooks have some pretty heavy computation in them (the variation is quite large)...I'm just basing this on seeing the names and thinking, oh yeah that could be a lot of compute.

Would be nice if there was a way to not have to do the install every test, but must be a good reason this is not possible.

dabacon commented 3 years ago

Also probably there should be some discussion of notebook naming :)

balopat commented 3 years ago

Also probably there should be some discussion of notebook naming :)

Yeah, the quimb ones should be renamed to underscores and no capitalization. I did rename a bunch of them during the site revamp, so the situation is better than before. Are there any other naming things you see to be addressed

?

Is the variation above really due to the virtual env and pip install?

A big part of it for sure, e.g. the quimb ones have to install quimb as well...but there must be computational differences. I like the idea to make them faster by choosing smaller problem sizes by default and clarify for users how they can scale them up if they want to.

alex-treebeard commented 3 years ago

E.g. here's a PoC that uses treebeard for the sharded 1) case

note on treebeard: I've rebooted it as nbmake and nbmake-action -- you'd probs need to re-add your virtualenv logic as a pytest hook as I minimised the feature set.

mpharrigan commented 3 years ago

I'm getting this error https://github.com/nteract/papermill/issues/519 in #3669

Also, when I run the notebook check locally I think it runs a tonne of .ipynb_checkpoints/ files (as well as some old docs/_build notebooks from the sphinx days. Instead of globbing all *.ipynb's it might make more sense to glob through all notebooks checked in to git

balopat commented 3 years ago

Instead of globbing all *.ipynb's it might make more sense to glob through all notebooks checked in to git

I like that idea.

balopat commented 3 years ago

Today we are running almost at 30mins for notebook tests against a PR. This is unacceptable - so I'm going to make the build non-blocking until we figure out how to proceed. Options:

brute force parallelization
make notebooks themselves parameterized and hence faster for CI
push all the testing to later - e.g. only test notebooks after they were merged into master

mpharrigan commented 3 years ago

How often do we push the docs? Can we set up a system where that serves as the notebook test?

balopat commented 3 years ago

The devsite pipeline does test the notebooks. Currently it runs on an ad-hoc basis.

You're right that just eliminating tests in Github could be an option (like it is in ReCirq / OpenFermion / qsim today), however:

we need to explore automating that process and make it more frequent - I faintly recall that automation was discouraged due to security concerns, but maybe that changed...
the bigger issue I have this approach is that the devsite Kokoro builds are completely invisible for the outside world. The whole idea was to provide visibility for contributors by introducing the notebook testing earlier.

I'd rather just repurpose the notebook tests as a nightly test on master.

balopat commented 3 years ago

Just confirmed that 1.) is going to be fine - we'll have a nightly build.

So the only question is whether there's value in keeping the tests here. If we push the tests to after merging to master, the only value left is visibility (as long as "pre-release" notebooks still use pip install --pre, they are essentially the same workflow as the isolated notebooks) - which might be a survivable pain for now: it's on Googler maintainers to fix any issues related to PRs that break any of the notebooks.

It would simplify our lives a lot.

I'm a bit sad that I put so much effort into testing notebooks just to yank them out, but I don't want to fall into sunk cost fallacy either :) Maybe the notebook tests could still stick around for repro purposes, when things fail in the devsite pipeline. The changed notebook tests actually are not that expensive - they might still be valuable.

mpharrigan commented 3 years ago

Do you have data on how long each notebook takes? We could keep the github checks but remove long running notebooks. Each long running notebook that isn't checked in github has to have a google "owner" or sponsor to deal with breakages that are surfaced in the hidden doc build system.

I agree it's a shame to put the nice testing scripts to waste :)

balopat commented 3 years ago

An updated version from my Mac, where -n auto is significantly faster than on Github Actions VMs/containers taking altogether 8.2 minutes (instead of 32!):

431.12s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/qcvv/parallel_xeb.ipynb]
244.61s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/examples/advanced/quantum_volume_errors.ipynb]
153.52s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/qcvv/isolated_xeb.ipynb]
114.66s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/cirq/contrib/quimb/Cirq-to-Tensor-Networks.ipynb]
93.25s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/quantum_walks.ipynb]
93.18s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/examples/advanced/quantum_volume_routing.ipynb]
88.69s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/qcvv/xeb_coherent_noise.ipynb]
80.05s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/qcvv/xeb_theory.ipynb]
79.89s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/qaoa.ipynb]
77.86s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/educators/qaoa_ising.ipynb]
54.06s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/examples/advanced/quantum_volume.ipynb]
49.03s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/hidden_linear_function.ipynb]
34.06s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/educators/intro.ipynb]
26.85s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/variational_algorithm.ipynb]
25.45s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/rabi_oscillations.ipynb]
23.69s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/shor.ipynb]
22.30s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/cirq/contrib/quimb/Contract-a-Grid-Circuit.ipynb]
19.40s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/educators/textbook_algorithms.ipynb]
15.72s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/operators_and_observables.ipynb]
14.73s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/simulation.ipynb]
12.48s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/noise.ipynb]
12.15s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/circuits.ipynb]
11.11s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/custom_gates.ipynb]
10.93s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/gates.ipynb]
9.96s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/interop.ipynb]
9.01s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/cirq/contrib/svg/example.ipynb]
8.46s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/educators/neutral_atom.ipynb]
8.02s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/_template.ipynb]
7.95s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/basics.ipynb]
6.88s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/protocols.ipynb]
4.99s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/transform.ipynb]
4.85s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/tutorials/educators/ion_device.ipynb]
4.73s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/qudits.ipynb]
4.52s call     dev_tools/notebooks/notebook_test.py::test_notebooks_against_released_cirq[/Users/balintp/dev/proj/Cirq/docs/qubits.ipynb]

balopat commented 3 years ago

[note: the name of the test should be renamed to test_notebooks_against_current_branch ]

balopat commented 3 years ago

if we could parametrize the notebooks in a way that they could be run in a "fast" mode for tests, each of them < 10s, it would save a lot of time, and create a faster feedback loop. Anything above 30 seconds is a bit of a suspect for me that could be probably easily sped up.

dabacon commented 3 years ago

https://github.com/nteract/papermill might be helpful here

edit: oh haha we are already using this?

dabacon commented 3 years ago

I'm going to see if I can knock off some of the worst offenders using the papermill configuration.

balopat commented 3 years ago

edit: oh haha we are already using this?

yupp :)

I'm going to see if I can knock off some of the worst offenders using the papermill configuration.

Nice, thank you! 🙏

dabacon commented 3 years ago

So I have a PR that can speed up the notebook tests significantly by parameterizing the notebooks and supplying reasonable numbers during test. However it is incompatible with the tensorflow nb document formatter which removes all sort of things including the tag you need to use for the parameterized cell. Grrr.

dabacon commented 3 years ago

After sleeping on it, I think a better solution is to not use papermill. It has a deficiency in that you need to only have 1 cell that includes the parameters that can be updated. This makes the notebooks much less readable as you often have to put multiple parameters at the top of the notebook.

There are some frameworks for explicitly testing notebooks, those could be an option. testbook and pytest-notebook look to be the most popular. Testbook allows code injection. These are designed more for unit testing of the notebooks, i.e. something that really should be done, not just checking they run, but that's probably another story :)

A few other options involve writing our own way to modify the notebooks for test.

One way would be do to this inline in the notebook
```
repetitions = 10_000  # testing: repetitions = 100
```
which would do code replacement. This kind of uglifies the notebooks, but it is nice in that it everything is self contained, and you modify the code where it has an impact.
Alternatively we could do the replacements in an auxiliary file. So for example if we had notebook.ipynb then there could be a file notebook.pytest which includes the replacement in some syntax like
```
/repetitions = 10_000/repetitions = 10
```
or
```
repetitions = 10_000
repetitions = 10
```
If we don't want a separate file we could add a final cell that includes these replacements.

@balopat Thoughts?

mpharrigan commented 3 years ago

Thanks for investigating and those are some good ideas @dabacon. My 2 cents is we should give high priority to readability to notebooks in their intended form. Prioritize making our users happy rather than our github actions VM happy. It sounds like you're on the same page!

balopat commented 3 years ago

TLDR; I feel strongly about leaving the final doc as clean and natural as possible. This means, that I'm okay to sacrifice the ease of defining test values. Let's go with your version 2, and use a notebook.pytest to define regex replacements before running tests.

Detailed musings:

I introduced the notebook tests to achieve faster feedback loop on PRs specifically for the devsite pipeline where they run the notebooks with install scripts and all to generate the site. As such, the unit test framework direction is a bit out of scope at the moment unless we can make those frameworks do smarter parametrizations / code injections. It doesn't seem that way unless I'm missing something.

For a given parameter we have fixed place of usage(s) - however the following places vary in different solutions:

place of definition of the parameter
place of default (prod) value for the parameter
place of test value for the parameter Also I add uglification ~= "testing info that adds noise to the final notebook"

Without parametrization (naturally, currently):

place of definition: at or near first place of usage (=: natural)
place of default value = place of definition (typically) (=: natural)
test value: N/A
uglification: none

With papermill:

place of definition: single parameters cell have to be above before all the parameters
place of default value: in the single parameters cell
place of test value: yaml file
uglification: parameters tags, artificially separated parameter definition

With your comment based templating:

place of definition: ✔️ natural
place of default value: ✔️ natural
test value: ✔️ at or near first place of usage
uglification: extra comment on line

With your replacement file:

place of definition: ✔️ natural
place of default value:✔️ natural
test value: replacement file / separate
uglification: ✔️ none

With env vars: An age old templating device is to use env vars for injection, which is similar to the comment based templating, just a bit more well defined, less tooling is required

repetitions = os.environ.get("CIRCUIT_REPETITIONS",10_000)

place of definition: ✔️ natural
place of default value: ✔️ natural
test value: env var file similar to yaml file
uglification: os.environ.get

One solution to uglification, is that we could go even further and have our final notebooks "post-processed" from our template, to e.g. strip the comments out of them. We could have a ci test to ensure consistency. I don't like the large amount of duplication and extra maintenance here though.

dabacon commented 3 years ago

Bad luck. Got a version of this working only #4077 only to have notebook tests broken at head #4081 .

dabacon commented 3 years ago

OK have a PR that is now ready for review.

balopat commented 3 years ago

I think this can be closed now, notebooks are much faster, parallelized, partitioned, etc....please reopen if you think there is more to be done!

quantumlib / Cirq

Notebook test slow tests #3603