markovmodel / pyemma_tutorials

How to analyze molecular dynamics data with PyEMMA
Creative Commons Attribution 4.0 International

Revising 00-showcase notebook #107

Closed cwehmeyer closed 6 years ago

cwehmeyer commented 6 years ago

@marscher: it turns out that creating many short-lived progress bars causes a huge slowdown and is not very informative. Cells 5 and 4 (both do cross-validated VAMP score calculations) are among the slowest parts of the test run:

Slowest cells with progress bars:

```
=============================== Notebook timings ===============================
========================== slowest 30 test durations ===========================
97.76s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 5
79.44s call pyemma_tutorials/notebooks/07-expectations-and-observables.ipynb::Cell 11
66.77s call pyemma_tutorials/notebooks/07-expectations-and-observables.ipynb::Cell 12
54.61s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 1
42.59s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 4
```

Deactivating the progress bars speeds up these cells a lot, both on my machine and on CircleCI:

Slowest cells without progress bars:

```
=============================== Notebook timings ===============================
========================== slowest 30 test durations ===========================
74.00s call pyemma_tutorials/notebooks/07-expectations-and-observables.ipynb::Cell 11
66.04s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 1
64.01s call pyemma_tutorials/notebooks/07-expectations-and-observables.ipynb::Cell 12
61.06s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 15
53.43s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 13
47.77s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 9
28.11s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 5
27.02s call pyemma_tutorials/notebooks/07-expectations-and-observables.ipynb::Cell 1
20.99s call pyemma_tutorials/notebooks/00-pentapeptide-showcase.ipynb::Cell 4
```

To this end, I wrote a short context manager which might also be a good fit for pyemma:

no_progress_bars:

```python
import pyemma


def no_progress_bars():
    """Context manager to suppress progress bars.

    Example
    -------
    Use inside a `with` statement:

    >>> with no_progress_bars():
    ...     pass  # operation which would print a progress bar
    """
    class NoProgressBarsContextManager:
        def __init__(self):
            # Remember the current global setting so it can be restored.
            self.state = pyemma.config.show_progress_bars

        def __enter__(self):
            pyemma.config.show_progress_bars = False

        def __exit__(self, exc_type, exc_value, traceback):
            # Restore the previous setting, also when an exception occurred.
            pyemma.config.show_progress_bars = self.state

    return NoProgressBarsContextManager()
```

What is your take on this?

marscher commented 6 years ago

We can disable the progress bars globally for the testing session (e.g. have a look at the conftest.py of pyemma). From my experience I can tell that the bars are not causing too much overhead (tqdm costs about 800 ns per iteration, plus the overhead of drawing widgets). One concern about your benchmark is that you can't really compare two runs on CI, because they are executed on different hardware every time (and under different workloads on these machines). You can compare these runtimes locally to get an idea.
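A minimal sketch of such a session-wide switch, assuming a `conftest.py` at the repository root and assuming that setting `pyemma.config.show_progress_bars` (the flag the context manager above toggles) is sufficient. `pytest_configure` is a standard pytest hook; the `find_spec` guard is an addition so the file also loads where pyemma is not installed. Note that `pytest --nbval` executes notebooks in a separate Jupyter kernel, so a setting made in the pytest process itself would likely not reach the notebook cells.

```python
# conftest.py (hypothetical location: repository root)
import importlib.util


def pytest_configure(config):
    """pytest hook, called once at the start of the test session."""
    # Do nothing if pyemma is not importable in this environment.
    if importlib.util.find_spec("pyemma") is None:
        return
    import pyemma

    # Assumption: this global flag controls all pyemma progress bars.
    pyemma.config.show_progress_bars = False
```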

marscher commented 6 years ago

BTW, you could also have used `pyemma.util.context.settings(show_progressbars=False)` as a context manager.
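How a keyword-based settings context manager of this kind can work, sketched generically: save the current values, apply the overrides, and restore the saved values in a `finally` block so they come back even if the body raises. This illustrates the pattern only and is not PyEMMA's actual implementation; the `SimpleNamespace` object is a hypothetical stand-in for a global config such as `pyemma.config`.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def settings(config, **overrides):
    """Temporarily set attributes on `config`, restoring them on exit."""
    # Save current values first; getattr fails early on misspelled names.
    saved = {name: getattr(config, name) for name in overrides}
    for name, value in overrides.items():
        setattr(config, name, value)
    try:
        yield config
    finally:
        # Restore the previous values, even if the body raised an exception.
        for name, value in saved.items():
            setattr(config, name, value)


# Hypothetical stand-in for a global configuration object.
config = SimpleNamespace(show_progress_bars=True)

with settings(config, show_progress_bars=False):
    inside = config.show_progress_bars  # False inside the block
after = config.show_progress_bars  # True again afterwards
```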

cwehmeyer commented 6 years ago

> BTW, you could also have used `pyemma.util.context.settings(show_progressbars=False)` as a context manager.

Perfect, that is what I was looking for! I must have missed it in the progress bar documentation.

> From my experience I can tell that the bars are not causing too much overhead [...]

In this particular case, the overhead is immense on my local machine: with progress bars, the `score_cv` function takes 1.53 ± 0.01 seconds on average; without them, only 0.3 ± 0.007 seconds.
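One way to obtain per-call figures in the "mean ± standard deviation" form quoted above is to time repeated single calls with the standard library's `timeit.repeat` and summarize them with `statistics`. A minimal sketch: `workload` is a hypothetical placeholder for the real call (e.g. a cross-validated score computation with and without progress bars), and the repeat count is an arbitrary choice.

```python
import statistics
import timeit


def mean_std_seconds(func, repeat=10):
    """Time `repeat` single calls of `func`; return (mean, stdev) in seconds."""
    # number=1: each sample is exactly one call, i.e. "seconds per call".
    samples = timeit.repeat(func, repeat=repeat, number=1)
    return statistics.mean(samples), statistics.stdev(samples)


def workload():
    # Hypothetical placeholder for the computation being measured.
    return sum(i * i for i in range(100_000))


mean, std = mean_std_seconds(workload)
print(f"{mean:.3f} ± {std:.3f} s per call")
```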

marscher commented 6 years ago

On 12.07.2018 23:12, Christoph Wehmeyer wrote:

> BTW, you could also have used `pyemma.util.context.settings(show_progressbars=False)` as a context manager.

> Perfect, that is what I was looking for! I must have missed it in the progress bar documentation.

There is a whole section about this at pyemma.org.

> From my experience I can tell that the bars are not causing too much overhead [...]

> In this particular case, the overhead is immense on my local machine: with progress bars, the `score_cv` function takes 1.53 ± 0.01 seconds on average; without them, only 0.3 ± 0.007 seconds.

Yeah, for such quick computations it does not really make sense to use progress bars at all. Note that estimators constructed via the API also provide a flag to show/hide progress.

marscher commented 6 years ago

> In this particular case, the overhead is immense on my local machine: with progress bars, the `score_cv` function takes 1.53 ± 0.01 seconds on average; without them, only 0.3 ± 0.007 seconds.

Measuring fast code is also hard in Python: https://pytest-benchmark.readthedocs.io/en/latest/calibration.html

But as I said, if the calculation you're doing is very fast, progress bars are a waste of time.
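The calibration problem described on the pytest-benchmark page linked above can be illustrated with the standard library alone: `timeit.Timer.autorange()` increases the loop count until one measurement takes at least 0.2 seconds, so timer resolution and per-measurement overhead are amortised; the minimum over several repeats is then divided by the loop count. This is a rough analogue of calibration, not pytest-benchmark's actual algorithm, and `fast_call` is a hypothetical placeholder.

```python
import timeit


def fast_call():
    # Hypothetical placeholder for a very fast operation.
    return sum(range(100))


def per_call_seconds(func, repeat=5):
    """Estimate the per-call run time of a fast function."""
    timer = timeit.Timer(func)
    # Calibrate: find a loop count whose total time is at least 0.2 s.
    number, _ = timer.autorange()
    # Repeat the calibrated measurement; keep the least disturbed sample.
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number, number


seconds, loops = per_call_seconds(fast_call)
print(f"{seconds * 1e9:.0f} ns per call (calibrated to {loops} loops)")
```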

cwehmeyer commented 6 years ago

> Note that estimators constructed via the API also provide a flag to show/hide progress.

Maybe not all of them:

```python
def vamp(data=None, lag=10, dim=None, scaling=None, right=True, ncov_max=float('inf'), stride=1, skip=0, chunksize=None)
```

marscher commented 6 years ago

On 12.07.2018 23:17, Christoph Wehmeyer wrote:

> Note that estimators constructed via the API also provide a flag to show/hide progress.

> Maybe not all of them:

> ```python
> def vamp(data=None, lag=10, dim=None, scaling=None, right=True, ncov_max=float('inf'), stride=1, skip=0, chunksize=None)
> ```

OK, this is a feature request then :D

marscher commented 6 years ago

nb00 with progress bars: 289.4 seconds
nb00 without progress bars: 282.8 seconds

Measured on my local machine at the institute.

marscher commented 6 years ago

Sorry, this was an old state. Now the numbers look like this:

with progress bars: notebooks/00-pentapeptide-showcase.ipynb took 158.0 seconds
without progress bars: notebooks/00-pentapeptide-showcase.ipynb took 154.2 seconds

I think less than 4 seconds of overhead is fine for the sake of knowing how long a computation takes. Nevertheless, we should turn them off while testing.

BTW: very good job on speeding this notebook up!

cwehmeyer commented 6 years ago

Thanks!

> I think less than 4 seconds of overhead is fine for the sake of knowing how long a computation takes. Nevertheless, we should turn them off while testing.

Weird. I still see a huge difference in the run time of individual cells on OSX (cell 4: 9 s without, 20 s with progress bars).
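Putting the timings reported in this thread side by side as ratios makes the disagreement concrete: the whole-notebook `pytest --nbval` runs differ by between 2 and 3 %, whereas the per-cell and per-call measurements differ by a factor of roughly 2 to 5. The figures are copied from the comments above; the code is plain arithmetic.

```python
# (label, seconds with progress bars, seconds without), as reported in the thread.
measurements = [
    ("nb00, earlier state (nbval)", 289.4, 282.8),
    ("nb00, current state (nbval)", 158.0, 154.2),
    ("cell 4 on OSX", 20.0, 9.0),
    ("score_cv, mean per call", 1.53, 0.3),
]

ratios = {}
for label, with_bars, without_bars in measurements:
    ratios[label] = with_bars / without_bars
    # Overhead as a percentage of the run time without progress bars.
    overhead = 100.0 * (with_bars - without_bars) / without_bars
    print(f"{label}: {ratios[label]:.2f}x, +{overhead:.1f} %")
```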

marscher commented 6 years ago

Hmm, my numbers come from `pytest --nbval`, so no ipywidgets are involved. The widgets seem to be expensive.

cwehmeyer commented 6 years ago

Can we globally deactivate progress bars during unit tests?

marscher commented 6 years ago

On 13.07.2018 00:38, Christoph Wehmeyer wrote:

> Can we globally deactivate progress bars during unit tests?

debc325