rossant / ipycache

Defines a %%cache cell magic in the IPython notebook to cache results of long-lasting computations in a persistent pickle file
BSD 3-Clause "New" or "Revised" License

Add automatic tests with IPython notebooks #7

Open · rossant opened this issue 10 years ago

ihrke commented 9 years ago

One way to do this is to use a script like this: https://github.com/paulgb/runipy and then let travis-ci upload the processed notebooks to a specific branch of the ipycache repo. Here is a link describing how this is done: http://sleepycoders.blogspot.se/2013/03/sharing-travis-ci-generated-files.html
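
For context, runipy can also be driven from Python rather than the command line; here is a minimal sketch based on runipy's documented API at the time (the file names are placeholders):

from runipy.notebook_runner import NotebookRunner
from IPython.nbformat.current import read, write

# Execute every cell and write the processed notebook back out; this processed
# copy is what the travis-ci job would then push to a dedicated branch.
notebook = read(open('example.ipynb'), 'json')
runner = NotebookRunner(notebook)
runner.run_notebook()
write(runner.nb, open('example_run.ipynb', 'w'), 'json')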

A disadvantage here is that you have to manually look at the notebooks to see if everything went fine.

Another way would be to define unit tests in an ipy-notebook and run them from the command line (e.g., using py.test https://pypi.python.org/pypi/pytest-ipynb). However, I'm not sure if that works with the magics etc.

rossant commented 9 years ago

Another way would be to define unit tests in an ipy-notebook and run them from the command line (e.g., using py.test https://pypi.python.org/pypi/pytest-ipynb). However, I'm not sure if that works with the magics etc.

I think that would be the best solution. I'm sure we can find a way to make the magic commands work (see e.g. this class). The idea would be to loop over all input cells, execute the code using the InteractiveShell, capture the output, and compare it with the expected output stored in the .ipynb file. This would give us a way to automatically test a notebook containing input + output by running it and checking that the output is correct.
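
A rough sketch of that loop, assuming the IPython 2.x notebook API and comparing only text output (none of this exists in ipycache yet; example.ipynb is a placeholder):

from IPython.core.interactiveshell import InteractiveShell
from IPython.nbformat.current import read
from IPython.utils.io import capture_output

shell = InteractiveShell.instance()
nb = read(open('example.ipynb'), 'json')

for cell in nb.worksheets[0].cells:
    if cell.cell_type != 'code':
        continue
    with capture_output() as io:
        shell.run_cell(cell.input)   # magics like %%cache work through the shell
    # Concatenate the text of the outputs stored in the .ipynb file.
    expected = ''.join(out.get('text', '') for out in cell.outputs)
    assert io.stdout == expected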

ihrke commented 9 years ago

Interesting idea!

But comparing cell output to previously generated output may break when minor changes are made (this could also happen when external libraries change). Say we generate a plot and some matplotlib default changes so that the plot is not exactly the same; then all our tests would break if we naively compare the stored plots to the new ones. Maybe we should run all the cells as you suggest, but instead of comparing to previously stored output, insert assertions/throw errors?

rossant commented 9 years ago

It's true that comparing the base64-encoded plots would not be very robust. Maybe at first we could just compare text output?

What do you mean exactly by inserting assertions/throwing errors?

ihrke commented 9 years ago

Example:

cell 1

%%cache test.pkl a
a = [i for i in range(10000)]

cell 2

import ipycache
try:
    # load_vars returns a dict {name: value}, so index out the variable itself
    a = ipycache.load_vars('./test.pkl', ['a'])['a']
except Exception:
    raise CustomErrorThatTellsUsSomething   # stand-in for a descriptive error

Then run all the cells (from the command line, through the IPython kernel), catch any raised error, and report it as usual.

rossant commented 9 years ago

OK I see. That's a possibility indeed. Maybe it would be worth encapsulating the code in cell 2 in some private testing function like _test_cached_var('./test.pkl', 'a') or something.
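
A minimal sketch of such a helper (the name _test_cached_var comes from the suggestion above; nothing like it exists in ipycache yet):

import ipycache

def _test_cached_var(path, name):
    # Fail loudly if the cache file is missing or lacks the expected variable;
    # load_vars returns a dict {name: value}.
    try:
        return ipycache.load_vars(path, [name])[name]
    except Exception as e:
        raise AssertionError("cache check failed for %r in %s: %s" % (name, path, e))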

Also, I think what you describe is rather close to the unit tests that already exist. We could definitely do that, but I think another sort of notebook-based test would be useful as well.

You would have an actual example notebook that would only contain user-exposed commands (so basically just %%cache) and no testing logic. To test it, we would just compare the text outputs. For example, the cached cell could contain a print() statement, and we could check that it only shows up the first time, etc.

Something roughly like this:

example.ipynb:

# cell 1
%%cache test.pkl a
print("Computing...")
a = [i for i in range(10000)]

# cell 2
print(len(a))

test_notebooks.py:

# Notebook, run_all, and check_nb_outputs are hypothetical helpers.
nb = Notebook('example.ipynb')
nb.run_all()
# First run: the cell body executes, so "Computing..." is printed.
assert check_nb_outputs(nb, ['Computing...', '10000'])

nb.run_all()
# Second run: expected here that the cached cell is skipped, so no "Computing...".
assert check_nb_outputs(nb, ['', '10000'])

ihrke commented 9 years ago

Actually, your second test should produce exactly the same output, i.e., Computing...\n10000, since ipycache saves and loads the outputs. The only thing that should differ is the verbosity output, i.e., [Saved Variables...] vs. [Skipped the cell's code and loaded, ...]. So we should test whether the first run produced [Saved Variables...] and the second [Skipped the cell's ...].
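
In code, that check might look something like this (out1 and out2 stand for hypothetical captures of the cell's full output on the first and second run; the status strings are abbreviated as in the quote above):

assert 'Computing...' in out1 and 'Computing...' in out2   # output is replayed either way
assert '[Saved Variables' in out1    # first run actually executed the cell's code
assert '[Skipped the cell' in out2   # second run loaded the variables from the cache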

The difference between what I suggested and the tests we currently have is that the magic is run directly through IPython's magic interface instead of through the mock functions used in test_ipycache.py.

Anyway, we could of course mix the approaches: since the output of the cell is always stored in _captured_io, we can just look at

# load_vars returns a dict {name: value}; check_cell_output is a hypothetical helper
io = ipycache.load_vars('./test.pkl', ['_captured_io'])['_captured_io']
assert check_cell_output(io['stdout'].getvalue(), 'Computing...')

after having run the cell. That would also decouple the testing logic from the notebook code (but actually, I would prefer to run the tests in the notebook because it's easier to develop and run them there).

rossant commented 9 years ago

Actually, your second test should produce exactly the same output, i.e., Computing...\n10000, since ipycache saves and loads the outputs. The only thing that should differ is the verbosity output, i.e., [Saved Variables...] vs. [Skipped the cell's code and loaded, ...]. So we should test whether the first run produced [Saved Variables...] and the second [Skipped the cell's ...].

Ha, I had forgotten that! I'm wondering whether that's good behavior...? Seeing Computing... in this example would be confusing because I'd think that my code was actually being executed!

I do agree that testing logic should be decoupled from the notebook. That being said, having minimal assertions in the notebooks would be fine as long as these are just a couple of lines of code demonstrating what would be expected from normal behavior. Then we could have an "examples" folder with some notebooks demonstrating how ipycache works, and these examples would also be tested by the testing suite (like "doctests" in a way).

ihrke commented 9 years ago

OK, but how do we decide whether the notebook passes the test? In a doctest, you have to add some code to a function that defines the test. We could of course add a doctest snippet to each cell, which the test runner would execute after the cell is run? Say:

%%cache test.pkl a
print("Computing...")
a = [i for i in range(10000)]

"""
#doctest
import os
assert len(a) == 10000
assert os.path.exists('test.pkl')
"""

rossant commented 9 years ago

Why not just put this doctest code in the next cell? We could say that the test fails if an assertion is raised during the notebook execution, and passes otherwise.

ihrke commented 9 years ago

I added an ipynb_runner.py script which can run the IPython notebooks from the command line. It reports whether any of the cells fail (i.e., raise an exception). This PR already runs the notebooks in examples on travis-ci (just take a look at the last build).

rossant commented 9 years ago

FYI I just found this: https://github.com/bollwyvl/nosebook