treebeardtech / nbmake

📝 Pytest plugin for testing notebooks
https://pypi.org/project/nbmake/
Apache License 2.0
179 stars 18 forks source link

nbmake without execution #86

Closed tschm closed 1 year ago

tschm commented 1 year ago

I want to iterate over tons of notebooks but without executing them beyond the point of importing their dependencies (e.g. and raise an error if they author is importing a package not in the environment).

Currently I have:

import importlib
from pathlib import Path

def verify(path):
    for file in path.glob("**/*.ipynb"):
        file = file.relative_to(path.parent).with_suffix('')
        importlib.import_module(f'ipynb.fs.defs.{file}')

if __name__ == '__main__':
    home = Path(__file__).parent / "notebooks"
    verify(home)

Please find the ipynb package here: https://pypi.org/project/ipynb/

alex-treebeard commented 1 year ago

hi @tschm, that sounds like a good feature idea.

It sounds like it should work something like this:

  1. I have ten notebooks, each with different structure
  2. I annotate the cell I want to stop at on each notebook (may be different for each one) with a metadata tag
  3. pytest --nbmake will ignore everything beyond that cell

How does that sound?

Some q's for you:

  1. How many nbs are you testing?
  2. What is the use case for these nbs?
tschm commented 1 year ago

I have hundreds of notebooks. Unfortunately, we can't enforce any structure. We give our researchers freedom here. Skills of developers cover a wide range. The nbs accompany ideas in quantitative trading. I would recommend to study a bit what this ipynb project is doing exactly. The annotation of cells should be avoided and somewhat imports can take place very late... With beyond the point I didn't really mean the geographic location in the notebook. I just want to avoid to execute the notebook (some of them run for like 2h+). My goal is only to scan the notebooks and make sure all the imported packages are in the virtual environment.

alex-treebeard commented 1 year ago

I see, thanks for clarifying (never came across that package before).

So you have a basic implementation of this feature with your verify function it seems.

Presumably, the main issues are the lack of pytest integration so that

  1. you cannot parallelise withpytest-xdist
  2. error reports are difficult to understand
  3. you have to maintain this code snippet.

Please let me know if I've misread your thinking.

Reckon we should have a flag that runs tests in import mode only? e.g.

pytest --nbmake --ipynb-import # Only imports the notebook with importlib.import_module(f'ipynb.fs.defs.{file}')

tschm commented 1 year ago

Yes, you phrased it so much better than I ever could :-). Thank you. I have used nbmake before and it's a great tool, well maintained and documented. I assume this kind of functionality is a tiny step for you.

I have placed my little script in a container and integrated it into our standard ci/cd pipeline. But the route via pytest and nbmake would be my first choice...

tschm commented 1 year ago

There are a few odd bits in the current ipynb based setup. I need an empty init.py file together with my notebooks to open the route towards making them a package. It would also run the class definitions. However, code like Base = declarative_base() would not be executed prior unless I change to BASE = declarative_base().

alex-treebeard commented 1 year ago

awesome -- I'll put together a minimal feature next week for you to try out.

BTW: Do you have production systems depending on importing these notebooks using the ipynb library?

It may be worth checking out alternatives that are more recently developed e.g. https://github.com/deathbeds/importnb/issues because ipynb has not had a commit for 5 years+

tschm commented 1 year ago

Exactly, I need a fresher and more robust alternative. Hence my interest in nbmake. Notebooks haven't really made it into production here (yet) but that doesn't mean they shouldn't be robust and tested. Of course the import test is only a first crude hurdle. Happy to act as Guinea pig

tschm commented 1 year ago

I have also noticed that ipynb wouldn't like %%time or !pip install ...

alex-treebeard commented 1 year ago

I've released v1.3.5 trying something slightly different to what we discussed.

You can now run notebooks and only report ModuleNotFoundErrors

pip install -U nbmake==1.3.5

Import errors can occur anywhere in a program so I'd rather we test imports more empirically than analytically.

Setting a cell timeout also lets us skip long running cells which are unlikely to hide imports at the end anyway.

Try it out like this if possible, please let me know how it goes!

pytest --nbmake --nbmake-find-import-errors --nbmake-timeout=20

Next steps:

tschm commented 1 year ago

My first experiments are all very promising. Thank you so much for your work on this problem. I have already identified a few notebooks importing libs they don't have in their underlying environment. At the same time your solution is minimally invasive. No need to alter the metadata of cells.

alex-treebeard commented 1 year ago

great, thanks for collaborating on this @tschm -- feel free to open more issues as needed.