nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
5.97k stars 429 forks source link

Support `allow_errors` flag from `nbconvert` #269

Closed bnaul closed 5 years ago

bnaul commented 5 years ago

It looks like the nbconvert execute preprocessor supports an allow_errors flag that lets a notebook continue running whenever errors are encountered and instead just logs them: https://github.com/jupyter/nbconvert/blob/02c5cd1424b65da018f244e0f033a3dde55f76ca/nbconvert/preprocessors/execute.py#L121-L132

I certainly don't think this should be the default behavior but it would be nice for some long-running reports that might fail at various stages. Is there anything that would be required to support this besides adding that flag to pm.execute_notebook and propagating it through to nbconvert?

I'm not sure I fully understand the notion of engines but if this is engine-specific behavior then perhaps it needs to be specified differently?

rgbkrk commented 5 years ago

Hey @bnaul!

Papermill takes on the opinion that notebooks should run linearly and errors should bubble up. It forms an essential part of a CI pipeline so that you know when things are going wrong.

As a reader of notebooks, I expect to be able to read notebooks as standard artifacts that run from top to bottom.

As someone who runs a data platform, I want insight into notebooks that are failing. This would allow even syntax errors to pass through. If anyone supports you with infrastructure or inherits your jobs and doesn't realize this is how your notebooks work, they may spend a lot of time troubleshooting something that breaks the normal semantics of a notebook.

If you really need this, I recommend making exception catching explicit in your notebooks to denote that cells are being steamrolled on through. The other option is to build tools around nbconvert for this (or even use nbconvert directly). This is not an abstraction we'd want to bring into papermill.

bnaul commented 5 years ago

Thanks Kyle. I certainly understand that perspective; I suppose my use cases are different in that I started out calling nbconvert directly and only switched to papermill for the easier parameter passing, so I would prefer to be able to pass along arbitrary arguments to the nbconvert preprocessor for compatibility.

Even if you don't agree philosophically with suppressing exceptions, it seems like there might be other situations where passing different optional flags to nbconvert would be desirable. But if the idea is that papermill should only do things one way rather than support whatever Jupyter at large does (again I can see both pros/cons) then I guess that wouldn't make sense.

mpacer commented 5 years ago

For context: So far the argument has been that by avoiding using JupyterApp and traitlets, we have simplified papermill's API. Adding that would be the most straightforward way to get papermill to accept arbitrary nbconvert commands. I think it's gone pretty well so far in that regard (I was originally on the other side of the issue and @mseal won me over), but I'm always open minded.

One possibility would be to try to adapt the papermill parameter passing as an nbconvert preprocessor. It wouldn't be too challenging to reproduce the parameterisation functionality that exists today (from a logic perspective).

Note, in contrast to the parameter passing the engine functionality wouldn't be easy to transport.

MSeal commented 5 years ago

You could also run the notebook with papermill --report-mode to parameterize it, and then do a direct nbconvert call.

Alternatively you can register a copy of the nbconvert engine for your own execution to set the flag.

I think the only feature we'd really want to directly adopt might be a way to pass options to engines (which is currently difficult) rather than exposing lower level options directly.

bnaul commented 5 years ago

--report-mode seems like a good workaround for this particular problem and I have no other strong opinions about the engine configuration 🙂. thanks!

djherbis commented 4 years ago

@MSeal @mpacer Kaggle has been experimenting with using papermill instead of nbconvert for executing notebooks, however not having this as an option has been a bit of a blocker for us.

It seems we have some notebooks (ex. Kaggle Learn) which expect to produce an exception and continue, and we also have things like Competitions where an exception may occur but they still create a valid submission and we wouldn't want to boot them for that (on the other hand, we do have some users who would prefer to stop early rather than run 5hrs after a failure).

The main feature we like about papermill is that unlike nbconvert it prints out cell output, something users have been wanting for years.

Would papermill ever consider supporting allow_errors, or should I investigate other solutions?

MSeal commented 4 years ago

@djherbis Since we pass along kwargs to the downstream systems this is actually already supported with the allow_errors=True argument to execute_notebook, it's just not exposed to the CLI atm.

pm.execute_notebook('input.ipynb', 'output.ipynb', allow_errors=True)