Is it possible to tell whether a notebook is being executed by papermill?

nteract / papermill

📚 Parameterize, execute, and analyze notebooks

http://papermill.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


Is it possible to tell whether a notebook is being executed by papermill? #481

Open dhimmel opened 4 years ago

dhimmel commented 4 years ago

Is it possible for a notebook cell to detect whether it is being executed by papermill (either by the command line interface or by pm.execute_notebook)?

I'm interested in the following application:

if not executed_by_papermill:
    # set default parameters

Basically, if running via papermill, we don't want to set defaults. But if not run by papermill, do set default values.

MSeal commented 4 years ago

Short answer is not really from inside the notebook. A pattern we use instead is to have a cell with the defaults tagged with parameters so that users overwriting defaults will inject those overwrites in the next cell. It works surprisingly well for how simple it is.
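To make the pattern above concrete, here is a minimal sketch that simulates it without invoking papermill: the defaults cell runs first, and a cell of overrides runs right after it. The parameter names (alpha, dataset), the helper run_cells, and the override value are hypothetical and only illustrate the ordering.

```python
# Source of the cell tagged "parameters": defaults for interactive use.
defaults_cell = """
alpha = 0.1
dataset = "sample.csv"
"""

# Source of the cell that would be injected after it when a caller
# overrides alpha (the value 0.5 is illustrative).
injected_cell = """
alpha = 0.5
"""


def run_cells(*cells):
    """Execute cell sources in order in one namespace, like a kernel would."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    # Drop entries such as __builtins__ that exec adds on its own.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


interactive = run_cells(defaults_cell)               # defaults only
headless = run_cells(defaults_cell, injected_cell)   # defaults, then overrides
```

Because the overrides run after the defaults, any parameter the caller does not pass keeps its default value.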

MSeal commented 4 years ago

If you really need something more sophisticated, you can extend the papermill engine to inject custom code or patterns that better fit your use case. A few tools do this to apply their own opinions for execution that differ slightly from papermill base.

dhimmel commented 4 years ago

pattern we use instead is to have a cell with the defaults tagged with parameters so that users overwriting defaults will inject those overwrites in the next cell.

Yes, but for our use case we want to use the default parameters only when a notebook is being run interactively. When the notebook is being run by papermill, we want specifying parameter values to be required, i.e. the default values are disregarded.

MSeal commented 4 years ago

Without customizing an engine to manipulate the notebook document, I don't think it's possible today for your code to know what system is running the notebook kernel. From your code's PoV, it's being driven by abstract protocol requests; it doesn't really know whether that's a browser, papermill, or some custom application code.

One thing dagstermill does in its engine extension is completely replace the parameter cell when running instead of appending. If you did something similar, you could make the next cell check for the existence of required parameters and have it behave as you described in both interactive and headless execution modes.
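A rough sketch of what such a check cell might look like, assuming an engine that replaces the parameter cell (per this thread, not something papermill itself offers today). The names REQUIRED, missing_parameters, and check_parameters are hypothetical.

```python
# Hypothetical list of parameters the notebook cannot run without.
REQUIRED = ("alpha", "dataset")


def missing_parameters(namespace, required=REQUIRED):
    """Return the required names that are not defined in the namespace."""
    return [name for name in required if name not in namespace]


def check_parameters(namespace, required=REQUIRED):
    """Raise NameError listing every required parameter that is missing."""
    missing = missing_parameters(namespace, required)
    if missing:
        raise NameError(f"missing required parameters: {', '.join(missing)}")
```

In a notebook, the cell after the parameter cell would call check_parameters(globals()). Interactively, the defaults cell has already defined every name, so the check passes. Under a replacing engine, only the values the caller supplied exist, so an omitted parameter fails fast.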

I'd be somewhat receptive to adding a flag to replace parameter cells if it was useful for others as an option flag.

dhimmel commented 4 years ago

I'd be somewhat receptive to adding a flag to replace parameter cells if it was useful for others as an option flag.

Ooh. That does seem like it could be a good solution for our problem. What if papermill looked for a specific tag on parameter cells, like papermill-replace or replace? This way we could set the behavior in the notebook rather than in the papermill command... I'm actually not sure where the best place is. What do you think?

MSeal commented 4 years ago

Sorry, just getting back to this thread now.

I'd want to get consensus from more folks that this behavior is desirable in the papermill project before we commit to increasing the tool's complexity. That being said, if we did add this capability, I think we'd start in the CLI as an optional flag, then if needed we'd set that flag conditionally based on the notebook tags.

ca-scribner commented 4 years ago

This sort of behaviour sounds great to me. I'm in a situation right now where I have a notebook that will be run by papermill, but when I'm debugging I have extra code I'd like to run. For now I just leave a commented block at the bottom that I uncomment when debugging, but it would be great to put it either in an if not running_in_papermill: block or have a way to skip the cell.

rgbkrk commented 4 years ago

@ca-scribner -- what if, instead, you rely on an environment variable like PAPERMILL_ENV that you assume is production unless PAPERMILL_ENV is set to development:

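import os
# Use == (not `is`) to compare strings; .get() avoids a KeyError when the variable is unset.
if os.environ.get("PAPERMILL_ENV") == "development":
    pass  # stuff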

This is similar to NODE_ENV, RACK_ENV, etc. Then when you run papermill you can set it explicitly like this:

$ PAPERMILL_ENV=development papermill my-notebook.ipynb
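The same idea can be sketched for callers who use the Python API instead of the shell. This assumes the kernel process inherits the caller's environment, which is the usual behavior for locally launched kernels but may not hold for remote or custom kernel setups. The helpers temporary_env and is_development are hypothetical names, and PAPERMILL_ENV is a variable you set yourself, not one papermill sets.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for the duration of a with-block."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the earlier state, removing the variable if it was unset.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


def is_development(environ=None):
    """Treat anything other than PAPERMILL_ENV=development as production."""
    environ = os.environ if environ is None else environ
    return environ.get("PAPERMILL_ENV", "production") == "development"


# Usage sketch (requires papermill; paths are placeholders):
# import papermill as pm
# with temporary_env("PAPERMILL_ENV", "development"):
#     pm.execute_notebook("my-notebook.ipynb", "output.ipynb")
```

Inside the notebook, a cell would then guard debug-only code with is_development().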

ca-scribner commented 4 years ago

@rgbkrk I hadn't thought of improvising my own - that's a good simple workaround. I'd still prefer papermill have its own broadcast env variable, but at a minimum I can make my own by calling with one. Thanks!

j1ah0ng commented 1 year ago

Not to bump an ancient thread, but has there been any progress on this as of yet? I'm currently interacting with a black-box system that executes notebooks in Papermill and I'm curious if since 2020 there has been movement in setting envvars from within the Papermill engine.

rgbkrk commented 1 year ago

I'm open if you want to submit a PR that would, at least for the CLI version of papermill, set PAPERMILL_EXECUTION=1 or PAPERMILL_EXECUTION=true.

ryan-williams commented 5 months ago

I've wanted this for years, e.g. for toggling Plotly plots' output format:

interactive notebook ⟹ interactive HTML
noninteractive notebook (papermill) ⟹ png (avoid embedding 3MB of {HTML,JS,CSS} in notebook, and Git-committing it, only for it to not render on GitHub)

ChatGPT just confidently told me that the env var suggested above already works, but it seems like it does not (based on this issue and a test I just ran).
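Until something like this exists in papermill, one possible workaround for the Plotly case is to set the variable yourself when launching papermill and pick the renderer from it. This is only a sketch: PAPERMILL_EXECUTION is the name proposed in this thread and is not set by papermill, and choose_renderer is a hypothetical helper.

```python
import os


def choose_renderer(environ=None):
    """Return "png" when the caller flagged a papermill run, else "notebook"."""
    environ = os.environ if environ is None else environ
    flag = environ.get("PAPERMILL_EXECUTION", "").lower()
    return "png" if flag in ("1", "true") else "notebook"


# Usage sketch in a notebook cell (requires plotly; static PNG export
# also needs an image export backend such as kaleido):
# import plotly.io as pio
# pio.renderers.default = choose_renderer()
```

The notebook would then be run with the variable set explicitly, for example PAPERMILL_EXECUTION=1 papermill my-notebook.ipynb output.ipynb.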