Open dhimmel opened 4 years ago
Short answer is not really from inside the notebook. A pattern we use instead is to have a cell with the defaults tagged with parameters so that users overwriting defaults will inject those overwrites in the next cell. It works surprisingly well for how simple it is.
If you really need something more sophisticated, you can extend the papermill engine to inject custom code or patterns that better fit your use case. A few tools do this to apply their own opinions for execution that differ slightly from base papermill.
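For illustration, the defaults-plus-overrides pattern can be simulated without papermill at all. This is only a sketch: the variable names (alpha, n_iter) and the run_cells helper are made up for the example, and the cells are run with plain exec rather than a real kernel. It shows why the pattern works: the injected cell runs after the defaults cell, so its assignments win.

```python
# Simulate the cell order papermill produces: the cell tagged "parameters"
# runs first, then the cell that papermill inserts directly after it.
cells = [
    ("parameters", "alpha = 0.1\nn_iter = 100"),  # defaults written by the author
    ("injected-parameters", "alpha = 0.5"),       # overrides supplied at run time
    ("", "result = alpha * n_iter"),              # ordinary downstream cell
]


def run_cells(cells):
    """Execute each cell's source in order in one shared namespace."""
    namespace = {}
    for _tag, source in cells:
        exec(source, namespace)
    return namespace


ns = run_cells(cells)
print(ns["alpha"], ns["n_iter"], ns["result"])
```

Parameters that are not overridden (n_iter here) simply keep their default, which is exactly the behavior the question wants to avoid under papermill.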
pattern we use instead is to have a cell with the defaults tagged with parameters so that users overwriting defaults will inject those overwrites in the next cell.
Yes, but for our use case we want to use the default parameters only when a notebook is being run interactively. When the notebook is being run by papermill, we want specifying parameter values to be required, i.e. the default values are disregarded.
Without customizing an engine to manipulate the notebook document I don't think it's possible today for your code to know what system is running the notebook kernel. From your code's PoV it's being driven by protocol requests abstractly, it doesn't really know if that's a browser, papermill, or some custom application code.
One thing dagstermill does in its engine extension is completely replace the parameter cell when running instead of appending. If you did something similar, you could make the next cell check for the existence of required parameters and have it behave as you described in both interactive and headless execution modes.
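A minimal sketch of what that follow-up check cell might look like, assuming an engine that replaces the parameters cell rather than appending to it (which is not what papermill does by default). The parameter names and the check_required helper are hypothetical.

```python
# Hypothetical parameter names; a real notebook would list its own.
REQUIRED_PARAMETERS = ("input_path", "threshold")


def check_required(namespace, required=REQUIRED_PARAMETERS):
    """Raise NameError if any required parameter is absent from the namespace."""
    missing = [name for name in required if name not in namespace]
    if missing:
        raise NameError("Missing required parameters: " + ", ".join(missing))


# In a notebook, the cell after the parameters cell would call:
#     check_required(globals())
```

Run interactively, the defaults cell has already defined every name, so the check passes. Under an engine that replaces the defaults cell, only the values actually supplied exist, so an omitted parameter fails fast.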
I'd be somewhat receptive to adding a flag to replace parameter cells if it was useful for others as an option flag.
Ooh. That does seem like it could be a good solution for our problem. What about if papermill looked for a specific tag on parameter cells like papermill-replace or replace. This way we could set the behavior in the notebook rather than the papermill command... I'm actually not sure where the best place is. What do you think?
Sorry, just getting back to this thread now.
I'd want to get consensus from more folks that this behavior is desirable in the papermill project before we commit to increasing the tool's complexity. That being said, if we did add this capability I think we'd start in the CLI as an optional flag, then if needed we'd set that flag conditionally based on the notebook tags.
This sort of behaviour sounds great to me. I'm in a situation right now where I have a notebook that will be run by papermill, but when I'm debugging I have extra code I'd like to run. For now I just leave a commented block at the bottom that I uncomment when debugging, but it would be great to either put it in an if not running_in_papermill: block or have a way to skip the cell.
@ca-scribner -- what if, instead, you rely on an environment variable like PAPERMILL_ENV that you assume is production unless PAPERMILL_ENV is set to development:
import os
# .get() avoids a KeyError when the variable is unset; compare strings with ==, not "is"
if os.environ.get("PAPERMILL_ENV") == "development":
    pass  # stuff
This is similar to NODE_ENV, RACK_ENV, etc. Then when you run papermill you can set it explicitly like this:
$ PAPERMILL_ENV=development papermill my-notebook.ipynb
@rgbkrk I hadn't thought of improvising my own - that's a good simple workaround. I'd still prefer papermill have its own broadcast env variable, but at a minimum I can make my own by calling with one. Thanks!
Not to bump an ancient thread, but has there been any progress on this as of yet? I'm currently interacting with a black-box system that executes notebooks in Papermill and I'm curious if since 2020 there has been movement in setting envvars from within the Papermill engine.
I'm open if you want to submit a PR that would, at least for the CLI version of papermill, set PAPERMILL_EXECUTION=1 or PAPERMILL_EXECUTION=true.
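Until something like that exists in papermill itself, a caller using the Python API could set the variable on its own side. The sketch below rests on an assumption: that the kernel is started locally and inherits the calling process's environment, which may not hold for remote or custom kernel setups. The papermill_execution_flag helper is hypothetical and not part of papermill.

```python
import os
from contextlib import contextmanager


@contextmanager
def papermill_execution_flag(name="PAPERMILL_EXECUTION", value="1"):
    """Temporarily set an environment variable, restoring the old state on exit."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)  # variable did not exist before
        else:
            os.environ[name] = previous  # put the earlier value back


# Intended use (requires papermill; the paths are placeholders):
#     import papermill as pm
#     with papermill_execution_flag():
#         pm.execute_notebook("input.ipynb", "output.ipynb", parameters={})
```

Restoring the previous value keeps the flag from leaking into later code in the same process, such as an interactive session that launched the run.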
I've wanted this for years, e.g. for toggling Plotly plots' output format:
papermill ⟹ png (avoid embedding 3MB of {HTML,JS,CSS} in the notebook, and Git-committing it, only for it to not render on GitHub)
ChatGPT just confidently told me that the env var suggested above already works, but it seems like it does not (based on this issue and a test I just ran).
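A rough sketch of that renderer toggle, assuming you set PAPERMILL_EXECUTION yourself when launching the run, since per this thread papermill does not set it. The helper names are hypothetical. "png" and "notebook" are Plotly renderer names, and the png renderer needs a static image export backend such as kaleido installed.

```python
import os


def running_headless(environ=os.environ, name="PAPERMILL_EXECUTION"):
    """Return True when the self-set flag is "1" or "true" (case-insensitive)."""
    return environ.get(name, "").strip().lower() in {"1", "true"}


def choose_plotly_renderer(environ=os.environ):
    """Pick a static renderer for headless runs, an interactive one otherwise."""
    return "png" if running_headless(environ) else "notebook"


# In the notebook, near the top (requires plotly):
#     import plotly.io as pio
#     pio.renderers.default = choose_plotly_renderer()
```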
Is it possible for a notebook cell to detect whether it is being executed by papermill (either by the command line interface or by pm.execute_notebook)?
I'm interested in the following application:
Basically, if running via papermill, we don't want to set defaults. But if not run by papermill, do set default values.