Open ian-r-rose opened 5 years ago
Sorry for the late response.
- Is this an obviously dumb idea for some reason that I am missing?
No, it's not dumb. I think most users have gotten by with having the browser do the execution for them, and more complicated stacks have their own flavor of scheduled or managed notebook execution. But there are several examples of programmatic versions of this behavior.
- Does this already exist? Is there prior art I should be aware of?
Yes, I believe paperboy was an attempt at this. I haven't kept up with its stability or progress in the past few months though. Another approach has been scheduling Airflow papermill operators and executing them programmatically. Dagster does this pattern in its own way as well. And some non-open source stacks also allow for remote execution of notebooks via schedulers.
- Regarding implementation: is there a good way for the papermill Python API to be invoked asynchronously?
Papermill and its upstream dependencies are not written to support async execution models. See https://github.com/jupyter/nbconvert/issues/1092 for a recent thread about adding this to the dependency, which would enable it here in papermill.
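In the meantime, one possible workaround is to leave papermill itself synchronous and launch its command-line interface as a subprocess from an asyncio event loop. The following is only a rough sketch under that assumption, not something papermill provides: `build_command`, `run_command`, and `run_notebooks` are hypothetical helper names, and the only papermill feature relied on is the documented `papermill INPUT OUTPUT -p NAME VALUE` CLI form.

```python
import asyncio


def build_command(input_path, output_path, parameters=None):
    """Build a papermill CLI invocation (hypothetical helper, not part of papermill)."""
    cmd = ["papermill", input_path, output_path]
    for name, value in (parameters or {}).items():
        # -p passes one parameter; the CLI receives the value as text.
        cmd += ["-p", name, str(value)]
    return cmd


async def run_command(cmd):
    """Run a command without blocking the event loop.

    Returns a (returncode, stdout, stderr) tuple.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout.decode(), stderr.decode()


async def run_notebooks(jobs):
    """Run several notebook jobs concurrently.

    Each job is an (input_path, output_path, parameters) tuple.
    Results come back in the same order as the jobs.
    """
    commands = [build_command(*job) for job in jobs]
    return await asyncio.gather(*(run_command(cmd) for cmd in commands))
```

A server extension could then call something like `await run_notebooks([("in.ipynb", "out.ipynb", {"alpha": 0.1})])` from its own handlers. Each notebook runs in a separate process, so the execution itself is still blocking inside that process; only the caller is freed up.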
some non-open source stacks also allow for remote execution of notebooks via schedulers
Hope it's ok to post a plug here, but for people who need a quick solution to this issue, https://seekwell.io/ does this. You can link a local notebook via our Chrome Extension to a remote one that we'll run for you on a Google Compute Engine instance. It comes loaded with all the common ML / analytics packages.
I've been researching using papermill for a particular notebook-scheduling application, and was considering writing a notebook server extension for triggering papermill jobs, listing them, etc. Before I got going on that, I wanted to ask about it here.
Thanks for all the hard work!