nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Notebook server extension #434

Open ian-r-rose opened 5 years ago

ian-r-rose commented 5 years ago

I've been researching using papermill for a particular notebook-scheduling application, and was considering writing a notebook server extension for triggering papermill jobs, listing them, etc. Before I got going on that, I wanted to ask about it here.

  1. Is this an obviously dumb idea for some reason that I am missing?
  2. Does this already exist? Is there prior art I should be aware of?
  3. Regarding implementation: is there a good way for the papermill Python API to be invoked asynchronously?

Thanks for all the hard work!

MSeal commented 5 years ago

Sorry for the late response.

  1. Is this an obviously dumb idea for some reason that I am missing?

No, it's not dumb. I think most users have gotten by with having the browser do the execution for them, and more complicated stacks have their own flavor of scheduled or managed notebook executions. But there are several examples of programmatic versions of this behavior.

  2. Does this already exist? Is there prior art I should be aware of?

Yes, I believe paperboy was an attempt at this. I haven't kept up with its stability or progress in the past few months though. Another approach has been scheduling Airflow papermill operators and executing them programmatically. Dagster does this pattern in its own way as well. And some non-open source stacks also allow for remote execution of notebooks via schedulers.

  3. Regarding implementation: is there a good way for the papermill Python API to be invoked asynchronously?

Papermill and its upstream dependencies are not written to support async execution models. See https://github.com/jupyter/nbconvert/issues/1092 for a recent thread about adding this to the dependency, which would enable it here in papermill.
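
Until native async support exists, one common workaround for a blocking API is to push the call onto a worker thread so the server's event loop stays responsive. The following is only a minimal sketch of that idea, not something papermill provides: `run_in_worker` and `execute_notebook_async` are hypothetical helper names, and it assumes that `papermill.execute_notebook` is safe to call from a worker thread in your environment (a process pool may be a safer choice if it is not).

```python
import asyncio
import functools
from concurrent.futures import ThreadPoolExecutor


async def run_in_worker(executor, func, *args, **kwargs):
    """Hypothetical helper: run a blocking callable on the given executor."""
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional args, so bind kwargs up front.
    call = functools.partial(func, *args, **kwargs)
    return await loop.run_in_executor(executor, call)


async def execute_notebook_async(executor, input_path, output_path, parameters=None):
    """Hypothetical wrapper around papermill's blocking execute_notebook."""
    # Imported lazily so run_in_worker stays usable without papermill installed.
    import papermill as pm

    return await run_in_worker(
        executor,
        pm.execute_notebook,
        input_path,
        output_path,
        parameters=parameters or {},
    )
```

A server extension handler could then `await execute_notebook_async(executor, "input.ipynb", "output.ipynb", {"alpha": 0.1})` with a shared `ThreadPoolExecutor`; the paths and the `alpha` parameter are placeholders. The notebook still runs synchronously inside the worker, so this only keeps the caller unblocked and does not make the execution itself asynchronous.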

mike-seekwell commented 4 years ago

some non-open source stacks also allow for remote execution of notebooks via schedulers

Hope it's OK to post a plug here, but for people who need a quick solution to this issue, https://seekwell.io/ does this. You can link a local notebook via our Chrome Extension to a remote one that we'll run for you on a Google Compute Engine instance. It comes loaded with all the common ML / analytics packages.