nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Should we try to infer a default kernel when none is provided? #338

Open mgasner opened 5 years ago

mgasner commented 5 years ago

Right now, when calling execute_notebook, papermill requires either an explicitly specified kernel_name, or a kernelspec to be present in the notebook's metadata.

This complicates situations a) where we are writing notebooks that are compatible across the py2/py3 language barrier and b) where we are distributing notebooks to the public.

In the first case, if we specify either a python2 or python3 kernel in our notebooks, we will work out of the box (on a default install) on one Python version, but not on the other, even if the notebook is compatible. There are many workarounds, but all are burdensome and distracting.

In the second case, regardless of what kernel we specify in our notebooks, we are not guaranteed that it will be present in the environment to which we are distributing notebooks.

Granted that it is impossible to magically resolve these issues in general, does anyone see a strong downside to trying to infer a kernel when it is not specified, perhaps following the scheme in #262 (looking at the language), or perhaps looking at Jupyter settings such as MappingKernelManager.default_kernel_name (but cf. https://github.com/jupyter/notebook/issues/3338)?
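One way the inference described above might look, as a minimal sketch rather than anything papermill actually does: prefer the notebook's own kernelspec if it is installed, otherwise match on language, otherwise fall back to a configured default. The function name `infer_kernel_name` and the `available` mapping (kernel name to language) are hypothetical; in a real environment that mapping could presumably be built from `jupyter_client.kernelspec.KernelSpecManager().get_all_specs()`.

```python
from typing import Any, Mapping, Optional


def infer_kernel_name(
    nb_metadata: Mapping[str, Any],
    available: Mapping[str, str],
    default_kernel_name: Optional[str] = None,
) -> Optional[str]:
    """Guess a kernel for a notebook when no kernel_name override is given.

    `available` maps installed kernel names to their language,
    e.g. {"python3": "python"}. Both the function and this mapping
    are illustrative assumptions, not papermill API.
    """
    # 1. Honor the notebook's own kernelspec when that kernel is installed.
    kernelspec = nb_metadata.get("kernelspec") or {}
    name = kernelspec.get("name")
    if name in available:
        return name

    # 2. Otherwise look at the language, as suggested for #262.
    language_info = nb_metadata.get("language_info") or {}
    language = kernelspec.get("language") or language_info.get("name")
    if language:
        # Sort so the choice is deterministic when several kernels match.
        for candidate, candidate_language in sorted(available.items()):
            if candidate_language == language:
                return candidate

    # 3. Finally fall back to a configured default, if it is installed.
    if default_kernel_name in available:
        return default_kernel_name
    return None
```

With this scheme a notebook saved with a `python2` kernelspec would still resolve to `python3` on a machine that only has the Python 3 kernel, which is the cross-version case in a) above. It does nothing to guarantee the notebook's code is actually compatible.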

MSeal commented 5 years ago

So think of papermill's kernel assignment as an override tool. It's usually not used and one relies on the notebook's metadata to make a decision.

You're right that the notebook framework doesn't have a place to specify that the notebook can be run with multiple kernels, only fallback mechanisms for when the specified kernel is missing.

I'd actually follow up on this conversation at https://github.com/jupyter/nbformat/ and/or https://discourse.jupyter.org/, as this may be something we want to include in nbformat 5.0. Today the 4.4 spec requires that the kernel name and display_name be present in the document: https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json#L13-L27, which does force the kernel into 2 vs 3 in the Python case.

In your case, is it that you just want to advertise that the notebook supports both versions? If so, you could use a generic kernel name (e.g. python) and supply a python kernel in your stack that defaults to the version you see fit but indicates it only accepts code compatible with both 2 and 3.

Another important question: given that much of the tooling will be dropping support for Python 2 at the end of the year (some already has), how much is it worth indicating both 2.7 and 3.x support? Perhaps defaulting to 3, with a metadata indicator in the notebook that it has been tested against 2.7, would be sufficient: dagstermill could then keep the 3 kernel by default, or explicitly override it if the user lives in 2.
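A rough sketch of that last idea, under stated assumptions: the metadata key `tested_python_versions` and the function `choose_python_kernel` are made up for illustration and are not part of nbformat, papermill, or dagstermill. The kernel names `python2` and `python3` are the ones a default ipykernel install registers.

```python
from typing import Any, Mapping, Optional


def choose_python_kernel(
    nb_metadata: Mapping[str, Any],
    override: Optional[str] = None,
    default: str = "python3",
) -> str:
    """Default to the Python 3 kernel; allow an explicit Python 2 override
    only when the notebook declares it has been tested against 2.x.

    `tested_python_versions` is a hypothetical metadata key.
    """
    tested = nb_metadata.get("tested_python_versions") or []

    # No override: keep the default (Python 3) kernel.
    if override is None:
        return default

    # Override to Python 2 is only honored for notebooks marked as tested on 2.x.
    if override == "python2" and not any(str(v).startswith("2.") for v in tested):
        raise ValueError("notebook does not declare Python 2 support")

    return override
```

A caller that "lives in 2" would pass `override="python2"`; everyone else gets the Python 3 kernel without the notebook having to name two kernels.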