jupyter / nbformat

Reference implementation of the Jupyter Notebook format
http://nbformat.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Rethinking kernelspecs in notebook metadata #81

Open takluyver opened 7 years ago

takluyver commented 7 years ago

(Moving the conversation here from the mailing list)

This is prompted by a couple of problems I've come across with kernelspecs:

a) In nbval, when you want to check notebooks against multiple Python versions, the obvious approach is to create an environment (e.g. a Travis job) for each, and run the tests inside it. But the notebook always runs with the kernel in its metadata (e.g. if it's saved with Python 3, testing on Python 2 will still run a Python 3 kernel). We worked around this by adding a --current-env flag.

b) Anaconda installs a notebook server extension which exposes conda environments as kernelspecs. But this doesn't affect other code using Jupyter, causing problems in e.g. nbconvert (https://github.com/jupyter/nbconvert/issues/515). More generally, identifying kernels with an environment name only makes sense within one computer.

I've been turning this over in my head for a while. I think there are three kinds of information relevant to starting a kernel for a notebook:

  1. In what programming language does the code make sense? This is mostly captured by our language_info metadata, and the notebook application's fallback behaviour when it can't find a named kernel. But there's still a bit of ambiguity with different versions of a language (e.g. do we treat Python 3 and Python 2 as one language?).

  2. How do we set up an environment with the dependencies for the notebook? There's some excellent work going on for this at https://github.com/jupyter/nbformat/pull/60, but it's not what I want to discuss here.

  3. Which available kernel for this notebook's language should we start to run it? At present, we use the name of the kernel when the notebook was saved - this is convenient for some use cases, but leads to problems (a) and (b) described above.
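For reference, the two pieces of metadata behind points 1 and 3 sit side by side in a v4 notebook file. The snippet below is only a sketch: the kernel name `conda-env-myproject-py` is a made-up example of a machine-local name, and `describe_kernel_metadata` is a hypothetical helper, not part of nbformat.

```python
import json

# A trimmed nbformat v4 notebook. The kernelspec name is a made-up example of
# a machine-local name; language_info describes the language itself.
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 2,
  "cells": [],
  "metadata": {
    "kernelspec": {
      "name": "conda-env-myproject-py",
      "display_name": "Python [conda env:myproject]",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "version": "3.6.1"
    }
  }
}
"""


def describe_kernel_metadata(nb):
    """Return (saved kernel name, language name, language version).

    Hypothetical helper: missing fields come back as None.
    """
    meta = nb.get("metadata", {})
    kernel_name = meta.get("kernelspec", {}).get("name")
    lang = meta.get("language_info", {})
    return kernel_name, lang.get("name"), lang.get("version")


nb = json.loads(NOTEBOOK_JSON)
print(describe_kernel_metadata(nb))
```

The language name and version would still make sense on another machine; the kernelspec name only makes sense where that environment exists.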

I propose that we change how we pick a kernel, by deprecating the kernelspec metadata in notebooks and adding a pluggable KernelPicker class in jupyter_client. The default KernelPicker would follow these rules:

  i) If the calling code explicitly specifies a kernel, start that one.

  ii) If there is only one kernel available for the notebook's language, start that one.

  iii) If the notebook is in Python and ipykernel is installed in the current environment, start ipykernel in this environment. This is a bit specific, but it's often what you want for tools like nbval and nbconvert.

  iv) There are either no kernels or multiple kernels installed for the language in question. Error out, indicating to the user that they should specify a kernel to be used (see (i)).
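The four rules can be sketched as a single function. This is an illustration only, not an existing jupyter_client API: `pick_kernel`, `KernelPickError`, and the shape of the `available` mapping are all assumptions made for the example, and "ipykernel is installed in the current environment" is reduced to an optional `current_env_kernel` argument.

```python
class KernelPickError(Exception):
    """Raised when there is no unambiguous kernel to start (rule iv)."""


def pick_kernel(language, available, explicit=None, current_env_kernel=None):
    """Pick a kernel name following rules (i) to (iv).

    language: language name taken from the notebook's language_info
    available: dict mapping kernel name -> language (hypothetical shape)
    explicit: kernel name given by the calling code, if any
    current_env_kernel: name standing in for "ipykernel in the current
        environment", or None if ipykernel is not installed there
    """
    # (i) The caller's explicit choice always wins.
    if explicit is not None:
        return explicit

    candidates = sorted(
        name for name, lang in available.items() if lang == language
    )

    # (ii) Exactly one kernel for this language: no ambiguity.
    if len(candidates) == 1:
        return candidates[0]

    # (iii) Python notebooks prefer ipykernel from the current environment.
    if language == "python" and current_env_kernel is not None:
        return current_env_kernel

    # (iv) Zero or several candidates: refuse to guess.
    raise KernelPickError(
        "Found %d kernels for language %r; please specify one explicitly"
        % (len(candidates), language)
    )
```

Under this sketch, a tool like nbval running inside a Python 2 environment would pass that environment's kernel as `current_env_kernel` and get it back, regardless of which kernel name the notebook was saved with.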

For the notebook application, we may plug in a different KernelPicker which records which kernels have been used for which notebooks, similar to the present behaviour. Even if we don't, Continuum or other people may implement something like this. But we wouldn't use this in tools like nbconvert and nbval.

Once there is a way to store environment descriptions in notebook metadata, and to create an environment for a notebook, another KernelPicker class may be involved in associating notebooks with the environment created for them.

This proposal is still rough, but I think that we need to move away from storing local kernel names in notebook metadata, now that we're getting more insight into how kernelspecs are used.

minrk commented 7 years ago

I think this makes sense as a simplification of things.

Right now, I think we have this in the notebook application:

So maybe this proposal is essentially to implement the same (or similar) logic that we have in the notebook javascript in jupyter_client?

I'd like to iron out what "explicitly specifies a kernel" would mean, if not the kernelspec name in the metadata. How would the notebook application 'record' what kernel I used, if not the current behavior? In my experience, this seems more personal information on my system (e.g. ipython-stable), rather than information to share with others, so it may not necessarily belong in the notebook itself, but should be associated with it somehow. Metadata might still be the simplest place, though.

I'd like to make it impossible to have the ipykernel package without its kernelspec installed, which ought to eliminate the iii case, as any time an env is active and ipykernel is installed, its kernelspec will be present. This is already true in conda, but our wheels don't install kernelspecs. I have some ideas for fixing that, though.

rgbkrk commented 7 years ago

I'm happy this discussion is starting; it has been hard for me to wrap my head around what is simple for users, for extensions on our machinery, and for us as developers all at once.

blink1073 commented 7 years ago

I'm inclined to agree with Min; we should allow for a more specific override. If the notebook was run against Python 3, then perhaps it contained Python 3-specific code and should not be run with a Python 2 kernel, forcing the explicit decision to use a --current-env flag.

jankatins commented 7 years ago

For the nbconvert problem with https://github.com/Cadair/jupyter_environment_kernels and https://github.com/Anaconda-Platform/nb_conda_kernels: it might be nice to convert the config option into an entry point, so that the installed kernel manager gets used in both the notebook and nbconvert. For that, the discussion in https://github.com/Anaconda-Platform/nb_conda_kernels/issues/42#issuecomment-236646074 might be of some help; it talks about converting the functionality into entry_points but submitting the entry point "declaration" to one (or more?) of the jupyter repositories.

takluyver commented 7 years ago

Min:

I'd like to iron out what "explicitly specifies a kernel" would mean, if not the kernelspec name in the metadata.

In the context of (i), I mean that the caller specifies it, not the notebook file. For nbconvert, for instance, that might mean running:

# This does not work now - kernel_name is not aliased
jupyter nbconvert --execute --kernel-name foo MyNotebook.ipynb

How would the notebook application 'record' what kernel I used, if not the current behavior? In my experience, this seems more personal information on my system (e.g. ipython-stable), rather than information to share with others, so it may not necessarily belong in the notebook itself, but should be associated with it somehow. Metadata might still be the simplest place, though.

I don't have a ready answer for this, but here are some thoughts:

I'd like to make it impossible to have the ipykernel package without its kernelspec installed, which ought to eliminate the iii case

It doesn't eliminate it, though it may change how you tackle it. The point of (iii) is that if there are multiple Python kernels available, the one using the same sys.executable as the code preparing to launch it is the default.
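One way to read "the same sys.executable" is to compare the first element of each kernelspec's `argv` (the command line stored in `kernel.json`) with the running interpreter. The sketch below assumes that reading; `kernels_for_current_interpreter` is a hypothetical helper, and `specs` is taken to be a plain dict of kernel name to parsed `kernel.json` contents.

```python
import os
import sys


def _norm(path):
    """Normalise a path for comparison, without resolving symlinks."""
    return os.path.normcase(os.path.abspath(path))


def kernels_for_current_interpreter(specs):
    """Return names of kernels whose argv[0] is the running interpreter.

    specs: dict mapping kernel name -> kernel.json contents (as dicts).
    A kernelspec whose argv[0] is a bare command such as "python" will
    generally not match, because it is not an absolute path.
    """
    here = _norm(sys.executable)
    return sorted(
        name
        for name, spec in specs.items()
        if spec.get("argv") and _norm(spec["argv"][0]) == here
    )
```

Symlinks are deliberately left unresolved here, since a virtualenv's interpreter is often a symlink to the base interpreter and resolving it would make separate environments look identical.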

Steve:

If the notebook was run against Python 3, then perhaps it contained Python 3-specific code and should not be run with a Python 2 kernel,

This is a tricky one, and I'm not sure quite how to deal with it. It's definitely possible to have a Python 3 notebook, but it's also entirely possible to have a notebook that you want to run on Python 2 and 3 - we do precisely this to test nbval, for instance. My inclination is to treat them as the same language, and leave more specific requirements to the environment specification stuff being discussed in #60.

Jan:

For the nbconvert problem... it might be nice to convert the config option into an entry point, so that installed kernel manager gets used in both the notebook and nbconvert.

If we can't work this out, then that definitely makes sense. But I see it as a particular symptom of a larger problem - kernel names only make sense in context, so it doesn't make sense to embed them into notebook files - so I'd like to try to find a solution to the general problem first.

takluyver commented 7 years ago

To frame the issue a bit differently: when we devised kernelspecs, we envisaged them basically representing languages - you have a Python kernelspec, an R kernelspec and so on. Those names are globally meaningful. We thought about environments but decided to punt the question.

Now, in the absence of a separate mechanism to deal with kernels in different environments, people are using kernelspecs to represent environments, and the names are therefore not globally meaningful. We've accommodated that by separating language_info metadata from the kernelspec. This idea continues to embrace using kernelspecs like that. The alternative would be to devise and implement a separate environment mechanism, and push kernelspecs back to mostly representing languages. I think that would be far more effort at this point.

blink1073 commented 7 years ago

True, the language version is part of language info and could be used to select the appropriate Python kernel for instance.

blink1073 commented 7 years ago

And the environment metadata discussion handles the case of "which libraries does this depend on".

jankatins commented 7 years ago

If we can't work this out, then that definitely makes sense. But I see it as a particular symptom of a larger problem - kernel names only make sense in context, so it doesn't make sense to embed them into notebook files - so I'd like to try to find a solution to the general problem first.

Whatever you do, please don't kill my local workflow: I use one environment per project, so I currently have >10 kernels visible via the environment-kernel-manager and I never use the "default" kernel (I even tried to remove the ipykernel package from the environment where the notebook server is installed). From my standpoint (no sharing), the current setup is almost perfect: if I delete an environment or import a new notebook, I get an error message asking me to choose a kernel, and there I can either build a new environment or use one of the current ones. ("Almost", because nbconvert doesn't see the environment kernels, but right now I almost never use nbconvert to execute notebooks; this problem would be eliminated by the extension point proposal I made above.)

minrk commented 7 years ago

@janschulz I think we'll make sure that's well supported. It's how I work as well. I think we have to keep some version of remembering which local kernelspec was used, preserving that, and using it as the preferred choice on next run if available. I'm especially interested in separating the notebook server env from the kernel env as much as possible, and making that a decent experience. Right now we're a bit in the middle, where neither is served super well.

Whether it's an extension point or some other solution for a global configurable (c.KernelManager.kernel_picker_cls could work), that would help a lot of things. Selecting a KernelSpecManager (or KernelPicker) implementation should probably be a global choice, not application-specific.

takluyver commented 7 years ago

Selecting a KernelSpecManager (or KernelPicker) implementation should probably be a global choice, not application-specific.

Actually, my thought is that the notebook application may have a different KernelPicker from command line tools like nbconvert & nbval. I feel like more 'memory' of notebook-kernel connections is acceptable in the interactive application, whereas command line tools shouldn't be trying to be clever about things like that.

I'd agree that the KernelSpecManager (which can list available kernels and retrieve the info necessary to start one) should be common across different applications, so they all see the same set of kernels. But I think you may want picking a kernel to try more 'guessing' in some applications than in others.
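To make the split concrete, here is a rough sketch in which both applications see the same `kernels` listing but plug in different pickers. The class names, the `pick` signature, and the in-memory `_last_used` store are all hypothetical; where the notebook application should actually persist its memory is exactly the open question above.

```python
class StrictPicker:
    """For command line tools such as nbconvert and nbval: never guess."""

    def pick(self, notebook_path, language, kernels, explicit=None):
        # kernels: dict mapping kernel name -> language, shared by all apps.
        if explicit is not None:
            return explicit
        matches = sorted(n for n, lang in kernels.items() if lang == language)
        if len(matches) == 1:
            return matches[0]
        raise LookupError(
            "%d kernels available for %r; specify one explicitly"
            % (len(matches), language)
        )


class RememberingPicker(StrictPicker):
    """For the interactive application: recall the last kernel per notebook."""

    def __init__(self):
        # Kept in memory for the sketch; not stored in the notebook file.
        self._last_used = {}

    def pick(self, notebook_path, language, kernels, explicit=None):
        remembered = self._last_used.get(notebook_path)
        if explicit is None and remembered in kernels:
            choice = remembered
        else:
            choice = super().pick(notebook_path, language, kernels, explicit)
        self._last_used[notebook_path] = choice
        return choice
```

With two Python kernels installed, the strict picker keeps raising until the caller names one, while the remembering picker only needs to be told once per notebook.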

minrk commented 7 years ago

Ah, interesting. I was thinking that much of the problem was things like nbconvert disagreeing with the live notebook, not codifying that the two should be very different. nbval seems more like the outlier in terms of desirable strategy than the live notebook, to me.

minrk commented 7 years ago

I certainly agree that this proposal's addition of the kernel-selection strategy as a clear axis of configuration should make it easier to make things match or not, and defaults for different applications can be a somewhat separate discussion.