nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Kernel not found with venvs (jupyter_client.kernelspec.NoSuchKernel) #761

Closed Nikolai-Hlubek closed 11 months ago

Nikolai-Hlubek commented 11 months ago

🐛 Bug

Sorry for the wall of text.

Intro

This is a bug / limitation in a complex scenario. I will try to describe exactly what happens, but I don't know the architecture of jupyter and papermill well enough to propose a proper fix. My hope is that somebody experienced in both can take a look at it. If it is not possible to fix in jupyter or papermill, maybe my findings will help somebody else who encounters this.
The workaround I propose is: Try setting the environment variable JUPYTER_PATH to the location of your compute kernels.

Description of the Bug

We run a jupyter server from a venv, which has several kernels attached, each with its own separate venv. The kernels are installed with a prefix directory and not with the --user flag. I'll explain below why it works with the --user flag, and hence probably for most people, as --user is set in all tutorials.

In particular the venvs are in the following directories:
/opt/pyenvs/jupyter <- jupyter server venv
/opt/pyenvs/DSS01 <- compute kernel 1 venv
/opt/pyenvs/DSS02 <- compute kernel 2 venv
/opt/pyenvs/DSS03 <- compute kernel 3 venv

With this installation the kernelspecs are found in:
/opt/pyenvs/jupyter/share/jupyter/kernels/datasciencestack_a.01
/opt/pyenvs/jupyter/share/jupyter/kernels/datasciencestack_a.02
/opt/pyenvs/jupyter/share/jupyter/kernels/datasciencestack_a.03
/opt/pyenvs/jupyter/share/jupyter/kernels/python3

This works for jupyter-lab to select and run kernels.

When I try to run a template notebook with papermill I get the following error:

[IPKernelApp] ERROR | No such kernel named datasciencestack_a.03
Traceback (most recent call last):
  File "/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_client/manager.py", line 82, in wrapper
    out = await method(self, *args, **kwargs)
  File "/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_client/manager.py", line 391, in _async_start_kernel
    kernel_cmd, kw = await ensure_async(self.pre_start_kernel(**kw))
  File "/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_client/utils.py", line 38, in ensure_async
    return await obj
  File "/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_client/manager.py", line 353, in _async_pre_start_kernel
    self.kernel_spec,
  File "/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_client/manager.py", line 178, in kernel_spec
    self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)
  File "/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_client/kernelspec.py", line 294, in get_kernel_spec
    raise NoSuchKernel(kernel_name)
jupyter_client.kernelspec.NoSuchKernel: No such kernel named datasciencestack_a.03

What happens under the hood:

In kernelspec.py: First, get_kernel_spec(self, kernel_name) gets called with the intended kernel name. This calls _find_spec_directory(kernel_name.lower()), which gets the kernel locations from self.kernel_dirs. This should somehow be set in _kernel_dirs_default(self), but I didn't find the link to this one.

Anyway, at this point kernel_dirs in our case is set to:

['/home/jupyter/.local/share/jupyter/kernels',  
'/opt/pyenvs/DSS03/share/jupyter/kernels',    # <- papermill started from this kernel
'/usr/local/share/jupyter/kernels', 
'/usr/share/jupyter/kernels', 
'/home/jupyter/.ipython/kernels']

However, kernel_dirs should contain /opt/pyenvs/jupyter/share/jupyter/kernels, as this is where the actual specs are found.

The first entry of kernel_dirs is the user directory, which is always included. Hence it will work when you use --user in the options when installing a kernel.
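
The lookup described above can be approximated with a short stdlib-only sketch. It is a simplified model and not jupyter_client's actual code: find_spec_directory is a hypothetical helper that scans a list of kernel directories in order and returns the first <dir>/<kernel_name>/kernel.json match. The real implementation does more (for example name validation), so treat this only as an illustration of why a spec that lives outside the searched directories is never found.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_spec_directory(kernel_name: str, kernel_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first directory containing <kernel_name>/kernel.json, else None.

    Simplified, hypothetical model of the kernelspec search; not the real API.
    """
    for kernel_dir in kernel_dirs:
        candidate = Path(kernel_dir) / kernel_name
        if (candidate / "kernel.json").is_file():
            return candidate
    return None


def demo() -> tuple:
    """Build a throwaway layout mirroring the issue and run two lookups."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins for /opt/pyenvs/jupyter and /opt/pyenvs/DSS03.
        server_kernels = Path(root) / "jupyter" / "share" / "jupyter" / "kernels"
        compute_kernels = Path(root) / "DSS03" / "share" / "jupyter" / "kernels"
        spec_dir = server_kernels / "datasciencestack_a.03"
        spec_dir.mkdir(parents=True)
        compute_kernels.mkdir(parents=True)
        (spec_dir / "kernel.json").write_text(json.dumps({"argv": [], "language": "python"}))

        name = "datasciencestack_a.03"
        # Only the compute venv is searched: the spec is not found.
        missing = find_spec_directory(name, [str(compute_kernels)])
        # The server venv's kernels directory is also searched: the spec is found.
        found = find_spec_directory(name, [str(compute_kernels), str(server_kernels)])
        return missing, found.name if found else None
```

Calling demo() performs one lookup that searches only the compute venv's kernels directory and one that also searches the server venv's kernels directory, which is the difference between the failing and the working setup in this issue.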

The paths are generated in /opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_core/paths.py -> jupyter_path. This function mentions that directories listed in JUPYTER_PATH take priority over the default locations. Setting JUPYTER_PATH fixes the NoSuchKernel issue.
The question is whether this should be necessary. Can't papermill somehow detect the venv of the jupyter server and add its path accordingly? Or should this be done somewhere in jupyter-core? Jupyter should somehow keep track of where its kernels are installed.
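
A minimal sketch of the JUPYTER_PATH workaround when papermill is used as a library. It assumes that JUPYTER_PATH should point at the Jupyter data directory (here /opt/pyenvs/jupyter/share/jupyter, the parent of the kernels directory listed above), since the kernels subdirectory is appended during lookup. with_jupyter_path and run_notebook are hypothetical helper names; papermill.execute_notebook with its kernel_name argument is the only papermill call used.

```python
import os
from typing import Dict, Mapping


def with_jupyter_path(environ: Mapping[str, str], data_dir: str) -> Dict[str, str]:
    """Return a copy of environ with data_dir prepended to JUPYTER_PATH."""
    env = dict(environ)
    existing = env.get("JUPYTER_PATH", "")
    # Keep any entries that were already set, but avoid duplicating data_dir.
    parts = [data_dir] + [p for p in existing.split(os.pathsep) if p and p != data_dir]
    env["JUPYTER_PATH"] = os.pathsep.join(parts)
    return env


def run_notebook(input_path: str, output_path: str, kernel_name: str, data_dir: str):
    """Run a notebook in-process after exposing data_dir (requires papermill)."""
    # The variable has to be set before the kernelspec lookup happens.
    os.environ.update(with_jupyter_path(os.environ, data_dir))
    import papermill as pm  # imported lazily so the helper above stays stdlib-only

    return pm.execute_notebook(input_path, output_path, kernel_name=kernel_name)
```

With the layout from this issue, the call might look like run_notebook("template.ipynb", "out.ipynb", "datasciencestack_a.03", "/opt/pyenvs/jupyter/share/jupyter"), where the notebook file names are placeholders.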

MSeal commented 11 months ago

So you somewhat hit on the right place to look for a solution. Basically papermill is running as a library below any virtual boundaries, so it's a dumb client in that it performs no environment manipulation. It does this because, when used as an imported library, the intention is to use the environment the current process has established and not to manipulate that process's intentions. Basically papermill has no knowledge of jupyter server, which might be doing fancier environment management. Becoming aware of higher-order organization would make the client overly complicated and brittle to upstream changes.

I usually recommend explicitly activating the virtualenv you wish to use before calling papermill so there's no confusion. In more complicated scenarios, setting env variables to override defaults (which is exactly how env activation works as well) can work. Some groups also make small wrapper CLIs to do custom setups for this purpose (e.g. {myorgname}-papermill) to add any business logic in a way that's explicit to follow but doesn't require daily users to remember it.
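
One possible shape for such a wrapper CLI, as a rough sketch only. SERVER_DATA_DIR, build_env and main are hypothetical names, and the path is the server venv's data directory assumed from this issue. The wrapper sets JUPYTER_PATH only when the caller has not set it, then forwards all arguments unchanged to the papermill executable found on PATH.

```python
import os
import subprocess
import sys
from typing import List, Optional

# Assumption: data directory of the Jupyter server venv described in this issue.
SERVER_DATA_DIR = "/opt/pyenvs/jupyter/share/jupyter"


def build_env(base: Optional[dict] = None) -> dict:
    """Copy the environment and fill in JUPYTER_PATH if it is not already set."""
    env = dict(os.environ if base is None else base)
    # setdefault keeps an explicit JUPYTER_PATH from the caller untouched.
    env.setdefault("JUPYTER_PATH", SERVER_DATA_DIR)
    return env


def main(argv: Optional[List[str]] = None) -> int:
    """Forward all arguments to the papermill CLI and return its exit code."""
    args = sys.argv[1:] if argv is None else argv
    completed = subprocess.run(["papermill", *args], env=build_env())
    return completed.returncode
```

If main were registered as a console-script entry point under a name like myorg-papermill (a made-up name), daily use could stay the same as with plain papermill, for example myorg-papermill template.ipynb out.ipynb -k datasciencestack_a.03. Using setdefault rather than overwriting is a design choice that leaves room for users who need a different kernel location.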

Hope that helps

Nikolai-Hlubek commented 11 months ago

Hi Matthew

Thanks for the reply and explanations. I reformulated my original question a bit so it can at least be used as a search target if somebody encounters the same issue.