Closed: Nikolai-Hlubek closed this issue 11 months ago
So you somewhat hit on the right place to look for a solution. Basically, papermill runs as a library below any virtual environment boundaries, so it is a dumb client in the sense that it does no environment manipulation. It does this because, when it is used as an imported library, the intention is to use the environment the current process has established and not to second-guess that process's intentions. Papermill has no knowledge of the Jupyter server, which might be doing fancier environment management. Becoming aware of higher-order organization would make the client overly complicated and brittle to upstream changes.
I usually recommend explicitly activating the virtualenv you wish to use before calling papermill so there's no confusion. In more complicated scenarios, setting environment variables to override defaults (which is exactly how env activation works as well) can work. Some groups also make small wrapper CLIs for custom setups (e.g. {myorgname}-papermill) to add any business logic in a way that's explicit to follow but that daily users don't have to remember.
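As a rough illustration of the wrapper idea, here is a minimal Python sketch. It assumes a POSIX venv layout (a bin directory inside the venv) and mimics what an activate script does: set VIRTUAL_ENV, put the venv's bin directory first on PATH, and drop PYTHONHOME. The names build_activated_env, main and MYORG_PAPERMILL_VENV are hypothetical and not part of papermill.

```python
import os
import subprocess


def build_activated_env(venv_dir, base_env=None):
    """Return a copy of base_env that mimics `source <venv>/bin/activate` (POSIX layout assumed)."""
    env = dict(os.environ if base_env is None else base_env)
    env["VIRTUAL_ENV"] = venv_dir
    # Activation puts the venv's bin directory first on PATH.
    bin_dir = os.path.join(venv_dir, "bin")
    env["PATH"] = os.pathsep.join(p for p in (bin_dir, env.get("PATH", "")) if p)
    # Activation scripts also unset PYTHONHOME so the venv interpreter is used.
    env.pop("PYTHONHOME", None)
    return env


def main(argv):
    """Hypothetical wrapper entry point: run the papermill CLI inside the chosen venv."""
    # MYORG_PAPERMILL_VENV is an invented variable name; the default path is only an example.
    venv_dir = os.environ.get("MYORG_PAPERMILL_VENV", "/opt/pyenvs/DSS03")
    env = build_activated_env(venv_dir)
    # Forward all arguments unchanged to the papermill executable found on the new PATH.
    return subprocess.call(["papermill", *argv], env=env)
```

A wrapper script would call main(sys.argv[1:]) and exit with its return value, so daily users only type the wrapper's name and the usual papermill arguments.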
Hope that helps
Hi Matthew
Thanks for the reply and the explanations. I reformulated my original question a bit so that it can at least serve as a search target if somebody encounters the same issue.
🐛 Bug
Sorry for the wall of text.
Intro
This is a bug / limitation in a complex scenario. I will try to describe exactly what happens, but I don't know the architecture of jupyter and papermill well enough to propose a fix. My hope is that somebody experienced in both can take a look at it. If it is not possible to fix this in jupyter or papermill, maybe my findings will help somebody else who encounters it.
The workaround I propose is: try setting the environment variable JUPYTER_PATH to the location of your compute kernels.
Description of the Bug
We run a jupyter server from a venv, which has several kernels attached, each with its own separate venv. The kernels are installed with a prefix directory and not with the --user flag. I'll explain below why it works with the --user flag and hence probably for most people, as --user is set in all tutorials.
In particular the venvs are in the following directories:
/opt/pyenvs/jupyter <- jupyter server venv
/opt/pyenvs/DSS01 <- compute kernel 1 venv
/opt/pyenvs/DSS02 <- compute kernel 2 venv
/opt/pyenvs/DSS03 <- compute kernel 3 venv
With this installation the kernelspecs are found in:
/opt/pyenvs/jupyter/share/jupyter/kernels/datasciencestack_a.01
/opt/pyenvs/jupyter/share/jupyter/kernels/datasciencestack_a.02
/opt/pyenvs/jupyter/share/jupyter/kernels/datasciencestack_a.03
/opt/pyenvs/jupyter/share/jupyter/kernels/python3
This works for jupyter-lab to select and run kernels.
When I try to run a template notebook with papermill I get the following error:
What happens under the hood:
In kernelspec.py: first, get_kernel_spec(self, kernel_name) gets called with the intended kernel name. This calls _find_spec_directory(kernel_name.lower()), which gets the kernel locations from self.kernel_dirs. That attribute should somehow be set in _kernel_dirs_default(self), but I didn't find the link to this one.
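To make the lookup described above concrete, here is a simplified sketch of a case-insensitive directory search over a list of kernel directories. It is not the jupyter_client implementation, only an approximation of the behaviour described here; the standalone function find_spec_directory and its kernel_dirs parameter are illustrative assumptions.

```python
import os


def find_spec_directory(kernel_name, kernel_dirs):
    """Return the first directory named kernel_name (case-insensitive) that holds a kernel.json."""
    wanted = kernel_name.lower()
    for kernel_dir in kernel_dirs:
        if not os.path.isdir(kernel_dir):
            continue  # Search locations that do not exist are skipped.
        for entry in sorted(os.listdir(kernel_dir)):
            candidate = os.path.join(kernel_dir, entry)
            has_spec = os.path.isfile(os.path.join(candidate, "kernel.json"))
            if entry.lower() == wanted and has_spec:
                return candidate
    # No search location contains the kernel: the caller would report a missing kernel.
    return None
```

If none of the directories in kernel_dirs is the one where the kernelspecs were installed, the search returns nothing no matter how the kernel name is spelled.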
Anyway, at this point kernel_dirs in our case is set to:
However the kernel_dirs should contain
/opt/pyenvs/jupyter/share/jupyter/kernels
as this is where the actual specs are found. The first entry of kernel_dirs is the user directory, which is always included, and hence it will work when you use --user in the options when installing a kernel.
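The following sketch models, in a simplified way, how such a search list could be assembled for one Python process. It is an assumption-laden approximation, not the jupyter_core code: the exact order and entries depend on the installed version and platform, and candidate_kernel_dirs is a hypothetical name. In a real environment the actual list can be printed with jupyter_path("kernels") from jupyter_core.paths.

```python
import os


def candidate_kernel_dirs(prefix, home, jupyter_path_env=""):
    """Simplified model of the kernelspec search path seen by one interpreter (Linux defaults assumed)."""
    dirs = []
    # Entries from JUPYTER_PATH come first when the variable is set.
    dirs += [p for p in jupyter_path_env.split(os.pathsep) if p]
    # Per-user data directory, which is where `--user` installs end up.
    dirs.append(os.path.join(home, ".local", "share", "jupyter"))
    # Data directory of the environment the *current* interpreter runs in.
    dirs.append(os.path.join(prefix, "share", "jupyter"))
    # System-wide locations.
    dirs += ["/usr/local/share/jupyter", "/usr/share/jupyter"]
    return [os.path.join(d, "kernels") for d in dirs]
```

With prefix set to /opt/pyenvs/DSS03, nothing in this list points into /opt/pyenvs/jupyter, which matches the observation that only kernels installed with --user are found.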
The paths are generated in
/opt/pyenvs/DSS03/lib/python3.10/site-packages/jupyter_core/paths.py -> jupyter_path
The documentation of this function mentions that you can put extra search locations ahead of the defaults by setting JUPYTER_PATH. Setting JUPYTER_PATH fixes the NoSuchKernel issue. The question is whether this should be necessary. Can't papermill somehow detect the venv of the jupyter server and add its path accordingly? Or should this be done somewhere in jupyter-core? Jupyter should somehow keep track of where its kernels are installed.
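For anyone calling papermill as a library, a minimal sketch of the workaround is to adjust JUPYTER_PATH in the current process before the notebook is executed. The helper name prepend_jupyter_path is hypothetical. The sketch assumes the variable should name the data directory (here /opt/pyenvs/jupyter/share/jupyter) rather than its kernels subdirectory, because the lookup appends kernels itself.

```python
import os


def prepend_jupyter_path(share_dir, environ=None):
    """Put share_dir first in JUPYTER_PATH, keeping any other existing entries."""
    environ = os.environ if environ is None else environ
    current = environ.get("JUPYTER_PATH", "")
    # Drop empty entries and any earlier occurrence of share_dir to avoid duplicates.
    others = [p for p in current.split(os.pathsep) if p and p != share_dir]
    environ["JUPYTER_PATH"] = os.pathsep.join([share_dir, *others])
    return environ["JUPYTER_PATH"]
```

Calling prepend_jupyter_path("/opt/pyenvs/jupyter/share/jupyter") before papermill.execute_notebook(...) should then let a kernel such as datasciencestack_a.03 be resolved, under the assumptions above.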