jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Kernel seems to ignore jupyter's option `--notebook-dir` #4924

Open atemate opened 4 years ago

atemate commented 4 years ago

My project setup (suppose my current directory is /project):

$ tree .
.
├── modules
│   └── my_code.py
├── notebooks
│   └── Untitled.ipynb
...

When I run jupyter notebook --notebook-dir=$(pwd) --debug, Jupyter does serve from the /project directory, but when I open the notebook notebooks/Untitled.ipynb in the GUI, the kernel starts in /project/notebooks rather than in /project, which is what I expected from the documentation:

$ jupyter notebook --help | grep -A 2 notebook-dir
--notebook-dir=<Unicode> (NotebookApp.notebook_dir)
    Default: ''
    The directory to use for notebooks and kernels.

Log messages:

[I 11:57:22.720 NotebookApp] Serving notebooks from local directory: /project
...
[D 11:57:37.078 NotebookApp] Kernel args: {'kernel_name': 'python3', 'cwd': '/project/notebooks'}

I looked through all the config files of both Jupyter and the IPython kernel itself, and I could not find where the kernel is told to use the .ipynb file's location as its working directory.

Questions:

  1. Is it a bug that the IPython kernel does not run in the directory specified by jupyter notebook --notebook-dir=...?
  2. If not, is there an IPython kernel setting that makes it use the .ipynb file's location as its working directory?

My environment:

$ jupyter --version
jupyter core     : 4.5.0
jupyter-notebook : 6.0.1
qtconsole        : 4.5.4
ipython          : 7.7.0
ipykernel        : 5.1.2
jupyter client   : 5.3.1
jupyter lab      : not installed
nbconvert        : 5.6.0
ipywidgets       : 7.5.1
nbformat         : 4.4.0
traitlets        : 4.3.2

$ python --version
Python 3.7.4

$ uname -a
Linux archlinux 5.2.13-arch1-1-ARCH #1 SMP PREEMPT Fri Sep 6 17:52:33 UTC 2019 x86_64 GNU/Linux

kevin-bates commented 4 years ago

This appears to be by design. The cwd is based on the value of path, which is the notebook's directory relative to the "root" (i.e., notebook-dir). So in your example, path='notebooks' and cwd becomes the fully qualified path formed by joining notebook-dir and 'notebooks'. Then, during the kernel's launch, that value is used as the cwd of the subprocess (i.e., the kernel).
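
To illustrate that mapping, here is a minimal sketch that assumes the server simply joins the API path onto the root directory. The function name cwd_for_api_path and its fall-back-to-parent behavior are illustrative assumptions, not the actual notebook implementation:

```python
import os


def cwd_for_api_path(root_dir, api_path):
    """Map an API path (relative to root_dir, '/'-separated) to a kernel cwd.

    Hypothetical helper for illustration; it is not part of the notebook API.
    """
    parts = [p for p in api_path.strip("/").split("/") if p]
    os_path = os.path.join(root_dir, *parts)
    # Assumption: if the target directory does not exist, walk up towards
    # root_dir so the sketch always returns a usable directory.
    while not os.path.isdir(os_path) and os_path != root_dir:
        os_path = os.path.dirname(os_path)
    return os_path
```

With root_dir set to /project and api_path set to 'notebooks', this yields /project/notebooks when that directory exists, which matches the cwd in the debug log above. An empty api_path yields root_dir itself.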

I don't know what kinds of assumptions are made by the kernel about its location relative to the notebook (I suspect none). You might try adding a configuration option to "pin-to-notebook-dir" such that the code I highlighted above takes that into consideration and uses the empty path ('') if the option is set. Again, I don't know what kinds of ramifications would come into play if the notebook does not reside in the current working directory of the kernel process, but this seems worth looking into.

atemate commented 4 years ago

Thank you @kevin-bates for pointing out where the path parameter that interests me is used. Now I am trying to understand how to add a configuration option to "pin-to-notebook-dir" as you proposed.

  1. I went deeper into the code of the projects jupyter/jupyter_client and jupyter/notebook and found that, server-side, the path parameter is set from the json_body of the client's request (I suppose this is the relevant swagger documentation).
  2. I also traced the client-side method that you pointed out up the call tree, and the trail ends at the method start_new_kernel, which is not called within this project and seems to be exposed.

So my questions remain:

  1. Which client-side config allows specifying the path option when the client sends the POST request to the server to launch a new kernel?
  2. Is it possible to set it to None in the POST request? Is it safe?

kevin-bates commented 4 years ago

@ayushkovskiy - in the Notebook case, launch_kernel is called from start_kernel, but (eventually) via the MappingKernelManager (as noted in the previous comment), not via start_new_kernel(). The latter is a library method used just for starting kernels. It does not come into play in the grand scheme of Notebook/Lab operations, where --notebook-dir is configured.

The path originates in the POST (PATCH, actually) request handled by the SessionHandler. start_kernel_for_session then begins the chain of calls that leads to the start_kernel() method within the Kernel Manager class hierarchy.
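
As a rough sketch of that flow, the snippet below shows how the path in a session request body could be reduced to the directory handed to the kernel manager. The body fields are example values, and the helper kernel_path_for is a hypothetical stand-in that assumes the kernel path is simply the notebook's parent directory; it is not the real handler code:

```python
import json

# Example session request body a client might send (values are illustrative).
request_body = json.dumps({
    "path": "notebooks/Untitled.ipynb",
    "type": "notebook",
    "kernel": {"name": "python3"},
})


def kernel_path_for(notebook_api_path):
    """Return the parent directory of a notebook's API path ('' for the root).

    Hypothetical helper: assumes the kernel path is the directory that
    contains the notebook, expressed relative to notebook-dir.
    """
    if "/" in notebook_api_path:
        return notebook_api_path.rsplit("/", 1)[0]
    return ""


model = json.loads(request_body)
# This relative directory is what would later be turned into the kernel's cwd.
kernel_path = kernel_path_for(model["path"])
```

Under these assumptions the client never sends a cwd directly; it only names the notebook, and the server derives the directory from it.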

I think the primary thing to change for this would be to not adjust cwd within the notebook-dir hierarchy. The name notebook-dir probably doesn't reflect what the option really is. It's more of a root-dir[*], since notebooks can reside in sub-directories of that directory. However, I believe the general idea is that notebook-dir acts as a container or sandbox for operations.

So, I was imagining that if this option were enabled, the behavior you'd want is to retain cwd as if the notebook were actually in notebook-dir (i.e., the "root"). As a result, you'd use '' (the empty string) as the value of path, since that's what is used for path when the notebook file resides in notebook-dir.

I was thinking you'd have something like the following (again at the location mentioned in the previous comment)...

            if path is not None:
                if self.rooted_operation:
                    kwargs['cwd'] = self.cwd_for_path('')
                else:
                    kwargs['cwd'] = self.cwd_for_path(path)

This way, only the kernel launch is affected. You don't want to cause side effects in the contents service, which uses path to manage the actual notebook file.
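
For anyone who wants to try the idea outside the notebook code base, here is a self-contained toy version. KernelLaunchSketch and launch_kwargs are made-up names, rooted_operation is only the option proposed in this thread, and cwd_for_path is simplified to a plain join, so treat it as a sketch rather than the real MappingKernelManager:

```python
import os


class KernelLaunchSketch:
    """Toy stand-in for the kernel manager; not the real MappingKernelManager."""

    def __init__(self, root_dir, rooted_operation=False):
        self.root_dir = root_dir
        # Hypothetical option proposed in this thread; in the real class it
        # would presumably be a configurable trait.
        self.rooted_operation = rooted_operation

    def cwd_for_path(self, path):
        # Simplified: join the API path onto root_dir without existence checks.
        parts = [p for p in path.strip("/").split("/") if p]
        return os.path.join(self.root_dir, *parts)

    def launch_kwargs(self, path=None, **kwargs):
        """Build kernel launch kwargs, choosing cwd as proposed above."""
        if path is not None:
            if self.rooted_operation:
                # Pin the kernel to the root, regardless of the notebook's folder.
                kwargs["cwd"] = self.cwd_for_path("")
            else:
                kwargs["cwd"] = self.cwd_for_path(path)
        return kwargs


default = KernelLaunchSketch("/project")
rooted = KernelLaunchSketch("/project", rooted_operation=True)
print(default.launch_kwargs(path="notebooks", kernel_name="python3"))
print(rooted.launch_kwargs(path="notebooks", kernel_name="python3"))
```

The path argument itself is left untouched, so anything else that relies on it to locate the notebook file would keep working as before.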

Is it safe?

I suspect so, but there may be something I'm missing. You'd need to check things out relative to the functionality you desire, then others might chime in now or during the review.

Regarding the parameter name, I prefer something like "rooted operation" over "pin to notebook dir", but that kind of thing can get hashed out during a review.

I hope that helps.

[*] In jupyter_server notebook-dir is actually renamed to root-dir.