Identify notebook file being run

aggFTW commented 8 years ago

Hi,

I've seen this type of question a lot: http://stackoverflow.com/questions/20050927/how-to-get-the-ipython-notebook-title-associated-with-the-currently-running-ipyt?rq=1

It makes sense to me that the kernel should not know what it's talking to from a design perspective.

However, I'm currently in the process of working through a Jupyter High Availability scenario. Our goal is to have two Jupyter instances running in two different VMs and switch between them if one of those two VMs goes down for some reason, without losing the kernel state.

We have control over the kernels we are running (see https://github.com/jupyter-incubator/sparkmagic/blob/master/remotespark/wrapperkernel/sparkkernelbase.py), and we'd like to be able to tie some state (a session number) to a particular kernel instance.

It seems to me like I'd need some things to achieve this, but maybe you have better ideas:

- Fire some piece of code automatically every time a notebook starts: this could be the __init__ method in my kernel or some other piece of code that is triggered every time a kernel gets started (some Javascript code in the notebook maybe? I know this wouldn't apply for other clients but it's a start).
- This previous bit of code that gets fired would need to always be run with the same ID to be able to identify the state it needs to reconstruct (i.e. it would need to know that for this particular kernel we had X particular state).
- Some persistent storage that both Jupyter instances could have access to.

I thought of a concrete implementation and I'd like to hear some feedback on it if possible: There is a Notebook extension that reads some ID in the notebook's page DOM (I need help knowing what ID this would be: e.g. notebook name with relative paths from root folder included or a GUID in some hidden cell in the notebook file), which would then issue a request to the kernel with this ID to restore its state. The kernel would then take this ID and get the session ID from cloud storage. If the ID is embedded in Javascript, both Jupyter servers would need to trust the notebook from the get go.
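The lookup step in this proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: `SessionStore` and `restore_or_create_session` are hypothetical names, and a local JSON file stands in for the cloud storage. Nothing here is part of Jupyter or sparkmagic.

```python
import json
import os
import tempfile


class SessionStore:
    """Hypothetical persistent store mapping a notebook ID to a session number.

    A JSON file stands in for the shared cloud storage mentioned above.
    """

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def get(self, notebook_id):
        return self._load().get(notebook_id)

    def put(self, notebook_id, session_id):
        data = self._load()
        data[notebook_id] = session_id
        with open(self.path, "w") as f:
            json.dump(data, f)


def restore_or_create_session(store, notebook_id, create_session):
    """Return the stored session for notebook_id, creating one if absent."""
    session_id = store.get(notebook_id)
    if session_id is None:
        session_id = create_session()
        store.put(notebook_id, session_id)
    return session_id


# Usage: two "Jupyter instances" sharing one store see the same session.
store_path = os.path.join(tempfile.mkdtemp(), "sessions.json")
first = restore_or_create_session(SessionStore(store_path), "nb-guid-1", lambda: 42)
second = restore_or_create_session(SessionStore(store_path), "nb-guid-1", lambda: 99)
print(first, second)  # 42 42
```

The second call ignores its own factory because the first call already recorded a session for that ID, which is the behaviour a failover would rely on.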

Thanks for any help or pointers you may have! (cc. @msftristew, @MohamedElKamhawy, @ellisonbg)

aggFTW commented 8 years ago

cc @Carreau and @jdfreder

minrk commented 8 years ago

A custom KernelManager could add an environment variable when a kernel is started, though the KernelManager doesn't have access to the notebook path. A SessionManager could pass that down, though it wouldn't be updated when the notebook is renamed, so a filename is probably not the best key to use.

jdfreder commented 8 years ago

You can put a GUID in the notebook-level metadata. I think you can do it without JS, at the web server level, on new or existing notebook load.
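A minimal sketch of that idea, operating on the parsed .ipynb JSON with only the standard library. The function name `ensure_notebook_guid` and the metadata key `ha_guid` are assumptions made for illustration; where exactly this would hook into the web server is not shown.

```python
import json
import uuid


def ensure_notebook_guid(notebook, key="ha_guid"):
    """Return the GUID stored in the notebook-level metadata, adding one if missing.

    `notebook` is the parsed .ipynb JSON (a dict). The metadata key name
    "ha_guid" is an assumption made for this sketch.
    """
    metadata = notebook.setdefault("metadata", {})
    if key not in metadata:
        metadata[key] = str(uuid.uuid4())
    return metadata[key]


# Usage with a minimal in-memory notebook document.
nb = json.loads('{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}')
guid = ensure_notebook_guid(nb)
same_guid = ensure_notebook_guid(nb)  # a second load returns the same value
```

Because the GUID lives inside the file, it survives a rename, unlike a key derived from the file name.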

jdfreder commented 8 years ago

--- oh, this is issue #1000 ! :cake: :tada:

Carreau commented 8 years ago

:-P

Carreau commented 8 years ago

Wouldn't a custom MappingKernelManager that stores the various kernel models in a shared DB be enough? (Or am I missing something about the notebook name?)

It is highly unlikely that the notebook would be renamed during the swap of VMs.

There might need to be some extra logic for clean startup/exit/restart, but that should be able to resume connections.

msftristew commented 8 years ago

So, I've picked up this work where @aggFTW left off. I think this is how we're thinking about doing this:

1. Use a custom SessionManager that passes down the notebook name as an argument to the MappingKernelManager.
2. Use a custom KernelManager that communicates the notebook name to the new kernel process on startup (through an environment variable or some other method).
3. Our custom kernels will take the notebook name as a key and will update their metadata as appropriate in the way that @aggFTW described above.
4. Use a custom ContentsManager to update the metadata necessary for resuming stale sessions when a notebook is renamed.

Item (4) will certainly be an internal extension to Jupyter for us, but we were wondering whether items (1) and (2) would have any chance of being accepted upstream. I understand that the kernel not knowing what's talking to it is part of the design, but it seems like it would be generally useful (not just for this scenario) if kernels could be made aware of the name of their notebook, whether through an environment variable, a command-line argument, or a 0mq message. Do you suppose there would be any interest in such a PR?

minrk commented 8 years ago

I think it is generally useful, and we should probably do it. An environment variable is the way to go, I think. The only disadvantage of that is that you cannot update the file location on rename after the kernel has started, but a zmq message updating the file doesn't seem like the right thing to do, to me.
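The environment-variable approach can be illustrated with a plain subprocess, independent of any Jupyter classes. This is a sketch only: the variable name `NOTEBOOK_PATH_SKETCH` and the helper `launch_with_notebook_path` are made up for illustration, and the child process merely stands in for a kernel.

```python
import os
import subprocess
import sys

# Hypothetical variable name; the thread does not settle on one.
ENV_VAR = "NOTEBOOK_PATH_SKETCH"


def launch_with_notebook_path(cmd, notebook_path):
    """Start a child process with the notebook path added to its environment."""
    env = dict(os.environ)
    env[ENV_VAR] = notebook_path
    return subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                          universal_newlines=True, check=True)


# The child stands in for a kernel reading the variable at startup.
child_cmd = [sys.executable, "-c",
             "import os; print(os.environ.get('NOTEBOOK_PATH_SKETCH', ''))"]
result = launch_with_notebook_path(child_cmd, "reports/analysis.ipynb")
seen_by_child = result.stdout.strip()
```

The value is fixed when the process starts, which is exactly the rename limitation described above.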

olgabot commented 7 years ago

Was this ever resolved? I'm making output and figure folders based off of the name of the notebooks, and this code works in the notebooks:

from IPython.core.display import Javascript
from IPython.display import display

def get_notebook_name():
    """Returns the name of the current notebook as a string

    From https://mail.scipy.org/pipermail/ipython-dev/2014-June/014096.html
    """
    display(Javascript('IPython.notebook.kernel.execute("theNotebook = " + \
    "\'"+IPython.notebook.notebook_name+"\'");'))
    return theNotebook

But when I move it into a common.py file so it can be accessed across all notebooks, I get a NameError.

Is this because the .py file has no notebook? Is there a way to get the .py file to recognize the notebook it is being called from?

Carreau commented 7 years ago

display(Javascript('IPython.notebook.kernel.execute("theNotebook = " + \
"\'"+IPython.notebook.notebook_name+"\'");'))
## Here are dragons. 
return theNotebook

Handwaving:

The displayed JavaScript will take some time to reach the browser, and it will take some more time to execute the JS and get back to the kernel.

During this time IPython has to continue executing code, so it tries to "return theNotebook", which is undefined, and so it raises. Even if you could "wait for the JS to execute", you could not set the name of the notebook before the function returns.

Does that make sense?
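The timing problem can be imitated without a browser. In the sketch below, a `threading.Timer` plays the role of the frontend round trip and a dict plays the role of the user namespace. All names are illustrative, and this only models the ordering of events, not the real kernel messaging.

```python
import threading

namespace = {}  # stands in for the kernel's user namespace


def fake_frontend_roundtrip():
    """Simulate the browser running the JS and sending code back later."""
    namespace["theNotebook"] = "demo.ipynb"


def get_notebook_name():
    # "display(Javascript(...))": schedule the round trip, do not wait for it.
    timer = threading.Timer(0.2, fake_frontend_roundtrip)
    timer.start()
    # The function returns before the round trip has completed.
    return namespace.get("theNotebook"), timer


value_at_return, timer = get_notebook_name()
timer.join()  # only now has the "frontend" replied
value_later = namespace.get("theNotebook")
print(value_at_return, value_later)  # None demo.ipynb
```

In this toy version one could wait on the timer inside the function, but as noted above, the real kernel cannot assign the name before the function has returned.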

takluyver commented 7 years ago

The JS sets the name in the main user namespace. When the function is moved into a module, it's looking in the module namespace, so it never sees that name. But that function is a hack, and I wouldn't rely on it in any case.
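The namespace point can be reproduced in isolation. The sketch below builds a throwaway module object in place of common.py; the name `theNotebook` mirrors the hack above, and everything else is illustrative.

```python
import types

# Build a throwaway module that plays the role of common.py.
common = types.ModuleType("common")
exec(
    "def get_notebook_name():\n"
    "    return theNotebook  # resolved in common's globals\n",
    common.__dict__,
)

# The JS hack assigns the name in the user namespace (here: this script).
theNotebook = "demo.ipynb"

try:
    result = common.get_notebook_name()
except NameError as exc:
    result = "NameError: {}".format(exc)

# Setting the name on the module itself makes the lookup succeed.
common.theNotebook = theNotebook
fixed = common.get_notebook_name()
```

A function always looks up global names in the module where it was defined, so a variable set in the notebook's namespace is invisible to it.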

natbusa commented 7 years ago

OK, maybe this will sound silly, but would it be enough to add the ipynb filename to the metadata section of the notebook data structure when it's read? The field would not be stored in the file but only updated once it is read into memory: a sort of ephemeral metadata info.

natbusa commented 7 years ago

I see. It looks like the kernel is completely agnostic to the concept of a file and just processes cell data. I would say that the only options are indeed environment variables or passing the filename during the creation of the kernel, if any filename is available at that point.

jordansamuels commented 7 years ago

I may be late to the party, but if we could somehow determine just the port of the notebook server, then getting the notebook path is easy using the REST API. The example below hardwires port 8080:

import json
import re
import ipykernel
import requests

def get_notebook_path(port=8080):
    kernel_id = re.search('kernel-(.*).json', ipykernel.connect.get_connection_file()).group(1)
    response = requests.get('http://127.0.0.1:{port}/api/sessions'.format(port=port))
    matching = [s for s in json.loads(response.text) if s['kernel']['id'] == kernel_id]
    if matching:
        return matching[0]['notebook']['path']

But I couldn't find any way to automatically determine the port, without using the not-so-safe/useful Javascript hacks.

So, can we get the port?

gcbeltramini commented 6 years ago

This seems to work:

import json
import os.path
import re
import ipykernel
import requests

#try:  # Python 3
#    from urllib.parse import urljoin
#except ImportError:  # Python 2
#    from urlparse import urljoin

# Alternative that works for both Python 2 and 3:
from requests.compat import urljoin

try:  # Python 3 (see Edit2 below for why this may not work in Python 2)
    from notebook.notebookapp import list_running_servers
except ImportError:  # Python 2
    import warnings
    from IPython.utils.shimmodule import ShimWarning
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ShimWarning)
        from IPython.html.notebookapp import list_running_servers

def get_notebook_name():
    """
    Return the full path of the jupyter notebook.
    """
    kernel_id = re.search('kernel-(.*).json',
                          ipykernel.connect.get_connection_file()).group(1)
    servers = list_running_servers()
    for ss in servers:
        response = requests.get(urljoin(ss['url'], 'api/sessions'),
                                params={'token': ss.get('token', '')})
        for nn in json.loads(response.text):
            if nn['kernel']['id'] == kernel_id:
                relative_path = nn['notebook']['path']
                return os.path.join(ss['notebook_dir'], relative_path)

You can put it inside a module, and import it in the jupyter notebook.

Edit: Thanks to @thesneaker, I changed the way to get the token.

Edit2: I tested in Python 2, but the Jupyter notebook couldn't import from notebook.notebookapp import list_running_servers when it was inside a module.

Edit3: Added an alternative and an observation thanks to this comment.

thesneaker commented 6 years ago

Thanks @gcbeltramini for this pure python solution! I'm running Jupyter 4.1.0 and had to take care of the missing token key. Other than that it's the best solution I've come across so far!

I wouldn't mind if this functionality found its way into the notebookapp class and became the way recommended by the Jupyter devs. Having easy access to the notebook name (and preferably the path) is essential for doing reproducible measurements with Jupyter notebooks.

vpillac commented 6 years ago

Not quite sure why, but the response was not always JSON for me. I fixed it by adding a try statement:

        try:
            for nn in json.loads(response.text):
                if nn['kernel']['id'] == kernel_id:
                    relative_path = nn['notebook']['path']
                    return os.path.join(ss['notebook_dir'], relative_path)
        except (ValueError, TypeError, KeyError):
            # Skip servers whose response is not a JSON list of sessions.
            pass

vpillac commented 6 years ago

Also another useful method:

import subprocess

def save_notebook_to_html():
    nb_name = get_notebook_name()
    # Pass the arguments as a list so that paths containing spaces still work.
    status = subprocess.call(['jupyter', 'nbconvert', '--to', 'html', nb_name])
    return status == 0

jakirkham commented 6 years ago

This code...


try:  # Python 3
    from urllib.parse import urljoin
except ImportError:  # Python 2
    from urlparse import urljoin

try:  # Python 3
    from notebook.notebookapp import list_running_servers
except ImportError:  # Python 2
    import warnings
    from IPython.utils.shimmodule import ShimWarning
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ShimWarning)
        from IPython.html.notebookapp import list_running_servers

...can be replaced with this code and still work on Python 2/3.

from requests.compat import urljoin

from notebook.notebookapp import list_running_servers

dclong commented 6 years ago

The code doesn't work for me in JupyterHub.

convoliution commented 5 years ago

Note that if you do not have the right token to query the server on the REST call,

json.loads(response.text)

may return {"message": "Forbidden", "reason": null} instead of a list of sessions, resulting in

if nn['kernel']['id'] == kernel_id:

raising TypeError: string indices must be integers
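One way to guard against this is to check the shape of the decoded body before indexing into it. The helper below is a sketch with a made-up name, `find_notebook_path`, and it works on already-decoded JSON so that it does not depend on any particular HTTP client.

```python
def find_notebook_path(payload, kernel_id):
    """Return the notebook path for kernel_id, or None if it cannot be found.

    `payload` is the decoded JSON body of GET /api/sessions. An error body
    such as {"message": "Forbidden", "reason": None} is a dict, not a list,
    so it is rejected before any indexing happens.
    """
    if not isinstance(payload, list):
        return None
    for session in payload:
        if session.get("kernel", {}).get("id") == kernel_id:
            return session.get("notebook", {}).get("path")
    return None


sessions = [{"kernel": {"id": "abc"}, "notebook": {"path": "work/demo.ipynb"}}]
forbidden = {"message": "Forbidden", "reason": None}
print(find_notebook_path(sessions, "abc"))    # work/demo.ipynb
print(find_notebook_path(forbidden, "abc"))   # None
```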

DBCerigo commented 5 years ago

Note that the solution above won't work when executing a nb via jupyter nbconvert --to notebook --execute mynotebook.ipynb or via from nbconvert.preprocessors import ExecutePreprocessor from within a python script, as (of course?!) there's no server running to query.

elgalu commented 5 years ago

How to achieve this with the latest versions?

billallen256 commented 4 years ago

Could the ipyparams package work for this? It can return the notebook file name as well as any query string parameters passed in the URL.

elgalu commented 4 years ago

It seems to be unreliable, @gershwinlabs. Sometimes ipyparams.raw_url comes back as an empty string, which seems to be related to the reliance on JavaScript: some sort of race condition.

billallen256 commented 4 years ago

@elgalu I can't seem to reproduce the problem. Can you tell me more about your environment and notebook? I don't think it's possible to get away from the reliance on Javascript given the deliberate separation between the front and back ends.

thorade commented 4 years ago

billallen256 commented 4 years ago

Thanks @thorade. I posted an answer with ipyparams.

jakirkham commented 4 years ago

Maybe issues with ipyparams can be raised against that repo? 😉

Ismar11 commented 4 years ago

Does anyone know if there is a command-line argument for jupyter notebook list, or a similar feature, to get the names of the notebooks running on each server directly from the console?

If it doesn't exist, isn't planned, or is out of the scope of this issue, I could open a new one and describe it in detail with examples/ideas. Let me know :)

cono commented 2 years ago

This looks hackish to me:

    kernel_id = re.search('kernel-(.*).json',
                          ipykernel.connect.get_connection_file()).group(1)

Is there any simpler way to get the id?

I was trying to look into the code and couldn't find where the id is stored in Kernel. The connection_file name is created from os.getpid():

    def init_connection_file(self):
        if not self.connection_file:
            self.connection_file = "kernel-%s.json"%os.getpid()
        try:
            self.connection_file = filefind(self.connection_file, ['.', self.connection_dir])
        except OSError:

Or I'm probably looking in the wrong place. Any suggestions?
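A sketch of one alternative that avoids the regular expression while keeping the same assumption, namely that the connection file is named kernel-<id>.json. The helper name and the example path are hypothetical, and this does not answer whether the kernel exposes the id more directly.

```python
import os


def kernel_id_from_connection_file(path):
    """Extract <id> from a connection file named kernel-<id>.json.

    Assumes the naming convention that the regex above relies on;
    returns None when the file name does not follow it.
    """
    name = os.path.basename(path)
    prefix, suffix = "kernel-", ".json"
    if name.startswith(prefix) and name.endswith(suffix):
        return name[len(prefix):-len(suffix)]
    return None


# Illustrative path only; a real one would come from
# ipykernel.connect.get_connection_file().
example = "/run/user/1000/jupyter/kernel-1b2c3d4e-aaaa-bbbb-cccc-0123456789ab.json"
extracted = kernel_id_from_connection_file(example)
```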
