jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

keep notebook running after the browser tab closed #1647

Closed ibigquant closed 2 years ago

ibigquant commented 8 years ago

My experiment may run for a long time (hours). It seems the notebook stops running after the browser tab is closed. How can I keep it running and updating the notebook?

Carreau commented 8 years ago

My experiment may run for a long time (hours). It seems the notebook stops running after the browser tab is closed. How can I keep it running and updating the notebook?

Unfortunately there is currently no simple way to do that. We are aware of the issue and are working on it. In the meantime, I would suggest wrapping all the computation you are doing in Futures, in order to query for results only interactively.

Closing as this is already tracked in many places, but feel free to continue asking questions.

ibigquant commented 8 years ago

Thanks for your reply, Carreau. I'm new to Python notebooks and don't quite understand the "Futures" you mentioned. Could you give me a simple example? Many thanks.

takluyver commented 8 years ago

A future is an object representing a task - it provides a way to see if the task is done, and get the result (or error) when it's finished. They're a general concept, but Python provides an implementation in concurrent.futures. They're normally used in code that's doing more than one thing at once.

I think that's probably more complex than you need, though. A cell that you've started running will keep going when you close the browser tab, but the output it produces is lost. The easiest workaround is just to leave the browser tab open - tabs are cheap, I've got ~50 open now. If you can't do that for some reason, make sure it assigns any results you want to keep to a variable - they should still be available when you open it again. You can also use the %capture magic to store printed output into a variable you can get later.
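As a rough illustration of the concurrent.futures approach mentioned above (a sketch only - the function name and timings are made up), a long task can be submitted to an executor and polled for later:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def long_computation():
    # stand-in for an experiment that runs for hours
    time.sleep(2)
    return 42

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_computation)  # returns immediately

# ...later, in another cell, poll interactively:
if future.done():
    print(future.result())
else:
    print("still running")
```

The result stays attached to the `future` object in the kernel, so it can be retrieved in a later cell even if the output of the original cell was lost.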

flaviostutz commented 7 years ago

I have been struggling with this issue for some time now as well. The kernel keeps running your job on the server, but there is no way to see the console output after closing the browser.

My workaround was to write all my logs to a file, so that when my browser closes (indeed, when a lot of log output comes through, the browser hangs too) I can follow the kernel job's progress by opening the log file (the log file can be opened with Jupyter too).

    #!/usr/bin/python
    import time
    import datetime
    import logging

    logger = logging.getLogger()

    def setup_file_logger(log_file):
        hdlr = logging.FileHandler(log_file)
        formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
        hdlr.setFormatter(formatter)
        logger.addHandler(hdlr) 
        logger.setLevel(logging.INFO)

    def log(message):
        #outputs to Jupyter console
        print('{} {}'.format(datetime.datetime.now(), message))
        #outputs to file
        logger.info(message)

    setup_file_logger('out.log')

    for i in range(10000):
        log('Doing hard work here i=' + str(i))
        log('Taking a nap now...')
        time.sleep(1000)

+1 on this, or some kind of long-running process management.

abalter commented 7 years ago

I'm confused about why this is difficult. Since a serialized Jupyter notebook contains cell output, it should be possible to keep track of output after a user closes a tab: the server could append to the notebook JSON as the kernel runs in the background, so the output generated in the meantime would be in the notebook when the user returns. Why can't Jupyter just keep writing to the JSON file?

takluyver commented 7 years ago

The server doesn't write to the JSON file as soon as output is sent by the kernel - the output is sent to the browser, which adds it to the notebook document. When you save (or an autosave occurs), the whole notebook document is converted to JSON and written to disk.

We're planning to change that so that the server keeps the notebook model and sends updates to the browser, but that's a big change to the architecture.

abalter commented 7 years ago

That would be great! Is there a current issue or milestone where I can track the progress?

takluyver commented 7 years ago

I don't know of one - @Carreau might be able to give you more info on the progress.

abalter commented 7 years ago

That would be great! My group works on remote servers. So being able to reconnect to a session would be very valuable.

prolearner commented 7 years ago

I'm working on remote servers too. It would be really handy to be able to do this, hope it'll be implemented soon.

As a suggestion, I think that having the possibility to reconnect to a session even if that means to lose all the output when you weren't connected but having the possibility to save the new output would be great and more simple to implement. That way if you're working on a remote server and you have a network disconnection you can still continue the work with little loss.

Carreau commented 7 years ago

I don't know of one - @Carreau might be able to give you more info on the progress.

None AFAICT from the Notebook/Lab side. nteract might be closer with commuter. That's probably not going to be implemented "soon". Real-time collaboration will likely come sooner, but will require a running browser.

flying-sheep commented 7 years ago

Closing as this is already tracked in many places, but feel free to continue asking questions.

so where are the open issues for this? there are still issues being opened about this (e.g. #2446) and i can't find the earlier, open ones.

k0pernicus commented 7 years ago

Any news about this issue please?

arvoelke commented 7 years ago

The easiest workaround is just to leave the browser tab open

This doesn't help if you are on a flaky connection to the server (e.g., accessing a remote jupyter server, or tunnelling to one through SSH).

Carreau commented 7 years ago

Any news about this issue please?

We are aware of the issue; there is not much written about it – we should put together a comprehensive document – but this needs a significant refactor of the frontend, plus likely some changes in the backend. CoCalc (formerly SageMathCloud) does allow that, but you need a server-side model, and you basically deprecate all the extensions for a given frontend – which is easy for CoCalc, as it ships without extensions.

Though it is indirectly moving forward via JupyterLab and nteract's commutable, and once those are out we can likely start to think about an isomorphic JS app that keeps the state, with the browser being only a "view" on that state.

My personal opinion is that this can be done without changes to the protocol, as a separate app, and anyone is welcome to chime in and write up an IPEP/RFC/prototype that lays out the ground infrastructure.

It is a significant enough amount of work that we can't just do it "on the side"; it will need at least an FTE.

flying-sheep commented 7 years ago

likely some change in the backend

from my understanding, the frontend runs in the browser. so if no tab is open, there is no frontend, and there definitely need to be changes in the backend. or do you mean different parts than i do?

architecturally, i'd assume that the notebook server needs to start writing responses to the notebook file while there's no browser tab attached (i.e. instead of the responses being received in a browser tab and the notebook saved manually, it gets saved automatically after any (batch of) responses).

Carreau commented 7 years ago

from my understanding, the frontend runs in the browser. so if no tab is open, there is no frontend and there definitely need to be changes in the backend. or do you mean different parts than me?

You need to move some pieces from frontend to backend. It likely can be done with a "proxy server" between the notebook server and the browser.

architecturally, i’d assume that the notebook server needs to start writing responses to the notebook file as long as there’s no browser tab attached. (i.e. instead of receiving the responses in a browser tab and manually saving the notebook, it gets saved automatically after any (batch of) responses)

Yes and no. The notebook file does not – and cannot – store all the necessary information, especially while the kernel is still running (for example, the mapping from message IDs to handlers). You need an extra store (which can be in server RAM) with a richer representation than ipynb. If you have that, then the frontend needs to understand it as well, which starts to be complicated.

kostrykin commented 7 years ago

You need to move some pieces from frontend to backend. it likely can be done with a "proxy server" in between notebook server and browser.

@Carreau By "proxy server", you actually mean something like an off-screen browser, right? I'm not quite sure how the interaction between your actual browser and that off-screen proxy thing would look. Do you know of any piece of software that can do that? Maybe a browser which itself renders its interface as HTML and serves it via HTTP?

Carreau commented 7 years ago

@Carreau By "proxy server", you actually mean something like an off-screen browser, right

No, not exactly. "Browser-like" implies HTML and rendering. You can store non-HTML models on the proxy-server side. I only care about the ipynb plus some extra info on the server side; the rendering is a detail. The point is that the "state" you care about – which is not the HTML rendering – should live, and be able to be updated, without needing an open browser. Think of the Google Drive Realtime API if you wish.

I've seen things (e.g. Mozilla TowTruck, I think) trying to do that with HTML. Any isomorphic app these days does similar things.

idning commented 6 years ago

Do we have any update on this? It should be essential for the cloud use case.

set92 commented 6 years ago

If you check the 2nd reference, it says "This is an intended outcome of the notebook model refactor." So we will get it in JupyterLab, although reading it I think it will save the results but will not let us reopen a closed notebook to keep working on it, or check the results after leaving it working in the background.

idning commented 6 years ago

Is there any hack we can do?

e.g. assign the output of each cell to an internal variable, and when we reconnect the kernel, retrieve these variables and display them.

minrk commented 6 years ago

@idning yes, storing results and outputs in variables continues to work. You can redisplay variables still in memory at any time.

x = long_computation()
# ... some time later:
display(x)

You can also capture displayed outputs (not results) with the %%capture cell magic:

%%capture out_x
print("lots of stuff")
...
# another cell
out_x.show()

However, if it really is a long-running computation, avoiding recomputation even when the kernel dies is probably useful. In that case, a caching scheme is preferable: write intermediate results to disk and only re-execute if the cache doesn't exist on disk. This is what I did for my long-running notebooks long ago. That way, re-running a whole notebook after it has run once, even with a new kernel, may take only a few seconds and will produce all of the original output. There are lots of ways to do this, with different tradeoffs of rigor vs. cache performance, which is part of why there isn't a simple example to link to.
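For illustration only, one minimal version of such a disk cache (a sketch with made-up names; real schemes add invalidation, hashing of inputs, etc.) might look like:

```python
import os
import pickle

def cached(path, compute):
    """Load the result from `path` if it exists; otherwise
    run `compute()` and write the result to `path`."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

# First run computes and caches; re-running the notebook with
# a fresh kernel loads from disk instead of recomputing.
x = cached("x.pkl", lambda: sum(range(1_000_000)))
```

Re-executing the cell after a kernel restart then returns the pickled result immediately, without repeating the computation.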

Yet another option is to run the notebook entirely headless with nbconvert:

jupyter nbconvert --execute --to notebook mynotebook.ipynb

which will create a new notebook with all of the output intact.

rasbt commented 6 years ago

I think the typical use case for this is running longer computations in the notebook; here, it's important to keep in mind that nbconvert is not very generous with its default per-cell timeout. For computations that run for a day, one might want to provide a custom timeout, something like:

jupyter nbconvert --execute --to notebook mynotebook.ipynb --ExecutePreprocessor.timeout=86400

abalter commented 6 years ago

I have a really hard time understanding why this is a problem. Basically: whatever would be sent to the browser is instead written to a file, and when the user logs back in, it is sent to the browser.

takluyver commented 6 years ago

Something like that is now implemented - messages go into a buffer when there's no client connected, and are replayed when one reconnects. But the details are never as simple as they seem.

damianavila commented 6 years ago

Something like that is now implemented - messages go into a buffer when there's no client connected, and are replayed when one reconnects.

I have not seen any reference to this functionality in the docs, maybe we should "advertise" it a little more?

bernardelli commented 6 years ago

I am deeply interested in this functionality.

emacsenli666 commented 6 years ago

@takluyver, that sounds interesting - so when will this functionality be released? People working on remote cloud engines might need this eagerly.

takluyver commented 6 years ago

Should be there since 5.2 - added in PR #2871

lzfxxx commented 6 years ago

@takluyver In this PR, I can only find the demo for a network drop-off; the issue of keeping the notebook running after closing the tab doesn't seem to be solved. I tested this in my notebook (version 5.4.0): I ran the code below, closed the Chrome tab, and when I reopened the notebook, the unsaved changes were all gone, including the latest output.

import time
for i in range(100):
    time.sleep(1)
    print(i)

rasbt commented 6 years ago

can confirm, have the same issue (also v5.4.0, Safari browser)

takluyver commented 6 years ago

From the server's point of view, I think network issues should have a similar result to closing and reopening a tab. @rgbkrk @minrk am I right about that? If so, I'm not sure why it wouldn't be working.

rgbkrk commented 6 years ago

The state of the notebook is entirely client side, which means that any outputs that come in that aren't captured in the document do not get saved back to disk.

What #2871 did was buffer any outputs (and other messages) until the user reconnects -- it will only help you for cases where you're reconnecting the same tab.

takluyver commented 6 years ago

Ah, so it doesn't work for closing a tab and reopening it later. I've been misleading people. Thanks Kyle!

abalter commented 6 years ago

@rgbkrk -- I feel like that was sort of a shutdown to all the people who want this feature. Furthermore, that may be your model of a notebook, but that is not mine, and clearly not a great many other people's either. Can you direct us to a mission statement or global definition of sorts that defines exactly what a notebook is supposed to be?

Back to the client side--the fact is, sometimes the client side needs to close their laptop and go home for the day. That should not prevent them from continuing their client-side work after an evening with their family, a good night's sleep, and a fresh cup of coffee in the morning.

Instead of using a notebook, we could start a screen session and run our job in a vanilla Python shell or an IPython shell. But then we lose the wonderful features that Jupyter has to offer.

rasbt commented 6 years ago

Ah, so it doesn't work for closing a tab and reopening it later. I've been misleading people. Thanks Kyle!

I misunderstood as well, good that that's cleared up now :)

@abalter Overall, I agree with you regarding the issues with that use case. The reason I stumbled upon / looked for this GitHub issue is not that I want to close browser tabs, but to keep things running when I, e.g., temporarily have to close my laptop with a notebook session running on a different machine. I use my laptop for most of my work because of the app ecosystem on macOS, but I also realize it's not the greatest computing platform, so I run the code mainly on my Linux machine or HPC cluster. The reason I would prefer Jupyter notebooks over Python scripts for this is that I like to collect the outputs and plots of a sequential workflow all in one place. I don't want to sound too demanding, since I really appreciate that Jupyter Notebook is open-source, free software from a non-profit organization; it would just be nice if such a feature existed some time in the future. Also, this may already be supported via

What #2871 did was buffer any outputs (and other messages) until the user reconnects -- it will only help you for cases where you're reconnecting the same tab.

I have to check :)

rgbkrk commented 6 years ago

I feel like that was sort of a shutdown to all the people who want this feature. Furthermore, that may be your model of a notebook, but that is not mine, and clearly not a great many other people's either. Can you direct us to a mission statement or global definition of sorts that defines exactly what a notebook is supposed to be?

Thanks for raising that back up. I'm not stating it as my ideal model either - more of a "this is how it works currently". This doesn't get to the true ideal, which is a server-side model of the notebook that is synchronized to the frontends. I'm likely to work on that during the coming year; #2871 was a stop-gap to help people in a basic way before we re-architect. My opinion is that we can re-open this issue, which I'll do now.

rraallvv commented 6 years ago

Could someone please tell me whether this is related to colab.research.google.com (formerly colaboratory.jupyter.org)? As you might guess, I'm totally new to Jupyter Notebook, although I know Python. The thing is that I'd like to experiment with TensorFlow, but some tasks might run for hours, so I was wondering if I could run the experiments online instead of locally on my machine. Thanks.

takluyver commented 6 years ago

Colaboratory is a separate project made by Google which uses Jupyter notebook files on Google Drive. It appears that you can run TensorFlow on it. I don't know whether there are any limitations on how long a computation can run - you'd have to ask Google about that.

rraallvv commented 6 years ago

@takluyver thanks for the info, it's very much appreciated.

diadochos commented 6 years ago

Anything like an output log could save a lot of time (not a real solution - just a backup). Is there a way to dump all text content sent to the client?

(Unfortunately I'm not familiar with how Jupyter works, but if someone could give me a brief explanation of how I could do it, I can try to implement it myself.)

takluyver commented 6 years ago

There's no option to dump it to a file at the moment. The code that buffers output when there's no browser connected is here:

https://github.com/jupyter/notebook/blob/faa0cab302bb86f0329a512a4ece1f772b29b4c7/notebook/services/kernels/kernelmanager.py#L170-L257

diadochos commented 6 years ago

@takluyver Wow, thank you! I'll try that when I get some time.

wernight commented 6 years ago

Having another machine keep the tab open also seems to allow getting some updates (assuming it auto-saves), which provides some progress information for long-running tasks.

Could a virtual browser like PhantomJS be a (hacky) solution?

abalter commented 6 years ago

@wernight -- I don't think that's a hacky solution at all. It might be a really simple and direct approach: Jupyter runs in a browser, and if that browser just happens to be virtual, fine. It would just keep updating the JSON version of the notebook, and when you log back in, that file would update the browser. I'm having a hard time understanding why this is difficult.

wernight commented 6 years ago

SGTM. I even have a Dockerized PhantomJS if you're interested: https://hub.docker.com/r/wernight/phantomjs/

abalter commented 5 years ago

@takluyver

Something like that is now implemented - messages go into a buffer when there's no client connected, and are replayed when one reconnects. But the details are never as simple as they seem.

I'm not trying to be obnoxious, but tell me where my thinking is wrong here:

Typical web application function:

  1. server receives request from client (client-side application)
  2. server creates response
  3. server sends response to client (essentially passes it a stream)
  4. loop until tab is closed

Hypothetical way jupyter web app functions:

  1. jupyter server receives request from client (jupyter notebook) due to user input
  2. jupyter server creates response (e.g. runs code)
    1. jupyter server sends response to client (essentially passes it a stream)
    2. client responds that message was received
    3. client displays output
    4. loop until computation finished
  3. loop until tab is closed

Suppose user does not interact with notebook

  1. jupyter server receives request from client (jupyter notebook)
  2. jupyter server creates response (e.g. runs code)
    1. jupyter server sends response to client (essentially passes it a stream)
    2. client responds that message was received
    3. client displays output
    4. loop until computation finished
  3. client responds that message was received

Suppose tab is currently closed

  1. jupyter server receives request from client (jupyter notebook)
  2. jupyter server creates response (e.g. runs code)
    1. jupyter server sends response to client (essentially passes it a stream) AND writes response to a file
    2. client responds that message was received
    3. client displays output
    4. loop until computation finished
  3. client responds that message was received

Suppose tab is reopened

  1. jupyter server send cached stream to notebook
  2. client responds that message was received
  3. client displays output
  4. jupyter server resumes normal operation

I can't emphasize enough how important this is to our workflow and that of many others.

This is a MAJOR shortcoming of Jupyter compared to RStudio Server and should be a top priority.

tanmay-kulkarni commented 5 years ago

This probably has been said several hundred times already, but once again, I wish to request the kind developers of this project to take this issue on priority.

It's baffling to me that such a basic necessity has not been taken care of for so long. I mean, most jobs with large amounts of data take several hours to run, at the least, on a remote server; I'd have thought this feature would be included by default. I was surprised when I kept my server running overnight, logged in, and saw that no output was stored. I couldn't even tell which cell was currently executing, since all the cells showed a blank instead of the * that appears when a cell is running.

EDIT: I'd like to add that I realize Jupyter is free software and that the developers have other commitments and only so much time, but I love Jupyter and this feature would make life easier for so many people. Thanks in advance ;)

Carreau commented 5 years ago

To the risk of also repeating ourselves one more time.

Jupyter is mostly developed by people in their free time, and is given away for free. We suffer the same bugs and annoyances that you do. We prioritize what we can, and even those of us who are allowed to contribute to Jupyter professionally 1) don't always have it as their main occupation, and 2) often have tasks assigned by management or higher-ups.

We don't owe features to users, even if we do care, but we do have obligations to finish the projects for which non-profits gave us money – at least for those of us employed totally or partially via those funds.

We cannot – and will not try to – force volunteers to prioritize anything over what they wish to work on. We can try to lead by example and hope this fosters collaboration.

It is not because this issue is still open that people are not working on it. We already added a band-aid by replaying messages, and there is significant work currently being done on this front, in part in JupyterLab with a server-side model and CRDTs.

It is extremely difficult work, especially if you can't spend several uninterrupted hours on it, which not many of us can afford.

So if you wish for this work to go faster, please do not insult us or shout at us (writing in bold on the internet is the equivalent), and find ways to help, even indirectly.

There are many ways you can do so even if you are not a genius coder:

Convince your company/institution/government to donate to numfocus

This will allow us to hire people to work full time at a decent living wage! With even more money we could hire the talents who otherwise cross the street to have their salary doubled, tripled, or sometimes more than quintupled.

Convince your company/institution/government to contribute time

Ask if you (or someone else) would be allowed to spend one afternoon per month helping. If Jupyter is used at your work, your company would likely gain from having an in-house expert and from fixing things upstream. We also have plenty of needs that are not code-related (legal, design, event planning...).

Respond to issues on the mailing list, help triage.

You will free up our time! Not having to respond to easy issues sometimes allows us to get one or two hours straight in which we can attempt difficult work.

Contribute code on your free time

Getting familiar with even small issues will increase your knowledge of the codebase, and who knows - after a couple of months you may earn commit rights and help fix long-standing issues like this one. You sometimes don't even have to start from scratch: there are many PRs that some of us started but that need polish (fixing tests, rebasing, documenting...), and with GitHub's decentralized model you can propose fixes to existing PRs!

Help manage the community

Twitter, GitHub, Facebook, YouTube, the mailing list; proof-read our blog; be friendly and remind people to be respectful of each other.

We are sorry if you are encountering issues or have lost work, but please don't use that as an excuse to suggest that we don't care, are incompetent, or haven't thought about how to fix it, how to implement it, and how to avoid breaking backward compatibility.

Many thanks, and much love from the Jupyter team, doing their best.

Also worth reading: Why I took October off from OSS volunteering and Setting expectations for open source participation by Brett Cannon.