DonJayamanne commented 3 years ago

Problem

Local

User starts a notebook; the kernel is now running on the local machine.
The computer then goes to sleep.
After a while, when the user goes back to the notebook, the notebook is unable to re-connect to the same kernel (kernel state is lost).
Similarly, if the user re-loads VS Code, the notebook is unable to re-connect to the same kernel (kernel state is lost).
Similarly, if a user is using Remote SSH, the connection is reset, and the user re-connects and opens the same notebook, then the notebook is unable to re-connect to the same kernel (kernel state is lost).

Remote

User opens a notebook and runs a cell against a remote kernel.
If the user re-loads VS Code (or vscode.dev), the notebook is unable to re-connect to the same kernel (kernel state is lost).

Investigation

Running Server & JupyterLab API for extensibility

Goals:

Long running kernels
- User can open a notebook with a cell that's still running & see the output being generated
- Similarly, user can open notebooks related to kernels that are still running.
Extensibility for extension authors
- This is a by-product of the long running kernels (i.e. you get this for free - almost)

Planned (related) Prototypes

Long running kernels
Solve problems related to kernel/session being lost due to:
- VS Code Shutdown
- VS Code Restart
- Computer sleeping
- SSH connection issues
- AML Compute will benefit
By-product of extensibility
- Julia Widgets (I might end up doing this first, might be easier)
- IPyWidget outside notebooks
- Variable viewer using the new API
- Data Frame viewer using the new API

Technical details

Server
Background process
Manages kernels & sessions
Expose kernel socket connection over this connection (we already have the code/technology for this) - proxy socket (dummy kernel in UI layer, by creating a dummy socket connection)
Security - how do we secure this web server (will need to be addressed, but I'm leaving that for later)
Expose Jupyter extension extensibility over JupyterLab API
I won't be exposing a connection; instead I will just expose the SessionManager, KernelManager & other class instances from the extension API.

Also related https://github.com/microsoft/vscode-jupyter/issues/300

matifali commented 1 year ago

Is there a milestone issue to see the progress of the update?

DonJayamanne commented 1 year ago

Unfortunately this issue has not yet been prioritized at our end, please do vote on this issue though

matifali commented 1 year ago

What do you suggest as a workaround if one wants to run long (10+ hour) sessions using Jupyter notebooks in vscode when connected to a remote kernel over SSH (using the vscode remote extension)? After some hours the connection gets disconnected and there is no way to see the progress or output of running cells.

DonJayamanne commented 1 year ago

@matifali unfortunately at this stage we have no workaround for this; let me see if I can get an update within a week.

DonJayamanne commented 1 year ago

@matifali I'm trying to understand your expectations, hence the following questions.

Assume you have 1 cell. Code in this cell prints numbers from 1 to 100, printing a number every hour.
Assume you run this cell and see the number 1 printed out.
Now, with the cell still running, you close vscode, come back tomorrow, open vscode and open this same notebook.
Would you expect to see the numbers 1, 2, 3, 4 and then slowly the number going up to 100 while vscode is open (as the execution is still in progress)?
Or would you expect to see 1, 80, 81, 82 and then the number will keep going up while vscode is open (as the execution is still in progress)?
Assume you have opened vscode after a few hours and you know all 100 would have been printed out while vscode was closed. Would you expect to see all 1, 2, ... 100 in the output, or just expect to be able to connect to the kernel and see the fact that execution has completed?

I ask this because the easiest thing to get working is:

If the cell is still running, then we display 1, 80, 81, 82 (where 1 was from the first instance of vscode, and 80, 81 and so on arrive after vscode is opened again). I.e. all of the output generated while vscode was closed will not be captured and not stored in the notebook.
I.e. we will only allow connecting to a kernel, and you can see whether execution has completed or not; if it is still going on, then the new data will be appended to what was stored previously.
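
For concreteness, the hypothetical cell described in the questions above could look like the following sketch. The function name `print_numbers` and the `delay_seconds` parameter are illustrative assumptions, not code from the extension or from any user's notebook.

```python
import time


def print_numbers(count=100, delay_seconds=3600.0):
    """Print 1..count, waiting delay_seconds between consecutive numbers.

    With the defaults this prints one number per hour, so the cell is
    still executing long after the window that started it was closed.
    """
    for n in range(1, count + 1):
        # flush so each number is sent to the client as soon as it is printed
        print(n, flush=True)
        if n < count:
            time.sleep(delay_seconds)
```

Running `print_numbers()` in a cell, closing VS Code after the first number appears, and reopening the notebook later reproduces the scenario: the open question is which of the numbers printed in the meantime should show up in the cell output.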

Thanks

matifali commented 1 year ago

Would you expect to see the numbers 1, 2, 3, 4 and then slowly the number going up to 100 while vscode is open (as the execution is still in progress)

I would prefer this output, as my use case is to train deep learning models and it's better if we can see the full history.

Assume you have opened vscode after a few hours and you know all 100 would have been printed out and vscode was closed. Would you expect to see all 1, 2, ... 100 in the output or just expect to be able to connect to the kernel and see the fact that execution has completed

This is preferred.

Or would you expect to see 1, 80, 81, 82 and then the number will keep going up while vscode is open (as the execution is still in progress)

This is also OK but the problem is vscode is unable to connect to a running remote kernel and show any outputs. Yes, the process is running but we do not see anything printed. There is no indication if the losses are actually decreasing.

DonJayamanne commented 1 year ago

https://github.com/matifali please could you provide a simple notebook that we can use for testing purposes, to ensure we have a simple sample close to a real-world scenario?

It could be a simple model training notebook; to make things simple, I’d like to see what kind of output you are using and the structure of the notebook.

If possible, I’d really appreciate a simple notebook without any external dependencies other than Python packages (i.e. without csv or other files).

Once again, thanks for getting back with the details.

FaintWhisper commented 1 year ago

I have made this simple toy notebook that trains a DNN classifier with randomly generated data. I have tried to replicate the essence of a real ML scientist/engineer's workflow. There are no external dependencies other than the necessary packages, which can be installed with the following commands:

pip install tensorflow
pip install numpy
pip install scikit-learn

The notebook follows a standard structure for training ML models:

Importing necessary packages.
Loading and processing (generating, in this case) the data.
Defining the model architecture.
Training and validation of the model.

The last cell is the most important for testing the reconnection mechanisms, as this is the part where the training loop is run and the result is displayed. You will see the number of epochs, the loss and the accuracy of the model being printed as the training progresses. I have defined a very high number of epochs so that you have plenty of time to test the reconnection mechanisms even if the training has not yet been completed. Ideally, we would like to see the complete training history (all the lines that are printed when the last cell is run).

For my use cases, model training can take days, even weeks, and what I have found is that I cannot leave this kind of notebook running and exit VS Code because otherwise the process dies immediately when I close the window. Allowing the process to keep running in the background is a necessary first step for the reconnect mechanism to make sense to ML scientists/engineers, especially laptop users like me.

You can find the notebook in the following repository: https://github.com/RYSKZ/Toy-DNN-Training

Please let me know if you have any issues or need further clarification.

matifali commented 1 year ago

@DonJayamanne, the above notebook seems a good fit for the test.

dmarx commented 1 year ago

bumping. any movement on this?

matifali commented 1 year ago

@DonJayamanne You may use this notebook for testing.

dmarx commented 1 year ago

the fundamental issue here is that jupyter server shows the available "running kernels" that can be reconnected to, and vscode doesn't. you could get around the complexities of expected behavior wrt specific cell outputs if you just made the already-running kernels visible to the user somehow.

concretely: i have a GPU equipped workstation and use it to run image generation notebooks, often from my laptop connected via vscode's "ssh remote" functionality. new images appear in the cell output as they are generated, but they are also written to disk (on the workstation). if the screen on my laptop goes to sleep, vscode prompts me to re-enter the password for my remote and responds by creating a new jupyter session. the old session is still running, as evidenced by outputs continuing to be written to disk and ps aux showing the old jupyter PID still there and consuming lots of resources. (to be clear: vscode sometimes kills the running session after I start a new one, but this behavior seems inconsistent and i often either leave the background job to run to completion or sigkill it manually myself to regain visibility of outputs.)

as a user, I should be able to pick the existing, running kernel from the "select kernel" drop down, but it is not available. this is a basic jupyter feature and it should not be difficult to expose it. it would be nice if vscode "intelligently" reconnected itself, but right now there's literally no option to reconnect to the old kernel at all, automagically or manually. vs code just needs to expose visibility on the already running kernels it's managing, rather than only listing the kinds of kernels it's capable of initiating.
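
As an illustration of the "running kernels" list mentioned above, here is a minimal sketch that asks a standalone Jupyter server for its running kernels through the server's REST endpoint `/api/kernels`. The helper names, the base URL and the token are placeholders chosen for this example. It assumes a Jupyter server you started yourself with token authentication; kernels that VS Code launches directly are not necessarily visible through this endpoint.

```python
import json
import urllib.request


def list_running_kernels(base_url, token):
    """Return the list of kernel models reported by a Jupyter server."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/kernels",
        headers={"Authorization": "token " + token},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def describe_kernels(kernels):
    """Format one line per kernel: id, kernelspec name and execution state."""
    return [
        "{}  {}  {}".format(
            k["id"], k["name"], k.get("execution_state", "unknown")
        )
        for k in kernels
    ]
```

For example, `describe_kernels(list_running_kernels("http://localhost:8888", "<token>"))` would return one line per kernel that is still alive on that server, which is the information a kernel picker would need in order to offer a manual reconnect.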

@DonJayamanne

marcoBmota8 commented 1 year ago

Any update on this?

As far as I understand, it is not possible to start running a Jupyter notebook on a remote machine via the VS Code SSH extension, disconnect from the SSH tunnel, and come back to the notebook still running.

I have tried with tmux, but I can't find a way to have the Jupyter notebook show up in VS Code after reattaching to the running tmux session.

Could anyone give a hand?

andreimargeloiu commented 1 year ago

+1

bbantal commented 1 year ago

I'd heavily rely on this feature. Any updates on this? Or viable workarounds?

jrich100 commented 1 year ago

@bbantal

As a workaround, I have succeeded in running my own jupyter server process and connecting to that as a "remote" kernel (running on the same host). As long as the jupyter server process is running the state of your kernel is persisted across VS Code restarts.

bbantal commented 1 year ago

By your own jupyter server do you mean a second jupyter server that you run on your local machine? As in "remote jupyter server" -> "local jupyter server" -> "local VS code session"?

@jrich100

jrich100 commented 1 year ago

@bbantal

We run this process (on the same machine where VS Code is running). Then, when selecting a kernel in VS Code, you can choose to connect to a remote jupyter server. Here you can specify the URL generated by the notebook process
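
A minimal sketch of this workaround, under stated assumptions: it launches a standalone server with the `jupyter server` command (the comment above does not say which command was used; `jupyter notebook` behaves similarly), and the function names, port and log file name are made up for the example. The server is started in its own session so that it is not tied to the process that launched it.

```python
import subprocess


def build_server_command(port=8888):
    """Build the command line for a standalone Jupyter server."""
    return ["jupyter", "server", "--no-browser", "--port={}".format(port)]


def start_server(port=8888, log_path="jupyter-server.log"):
    """Start the server detached from the launching process (POSIX only).

    The server writes its startup messages, which typically include the
    URL with the access token, to log_path.
    """
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            build_server_command(port),
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, so it is not tied to our terminal
        )
```

After calling `start_server()`, the URL found in the log file is what gets pasted into VS Code when choosing to connect to an existing Jupyter server. Kernels started this way belong to the server process, so they keep their state while that process stays alive.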

bbantal commented 1 year ago

@jrich100

It is unclear to me how my desired remote Jupyter server is involved in your solution. What am I missing? I want to connect to a remote (not local!) Jupyter server from my local VS Code, and I want to keep the kernel on that remote server alive so that I can reconnect to it whenever and access my previously created variables. The issue is currently that the kernel dies whenever I close VS Code.

DonJayamanne commented 1 year ago

The issue is currently that kernel dies whenever I close VS code.

This should not happen; if it does, it's a bug. I think by "I want to connect to a remote" you mean you are connecting to the remote server with VS Code over SSH or the like, is that correct? If that's the case, then yes, the kernels will die when VS Code is closed.

bbantal commented 1 year ago

I think by "I want to connect to a remote" you mean you are connecting to the remote server with VS Code over SSH or the like, is that correct? If that's the case, then yes, the kernels will die when VS Code is closed.

@DonJayamanne

Yes, that's exactly what I was trying to articulate! Ideally, the kernel wouldn't die and I could just reconnect to it whenever as long as it's kept running on the remote server. This feature would be immensely useful to me, and from what I can tell, to many others as well. Hence why I wondered if there were any updates, or alternatively a temporary workaround.

metya commented 1 year ago

This feature would be very useful for many users. It is simply common sense that if the SSH connection is closed for some reason, we want to be able to reconnect and have the same state of the kernel and cells. As it stands, after reloading VS Code or just reconnecting SSH I can lose all of the work and code in my cells, because the kernel went down and I forgot to press Ctrl+S every 5 minutes.

I think it is not so difficult: just create the kernel on the remote in a way that does not rely on the current SSH connection, and after reconnecting SSH or reloading VS Code entirely, offer a choice of the existing running kernels.

DonJayamanne commented 11 months ago

Another requirement for this https://github.com/microsoft/vscode-jupyter/issues/14446#issuecomment-1757045873

AnakinShieh commented 10 months ago

I have to say this should be a crucial feature for Visual Studio Code now. Currently, losing the connection to a remote tunnel means losing all of your work/progress, which makes it hard to do almost any important work.

rsargent commented 9 months ago

I'd love this. This is my biggest pain point with vscode.

mkarikom commented 9 months ago

One more thing to note:

In practice, many of us are running/testing/benchmarking research code, whose various levels of maintenance (I pulled a python 2 repo the other day) mean that project-specific dev containers are pretty common.

The upshot is that the remote kernel for any given notebook is running inside the dev container for that project so that it can make use of the relevant environment.

This results in the following workflow:

set up a project in a dev container on some workstation or possibly hpc allocation
open up laptop and remote-ssh to the workstation
open project folder in container
open notebook.ipynb in project
start kernel in that project environment
be able to connect and reconnect to that kernel as above

I don't know if that makes implementing this insanely important feature more or less complicated....

Last of all thanks @DonJayamanne (and everyone else) for your awesome work making vscode better every day for python!

TTTPOB commented 9 months ago

I have a work scenario similar to @mkarikom's. I have to deal with some nasty Python environments whose setup might only be possible via a container (which is quite common in academia), which means that I can only use remote kernels. But for now the Pylance support for remote kernels is broken, so the dev experience is not optimal.

I used to mount the container image and point the Python extension's interpreter path setting to the interpreter inside the container mount, but this is now impossible: the fact that the Python interpreter path setting could influence the behaviour of the Jupyter extension was considered a bug and has been fixed.

ando600 commented 8 months ago

I'd like to bump this issue. For me this is a breaking feature, and I use jupyterlab over vscode for this reason, despite vscode having a better linter, copilot, and better vim keybindings; I suspect many people who have any kind of remote data science/machine learning workflow feel similarly. I have had this issue for the past 2 years, but only just found this thread.

For what it's worth, I am willing to volunteer to help address this. I am not sure what the policy is for accepting pull requests from those outside the core team, but I thought I'd put that out there.

andreimargeloiu commented 8 months ago

+1, this is a breaking feature for anyone doing research and quantitative work where we need to rapidly experiment until we find what works well so that we can port it into a standalone script.

man-shu commented 7 months ago

Maybe most people are already aware of this workaround, but here's what I do:

open a bash terminal session on the remote machine
run tmux on that
run ipython inside that tmux session

andytwigg commented 7 months ago

+1 for this

jucor commented 7 months ago

From https://xkcd.com/2881/ ... We'll have a fantastic trip full of machine learners, research engineers, and data scientists 😅

Abrackadabra commented 6 months ago

IIUC if I want to have an ipykernel running remotely, I have two choices:

Run the kernel through a remote jupyter instance, lose pylance functionality because it does not support remote kernels
Connect as a Remote, run the kernel through VSCode itself by pointing it to an interpreter. Then kernel state does not survive a VSCode window reload.

Extending the second approach should be straightforward

HeatPhoenix commented 6 months ago

Wanted to give my +1 every way possible.

arkaprava08 commented 6 months ago

+1 to this issue

chirpio76 commented 6 months ago

+1 having this would be great!

skambha6 commented 6 months ago

+1 to this issue

han-steve commented 6 months ago

+1 Is there any initiative to start this feature?

doronbl commented 5 months ago

+1 to this issue

aflag commented 5 months ago

It would be nice if, overall, the language server didn't die when restarting vscode. It's not just notebooks

jasoncausey commented 5 months ago

+1 to this issue.

rsargent commented 4 months ago

Consider supporting the new Jupyter kernel API to allow server side execution to continue with disconnected clients, and for clients to pull updates when they reconnect.

https://github.com/jupyterlab/jupyterlab/issues/2833

-- Randy
