DonJayamanne opened this issue 3 years ago
Is there a milestone issue to see the progress of the update?
Unfortunately this issue has not yet been prioritized at our end; please do vote on this issue though.
What do you suggest as a workaround if one wants to run long 10+ hour sessions using Jupyter notebooks in vscode when connected to a remote kernel over SSH (using the vscode remote extension)? After some hours the connection gets disconnected and there is no way to see the progress or output of running cells.
@matifali Unfortunately at this stage we have no workaround for this; let me see if I can get an update within a week.
@matifali I'm trying to understand your expectations, hence the following questions

> Would you expect to see the numbers 1, 2, 3, 4 and then slowly the number going up to 100 while vscode is open (as the execution is still in progress)?

I would prefer this output, as my use case is to train deep learning models and it's better if we can see the full history.

> Assume you have opened vscode after a few hours and you know all 100 would have been printed out and vscode was closed. Would you expect to see all 1, 2, ... 100 in the output, or just expect to be able to connect to the kernel and see the fact that execution has completed?

This is preferred.

> Or would you expect to see 1, 80, 81, 82 and then the number will keep going up while vscode is open (as the execution is still in progress)?

This is also OK, but the problem is that vscode is unable to connect to a running remote kernel and show any outputs. Yes, the process is running, but we do not see anything printed; there is no indication of whether the losses are actually decreasing.
@matifali please could you provide a simple notebook that we can use for testing purposes, to ensure we have a simple sample close to a real-world scenario?
It could be a simple training model, to keep things simple. I'd like to see what kind of output you are using and the structure of the notebook.
If possible, I'd really appreciate a simple notebook without any external dependencies other than pip packages (i.e. without csv or other files).
Once again, thanks for getting back with the details.
I have made this simple toy notebook that trains a DNN classifier with randomly generated data. I have tried to replicate the essence of a real ML scientist/engineer's workflow. There are no external dependencies other than the necessary packages, which can be installed with the following commands:

```bash
pip install tensorflow
pip install numpy
pip install scikit-learn
```
The structure of the notebook follows a standard format for training ML models:

- Importing necessary packages.
- Loading and processing (generating, in this case) the data.
- Defining the model architecture.
- Training and validation of the model.
The last cell is the most important for testing the reconnection mechanisms, as this is the part where the training loop is run and the result is displayed. You will see the number of epochs, the loss and the accuracy of the model being printed as the training progresses. I have defined a very high number of epochs so that you have plenty of time to test the reconnection mechanisms even if the training has not yet been completed. Ideally, we would like to see the complete training history (all the lines that are printed when the last cell is run).
For my use cases, model training can take days, even weeks, and what I have found is that I cannot leave this kind of notebook running and exit VS Code because otherwise the process dies immediately when I close the window. Allowing the process to keep running in the background is a necessary first step for the reconnect mechanism to make sense to ML scientists/engineers, especially laptop users like me.
You can find the notebook in the following repository: https://github.com/RYSKZ/Toy-DNN-Training
Please let me know if you have any issues or need further clarification.
@DonJayamanne, the above notebook seems a good fit for the test.
bumping. any movement on this?
@DonJayamanne You may use this notebook for testing.
@matifali I'm trying to understand your expectations, hence the following questions
- Assume you have one cell. Code in this cell prints the numbers from 1 to 100, printing a number every hour.
- Assume you run this cell and saw the number 1 printed out.
- Now you run this cell, close vscode, come back tomorrow, open vscode and open this same notebook.
- Would you expect to see the numbers 1, 2, 3, 4 and then slowly the number going up to 100 while vscode is open (as the execution is still in progress)?
- Or would you expect to see 1, 80, 81, 82 and then the number will keep going up while vscode is open (as the execution is still in progress)?
- Assume you have opened vscode after a few hours and you know all 100 would have been printed out and vscode was closed. Would you expect to see all 1, 2, ... 100 in the output, or just expect to be able to connect to the kernel and see the fact that execution has completed?

I ask this because the easiest thing to get working is:

- If the cell is still running, then we display 1, 80, 81, 82 (where 1 was from the first instance of vscode, and 80, 81 and so on after vscode is opened again; i.e. all of the output generated while vscode was closed will not be captured and not stored in the notebook).
- I.e. we will only allow connecting to a kernel, and you can see whether execution has completed or not; if it is still going on, then the data will be appended to what was stored previously.
Thanks
the fundamental issue here is that jupyter server shows the available "running kernels" that can be reconnected to, and vscode doesn't. you could get around the complexities of expected behavior wrt specific cell outputs if you just made the already-running kernels visible to the user somehow.
concretely: I have a GPU-equipped workstation and use it to run image generation notebooks, often from my laptop connected via vscode's "ssh remote" functionality. new images appear in the cell output as they are generated, but they are also written to disk (on the workstation). if the screen on my laptop goes to sleep, vscode prompts me to re-enter the password for my remote and responds by creating a new jupyter session. the old session is still running, as evidenced by outputs continuing to be written to disk and `ps aux` showing the old jupyter PID still there and consuming lots of resources (to be clear: vscode sometimes kills the running session after I start a new one, but this behavior seems inconsistent, and I often either leave the background job to run to completion or sigkill it manually myself to regain visibility of outputs). as a user, I should be able to pick the existing, running kernel from the "select kernel" dropdown, but it is not available. this is a basic jupyter feature and it should not be difficult to expose it. it would be nice if vscode "intelligently" reconnected itself, but right now there's literally no option to reconnect to the old kernel at all, automagically or manually. vscode just needs to expose visibility on the already-running kernels it's managing, rather than only listing the kinds of kernels it's capable of initiating.
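For reference, a running Jupyter server already exposes this information; here is a minimal sketch of checking it from a shell, where the port and token are illustrative placeholders rather than values from this thread:

```bash
# List the Jupyter servers this user has running (prints their URLs and tokens)
jupyter server list

# Ask a specific server which kernels it currently has alive
curl -s -H "Authorization: token <token>" http://localhost:8888/api/kernels
```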
@DonJayamanne
Any update on this?
As far as I understand, it is not possible to start running a jupyter notebook on a remote machine via the VSCode SSH extension, disconnect from the SSH tunnel, and come back to the notebook still running.
I have tried with tmux, but I don't find a way to have the jupyter notebook show up in VSCode after reattaching to the running tmux session.
Could anyone give a hand?
+1
I'd heavily rely on this feature. Any updates on this? Or viable workarounds?
@bbantal
As a workaround, I have succeeded in running my own jupyter server process and connecting to that as a "remote" kernel (running on the same host). As long as the jupyter server process is running the state of your kernel is persisted across VS Code restarts.
@jrich100 By your own jupyter server, do you mean a second jupyter server that you run on your local machine? As in "remote jupyter server" -> "local jupyter server" -> "local VS Code session"?
@bbantal
We run this jupyter server process on the same machine where VS Code is running. Then, when selecting a kernel in VS Code, you can choose to connect to a remote jupyter server; here you can specify the URL generated by the notebook process.
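A minimal sketch of this setup, assuming a standalone server started from a shell; the port is an illustrative placeholder, and the exact kernel-picker wording varies across Jupyter extension versions:

```bash
# Start a standalone Jupyter server; its kernels outlive VS Code restarts
# because VS Code no longer owns the server process.
jupyter notebook --no-browser --port=8888
# Copy the printed URL (http://localhost:8888/?token=...), then in VS Code's
# kernel picker choose to connect to an existing/remote Jupyter server and
# paste that URL.
```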
@jrich100 It is unclear to me how my desired remote jupyter server is involved in your solution. What am I missing? I want to connect to a remote (not local!) jupyter server from my local VS Code, and I want to keep the kernel on that remote server alive so that I can reconnect to it whenever and access my previously created variables. The issue is currently that the kernel dies whenever I close VS Code.
> The issue is currently that the kernel dies whenever I close VS Code.

This should not happen; if it does, it's a bug. I think by "I want to connect to a remote" you mean you are connecting to the remote server with VS Code over SSH or the like, is that correct? If that's the case, then yes, the kernels will die when VS Code is closed.
@DonJayamanne
Yes, that's exactly what I was trying to articulate! Ideally, the kernel wouldn't die and I could just reconnect to it whenever as long as it's kept running on the remote server. This feature would be immensely useful to me, and from what I can tell, to many others as well. Hence why I wondered if there were any updates, or alternatively a temporary workaround.
This feature would be very useful for many users, because it is simply common sense: if the SSH connection is closed for some reason, we want to be able, after reconnecting, to have the same state of the kernel and cells. As it stands, after reloading VS Code or just reconnecting SSH, I can lose all of the work and code I made in the cells, because the kernel went down and I forgot to press Ctrl+S every 5 minutes.
I think it is not so difficult - just create the kernel in a remote fashion so that it does not rely on the current SSH connection, and after reloading SSH or the entire VS Code, offer to choose from the existing running kernels.
Another requirement for this https://github.com/microsoft/vscode-jupyter/issues/14446#issuecomment-1757045873
I have to say this should be a crucial feature for Visual Studio Code now. Currently, losing connection to remote tunnels means losing all of your work/progress, which makes it hard to do almost any important work.
I'd love this. This is my biggest pain point with vscode.
One more thing to note:
In practice, many of us are running/testing/benchmarking research code, whose various levels of maintenance (I pulled a python 2 repo the other day) mean that project-specific dev containers are pretty common.
The upshot is that the remote kernel for any given notebook is running inside the dev container for that project so that it can make use of the relevant environment.
This results in the following workflow:
I don't know if that makes implementing this insanely important feature more or less complicated....
Last of all thanks @DonJayamanne (and everyone else) for your awesome work making vscode better every day for python!
I have a similar work scenario to @mkarikom's. I have to deal with some nasty Python environments whose setup might only be possible via a container (which is quite common in academia), with the result that I can only use remote kernels. But for now the Pylance support for remote kernels is broken, so the dev experience is not optimal.
I used to mount the container image and point the Python extension's interpreter path setting to the interpreter inside the container mount, but now this is impossible: having the Python interpreter path setting influence the behaviour of the Jupyter extension was considered a bug and has been fixed.
I'd like to bump this issue. For me this is a deal-breaker, and I use JupyterLab over VS Code for this reason, despite VS Code having a better linter, Copilot, and better vim keybindings; I suspect many people who have any kind of remote data science/machine learning workflow feel similarly. I have had this issue for the past 2 years, but only just found this thread.
For what it's worth, I am willing to volunteer to help address this. I am not sure what the policy is for accepting pull requests from those outside the core team, but I thought I'd put that out there.
+1, this is a deal-breaker for anyone doing research and quantitative work, where we need to rapidly experiment until we find what works well so that we can port it into a standalone script.
Maybe most people are already aware of this workaround, but here's what I do:
I run `tmux` on the remote machine, then `ipython` inside that `tmux` session.
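A short sketch of that workflow, with illustrative host and session names:

```bash
ssh user@remote
tmux new -s work    # the session survives SSH disconnects
ipython             # long-running work keeps going if the link dies
# Detach with Ctrl-b d; later, SSH back in and reattach:
tmux attach -t work
```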
+1 for this
From https://xkcd.com/2881/ ... We'll have a fantastic trip full of machine learners, research engineers, and data scientists 😅
IIUC, if I want to have an ipykernel running remotely, I have two choices:

- let the VS Code Jupyter extension start and manage the kernel itself over the remote connection, or
- start a Jupyter server on the remote machine myself and connect to it as a remote server.

Extending the second approach should be straightforward.
Wanted to give my +1 every way possible.
+1 to this issue
+1 having this would be great!
+1 to this issue
+1 Is there any initiative to start this feature?
+1 to this issue
It would be nice if, overall, the language server didn't die when restarting vscode. It's not just notebooks.
+1 to this issue.
Consider supporting the new Jupyter kernel API to allow server-side execution to continue with disconnected clients, and for clients to pull updates when they reconnect.
https://github.com/jupyterlab/jupyterlab/issues/2833
-- Randy
+1 to this issue.
+1
+1
+1
+1
+1
+1
+1, and echoing above points that this is a breaking issue, and the number one pain point for me developing on VS Code.
Problem

- Local
- Remote

Investigation

- Running Server & JupyterLab API for extensibility

Goals:

- Planned (related) prototypes

Technical details

- Manages kernels & sessions
- Expose the kernel socket connection over this connection (we already have the code/technology for this) - proxy socket (dummy kernel in the UI layer, by creating a dummy socket connection)
- Security - how do we secure this web server (will need to be addressed, but I'm leaving that for later)
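For context, a hedged sketch of the Jupyter server HTTP endpoints these notes appear to reference, with an illustrative port and token:

```bash
# Sessions map notebooks to kernels; kernels can also be listed directly
curl -s -H "Authorization: token <token>" http://localhost:8888/api/sessions
curl -s -H "Authorization: token <token>" http://localhost:8888/api/kernels

# Each kernel's messaging channels are then multiplexed over one WebSocket:
#   ws://localhost:8888/api/kernels/<kernel-id>/channels
```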
Also related https://github.com/microsoft/vscode-jupyter/issues/300