PickHub opened this issue 8 months ago.
@PickHub thanks for filing this issue.
Thanks for getting back @sevillal!
ERR_CONNECTION_TIMED_OUT
Additionally, we spun up a VM in our VNet. Connecting to the job via SSH from that VM fails with:
Traceback (most recent call last):
  File "/home/azureuser/.azure/cliextensions/ml/azext_mlv2/manual/custom/_ssh_connector.py", line 118, in <module>
    SshConnector().connect_ssh()
  File "/home/azureuser/.azure/cliextensions/ml/azext_mlv2/manual/custom/_ssh_connector.py", line 49, in connect_ssh
    loop.run_until_complete(self._connect_ssh())
  File "/opt/az/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/azureuser/.azure/cliextensions/ml/azext_mlv2/manual/custom/_ssh_connector.py", line 63, in _connect_ssh
    async with websockets.client.connect(
  File "/home/azureuser/.local/lib/python3.10/site-packages/websockets/legacy/client.py", line 629, in __aenter__
    return await self
  File "/home/azureuser/.local/lib/python3.10/site-packages/websockets/legacy/client.py", line 647, in __await_impl_timeout__
    return await self.__await_impl__()
  File "/home/azureuser/.local/lib/python3.10/site-packages/websockets/legacy/client.py", line 654, in __await_impl__
    await protocol.handshake(
  File "/home/azureuser/.local/lib/python3.10/site-packages/websockets/legacy/client.py", line 325, in handshake
    raise InvalidStatusCode(status_code, response_headers)
websockets.exceptions.InvalidStatusCode: server rejected WebSocket connection: HTTP 403
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
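Unlike a timeout, this traceback shows that the connection was established and the server answered, but it answered the WebSocket upgrade with HTTP 403 instead of 101. One way to check which of the two failure modes a given machine hits is to send a bare WebSocket upgrade request yourself and look at the status code. The sketch below uses only the Python standard library; `websocket_upgrade_status` is a hypothetical helper, and the host, port and path are placeholders for your own job endpoint. It sends no Azure ML credentials by default, so a 403 from it alone does not prove the CLI's request is rejected for the same reason; whatever headers the real client sends would have to be passed via `extra_headers`, and this thread does not show what those are.

```python
import base64
import http.client
import os
from typing import Dict, Optional


def websocket_upgrade_status(host: str, port: int, path: str = "/",
                             use_tls: bool = True, timeout: float = 10.0,
                             extra_headers: Optional[Dict[str, str]] = None) -> int:
    """Send a bare WebSocket upgrade request and return the HTTP status code.

    101 means the server accepted the upgrade. Any other code (e.g. 403)
    means the connection itself succeeded but the server answered the
    handshake with an ordinary HTTP response instead.
    """
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    headers = {
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        # RFC 6455 opening handshake: base64 of a random 16-byte nonce.
        "Sec-WebSocket-Key": base64.b64encode(os.urandom(16)).decode("ascii"),
        "Sec-WebSocket-Version": "13",
    }
    # Hypothetical hook for auth headers; none are assumed here.
    headers.update(extra_headers or {})
    try:
        conn.request("GET", path, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()
```

If this returns 403 from both inside and outside the VNet while a plain TCP connect succeeds, that would point away from routing and toward the server's access checks.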
Thanks @PickHub for your response.
Our workspace has public network access enabled from all networks. We do have a private endpoint connection set up. Is that going to be a problem? It should not be a problem: public network access should allow Jupyter and other services to be reached from anywhere.
I have some follow-up questions:
@sevillal Thanks again for looking into this!
Would I be able to run VS Code from the shell that I'm using to SSH into our VM?
@PickHub thanks for those details.
Are you open to having a triage call next? Please let me know your availability and I can schedule some time.
Hey @sevillal, sorry about the late response; I've been on vacation. Would Monday or Wednesday at 8 or 9 am PST work for you?
Hey @PickHub, no worries, I hope you had a great time. I've scheduled some time for next week.
Closing due to inactivity, please reopen if needed.
I'm now trying this with a Windows VM, with the following setting disabled in VS Code:
But now the debug app is "not started":
@sevillal Could we reopen this, please?
@PickHub reopening this issue. I have a question, is your job a multi-node job? Are you able to connect to a job running in a single node?
Thanks for reopening, @sevillal! 🙏 This is happening when running on a single Standard_D13_v2 compute (8 cores, 56 GB RAM, 400 GB disk), which would mean a single node, correct?
@PickHub I think that's just the compute. Do you mind sharing the YAML file you are using to start the job?
Does this occur consistently? Yes
Repro steps:
Expected behavior: VS Code connects to the running job.
Actual behavior:
Failed to connect to the remote extension host server (Error: request to https://<redacted>.eastus.nodes.azureml.ms:8889/api/terminals?1698344869411 failed, reason: connect ETIMEDOUT <some_ip>:8889)
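The `connect ETIMEDOUT <some_ip>:8889` reason means the TCP connection to port 8889 was never established, so the failure happens before any HTTP request is sent (the `ERR_CONNECTION_TIMED_OUT` reported earlier looks like the same symptom). A quick way to compare machines (for example, a laptop versus the VM inside the VNet) is a plain TCP connect with a timeout. This is a minimal standard-library sketch; `probe_tcp` is a hypothetical helper, not part of the Azure ML CLI or the VS Code extension.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect and classify the outcome.

    Returns "reachable", "timeout", "refused", or "error: <details>".
    No data is sent; the socket is closed as soon as the connect finishes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # No answer within `timeout` seconds: packets are probably being
        # dropped somewhere on the path (same symptom as "connect ETIMEDOUT").
        return "timeout"
    except ConnectionRefusedError:
        # The host (or a firewall in between) actively rejected the connection.
        return "refused"
    except OSError as exc:
        # DNS failures (socket.gaierror), unreachable networks, and so on.
        return f"error: {exc}"
```

Usage would be something like `probe_tcp("<redacted>.eastus.nodes.azureml.ms", 8889)` with the real node hostname substituted. A `"timeout"` from one machine and `"reachable"` from another would suggest the problem lies on the network path (a firewall rule, for example) rather than in VS Code itself.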
I'm following the "Debug jobs and monitor training progress" documentation to attach the VS Code debugger to an AML job. I'd appreciate any help fixing this timeout error and getting this to run.
The error appears after opening a new VS Code window under "Debug and monitor" in the AML portal. It displays
Installing VS Code server on <JOB_NAME>
for a while before the error appears.
Error Message
Action: Resolver.resolve
Error type: 70
Error Message: request to redacted:url failed, reason: connect ETIMEDOUT redacted:id
Version: 0.36.0
OS: darwin
OS Release: 23.0.0
Product: Visual Studio Code
Product Version: 1.83.1
Language: en
Call Stack
```
s extension.js:2:1985921
extension.js:2:2012910
extension.js:2:2012910
```
Code
This is the Dockerfile for creating the Environment: