microsoft / vscode-tools-for-ai

Azure Machine Learning for Visual Studio Code, previously called Visual Studio Code Tools for AI, is an extension to easily build, train, and deploy machine learning models to the cloud or the edge with Azure Machine Learning service.

Connection to compute instance fails because vscode-server cannot be started #2219

Closed Crosswind closed 8 months ago

Crosswind commented 8 months ago

Expected Behavior

I can use VS Code on desktop to connect to a compute instance to execute my Jupyter notebooks while working in a local-like environment.

Actual Behavior

When I try to connect to a compute instance from a local VS Code installation (desktop), the connection fails after a couple of attempts with the following message:

Maximum retries exhausted. Could not install VS Code server on : Downloading VS Code server failed. Please try again later.

It doesn't matter whether I initiate the connection from the browser or from VS Code directly. The "Kill VS Code server and retry" button doesn't improve the situation.

I used the integrated terminal and can see that the vscode-server.tar.gz in the .vscode-server directory has 0 bytes. It appears as if the transfer of the binary is not possible. I haven't found any logs that could help narrow down the problem.

(azureml_py38) azureuser@<compute-instance-name>:~/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc$ ls -al
total 8
drwxr-xr-x 2 azureuser azureuser 4096 Nov  1 16:37 .
drwxr-xr-x 3 azureuser azureuser 4096 Nov  1 16:11 ..
-rw-r--r-- 1 azureuser azureuser    0 Nov  1 16:37 vscode-server.tar.gz

When I delete the directory above it is recreated on the next attempt, but the archive is again 0 bytes in size.
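The manual check above can be scripted. This is a minimal sketch, assuming the layout shown in the listing (one directory per commit hash under ~/.vscode-server/bin, each holding a vscode-server.tar.gz); the function name `find_empty_archives` is made up for illustration and is not part of any extension.

```python
from pathlib import Path


def find_empty_archives(root):
    """Return every zero-byte vscode-server.tar.gz one level below root."""
    root = Path(root)
    if not root.is_dir():
        # Nothing has been downloaded yet (or the path is wrong).
        return []
    return sorted(
        archive
        for archive in root.glob("*/vscode-server.tar.gz")
        if archive.is_file() and archive.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumed default location, taken from the terminal listing above.
    for archive in find_empty_archives(Path.home() / ".vscode-server" / "bin"):
        print(f"empty download: {archive}")
```

An empty archive only shows that the download did not complete; it does not say why, so the remote logs are still needed.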

Steps to Reproduce the Problem

  1. Use an existing ML workspace with a (new) compute instance
  2. Set up VS Code to connect to Azure and make sure the ML workspace and the compute instance show up
  3. Make sure the compute instance is started
  4. Open the context menu of the compute instance and hit "Connect"

Specifications

shsuman commented 8 months ago

@Crosswind Can you please provide the Azure ML Remote trace logs?

You can follow the steps here

Crosswind commented 8 months ago

Thanks, that pointed me to the error:

Acquiring lock on /home/azureuser/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc/vscode-remote-lock.azureuser.f1b07bd25dfad64b0167beb15359ae573aecd2cc
Installing to /home/azureuser/.vscode-server/bin/f1b07bd25dfad64b0167beb15359ae573aecd2cc...
350b76d3-cc6e-4a5a-b502-1d437d3701d7%%1%%
Downloading with wget
wget download failed
ERROR: cannot verify az764295.vo.msecnd.net's certificate, issued by ‘emailAddress=support@fortinet.com,CN=FGVM16TM22000571,OU=Certificate Authority,O=Fortinet,L=Sunnyvale,ST=California,C=US’: Unable to locally verify the issuer's authority. To connect to az764295.vo.msecnd.net insecurely, use `--no-check-certificate'.

There is obviously more in the log file, but I think these lines are the important ones. They also explain why the tar file has 0 bytes: it cannot be downloaded because of a certificate issue. This is a customer environment, and the traffic appears to be routed through a proxy which messes with the certificate. We have had that problem with PyPI packages as well and ended up using that customer's internal mirror. How can we work around this? I assume that setting --no-check-certificate is not recommended and perhaps even difficult to accomplish. Do you need the entire log file, or is this enough for now?
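To pick this kind of line out of a long log, something like the following could help. It is only a sketch: the regular expression is modelled on the single wget message quoted above, `parse_wget_cert_error` is a hypothetical helper name, and the issuer is split naively on commas, which would break on values that contain commas.

```python
import re

# Modelled on the wget error in the log above. wget wraps the issuer in
# typographic or ASCII quotes depending on the locale, so both are accepted.
_CERT_ERROR = re.compile(
    r"cannot verify (?P<host>\S+?)'s certificate, "
    r"issued by [\u2018'\"](?P<issuer>.+?)[\u2019'\"]:"
)


def parse_wget_cert_error(line):
    """Return {'host': ..., 'issuer': {...}} for a wget certificate error, else None."""
    match = _CERT_ERROR.search(line)
    if match is None:
        return None
    fields = {}
    for part in match.group("issuer").split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return {"host": match.group("host"), "issuer": fields}
```

For the log line above, the `O` field of the issuer is the organization that signed the certificate wget saw. If it names a firewall or proxy vendor rather than a public certificate authority, the connection is most likely being intercepted on the way out.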

shsuman commented 8 months ago

@Crosswind Please take a look at the list of endpoints that you will need access to here.

You will need to make sure that the compute instance has access to all of those endpoints. As for the certificate issue, I think you will need to ask your network administrator to let traffic to these endpoints through.

Also, we do not allow users to set the --no-check-certificate option when making a connection.
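Once the network change is in place, a quick check from the compute instance's terminal can confirm whether an endpoint is reachable and presents a certificate that the system CA store accepts. This is a rough sketch using only the Python standard library; `check_tls` is a made-up helper name, and the only host known from this thread is the one in the log, so the rest of the endpoint list has to be filled in from the documentation.

```python
import socket
import ssl


def check_tls(host, port=443, timeout=5.0):
    """Try a verified TLS handshake and describe the outcome as a short string."""
    context = ssl.create_default_context()  # verifies against the system CA store
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # getpeercert() returns the issuer as nested (key, value) tuples.
                issuer = dict(item[0] for item in tls.getpeercert()["issuer"])
                return "ok: issued by " + issuer.get("organizationName", "unknown")
    except ssl.SSLCertVerificationError as err:
        # Same situation wget reported: the issuer is not trusted locally.
        return "certificate rejected: " + err.verify_message
    except OSError as err:
        # DNS failure, refused connection, timeout, or another TLS error.
        return "connection failed: " + str(err)
```

Calling `print(check_tls("az764295.vo.msecnd.net"))` on the compute instance should report a rejected certificate for as long as the proxy substitutes its own, and an "ok" line once the endpoint is allowed through unmodified.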

shsuman commented 8 months ago

Closing because of inactivity. Please reopen if need be :)