Closed: zyzhang1992 closed this issue 1 year ago.
The same procedure worked just fine on two of the other clusters I have tested.
Thanks for filing. Can you share the difference between the cluster it isn't working on and those it is working on? For example, are they different architectures?
They are quite similar in that they all run the same CentOS. Two clusters are at Stanford with the same 2FA; one works and the other doesn't. The third one works, and it doesn't use 2FA. I just noticed that the one that doesn't work is Intel. I'll do some more tests later.
Here is the one that doesn't work:
@.*** login ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 1260.845
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4200.27
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
Here are the ones that worked:
@.*** ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 1
Model name: AMD EPYC 7301 16-Core Processor
Stepping: 2
CPU MHz: 2200.000
CPU max MHz: 2200.0000
CPU min MHz: 1200.0000
BogoMIPS: 4399.39
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 64K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-3,32-35
NUMA node1 CPU(s): 4-7,36-39
NUMA node2 CPU(s): 8-11,40-43
NUMA node3 CPU(s): 12-15,44-47
NUMA node4 CPU(s): 16-19,48-51
NUMA node5 CPU(s): 20-23,52-55
NUMA node6 CPU(s): 24-27,56-59
NUMA node7 CPU(s): 28-31,60-63
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb hw_pstate ssbd rsb_ctxsw ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
@.*** ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7742 64-Core Processor
Stepping: 0
CPU MHz: 3269.104
BogoMIPS: 4491.48
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-63,128-191
NUMA node1 CPU(s): 64-127,192-255
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
From: Brigit Murtaugh @.>
Sent: Tuesday, March 28, 2023 9:54 AM
To: microsoft/vscode-remote-release @.>
Cc: Zhiyong Zhang @.>; Author @.>
Subject: Re: [microsoft/vscode-remote-release] code tunnel doesn't see to start VSCode server (Issue #8289)
View it on GitHub: https://github.com/microsoft/vscode-remote-release/issues/8289#issuecomment-1487281081
I did another test on an AMD node on the cluster I am having problems with, and it didn't work either; same situation.
[zyzhang@sh03-ln01 login ~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7502 32-Core Processor
Stepping: 0
CPU MHz: 2495.394
BogoMIPS: 4990.78
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-7,64-71
NUMA node1 CPU(s): 8-15,72-79
NUMA node2 CPU(s): 16-23,80-87
NUMA node3 CPU(s): 24-31,88-95
NUMA node4 CPU(s): 32-39,96-103
NUMA node5 CPU(s): 40-47,104-111
NUMA node6 CPU(s): 48-55,112-119
NUMA node7 CPU(s): 56-63,120-127
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca
I ran it under strace and noticed the following:
brk(0x223a000) = 0x223a000
open("/home/users/zyzhang/.vscode-cli/code_tunnel.json", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
getuid() = 35637
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_UNIX, sun_path="/run/user/35637/bus"}, 22) = -1 ENOENT (No such file or directory)
close(9) = 0
open("/home/users/zyzhang/.vscode-cli/token.json", O_RDONLY|O_CLOEXEC) = 9
fcntl(9, F_SETFD, FD_CLOEXEC) = 0
fstat(9, {st_mode=S_IFREG|0644, st_size=142, ...}) = 0
lseek(9, 0, SEEK_CUR) = 0
read(9, "\"P0aUgAZAiyWGAghQVWg/RlYmrVtnfNo"..., 142) = 142
read(9, "", 32) = 0
close(9) = 0
uname({sysname="Linux", nodename="sh03-ln01.stanford.edu", ...}) = 0
readlink("/proc/self/exe", "/home/users/zyzhang/code", 256) = 24
brk(0x223b000) = 0x223b000
brk(0x223e000) = 0x223e000
futex(0x7f5d97828858, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f5d97f321b8, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
From the logs you posted, it looks like there is some kind of network issue on the machine you're using:
error sending request for url (https://global.rel.tunnels.api.visualstudio.com/api/v1/tunnels?global=true&tags=vscode-server-launcher&allTags=true): error trying to connect: tcp connect error: Operation timed out (os error 110)
or possibly DNS is not resolving that hostname correctly.
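To tell those two possibilities apart, a first check is whether the login node can resolve the hostname at all. Note that the message says "tcp connect error" rather than a DNS error, which suggests the name did resolve and the connection attempt itself timed out, but it is cheap to confirm. A minimal sketch, assuming the node's system resolver (getaddrinfo) gives roughly the same answer the CLI sees; `resolve` is a hypothetical helper, not part of the VS Code CLI:

```python
from __future__ import annotations

import socket


def resolve(host: str, port: int = 443) -> list[str]:
    """Return the unique IP addresses that `host` resolves to, or [] on failure.

    Uses the system resolver (getaddrinfo), so it honors the node's
    /etc/hosts and /etc/resolv.conf like most programs on that machine.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name could not be resolved (unknown name, no DNS server reachable, ...).
        return []
    addresses: list[str] = []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

For example, call resolve("global.rel.tunnels.api.visualstudio.com") on the failing login node: an empty list points at DNS, while one or more addresses means name resolution works and the problem is further along the connection.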
Thanks @connor4312 !
It looks like there is a problem starting the VS Code server, hence the error connecting to it? Does that connection refer to the connection to the server running on the cluster?
In fact, I don't see any other processes being started when I run code tunnel.
Could it be an issue with communication between GitHub and that cluster?
Is there a way to avoid GitHub authentication when starting or connecting to the tunnel?
The VS Code server is not started until a remote editor connects to the tunnel, because before that point we don't know which server version is needed. https://global.rel.tunnels.api.visualstudio.com is the host that serves tunnel access; code tunnel is not functional if it's unavailable.
Here is the output when it starts correctly. As you can see, there is also a request to start a new connection to https://usw3.rel.tunnels.api.visualstudio.com/, but when I click that link I also get the "404 Not Found" error from nginx.
When you mention "remote editor", do you mean the VS Code installed on my local machine? In that case, could there be a problem with my local VS Code configuration? I was assuming my local VS Code is fine since it worked with the other two clusters.
To resolve the host at https://global.rel.tunnels.api.visualstudio.com, is it the cluster on which I am running code tunnel that tries to connect to it, presumably through its DNS service? If so, what should I look for to troubleshoot that?
Apologies for the basic questions. I may need a better understanding of the processes involved to be able to ask the right questions of the people at our institution.
[2023-03-28 15:07:53] trace Found token in keyring
[2023-03-28 15:07:53] debug [reqwest::connect] starting new connection: https://api.github.com/
[2023-03-28 15:07:53] debug [reqwest::connect] starting new connection: https://usw3.rel.tunnels.api.visualstudio.com/
[2023-03-28 15:07:54] debug Starting tunnel to server...
[2023-03-28 15:07:54] trace Found token in keyring
[2023-03-28 15:07:54] debug [tungstenite::handshake::client] Client handshake done.
[2023-03-28 15:07:54] debug [russh::ssh_read] read_ssh_id: reading
[2023-03-28 15:07:54] debug [russh::ssh_read] read 39
[2023-03-28 15:07:54] debug [russh::ssh_read] Ok("SSH-2.0-Microsoft.DevTunnels.Ssh_3.10\r\n")
[2023-03-28 15:07:54] debug [russh::client] writing 352 bytes
[2023-03-28 15:07:54] debug [russh::ssh_read] id 39 39
[2023-03-28 15:07:54] debug [russh::client::kex] extending []
[2023-03-28 15:07:54] debug [russh::client::kex] algo = Names { kex: Name("none"), key: Name("none"), cipher: Name("none"), client_mac: Name("none"), server_mac: Name("none"), server_compression: None, client_compression: None, ignore_guessed: false }
[2023-03-28 15:07:54] debug [russh::client::kex] write = []
[2023-03-28 15:07:54] debug [russh::client::kex] i0 = 342
[2023-03-28 15:07:54] debug [russh::client::kex] moving to kexdhdone, exchange = Exchange { client_id: CryptoVec { p: 0x1959f60, size: 28, capacity: 32 }, server_id: CryptoVec { p: 0x1a21fe0, size: 37, capacity: 64 }, client_kex_init: CryptoVec { p: 0x19dec80, size: 342, capacity: 512 }, server_kex_init: CryptoVec { p: 0x1adf6e0, size: 94, capacity: 128 }, client_ephemeral: CryptoVec { p: 0x1, size: 0, capacity: 0 }, server_ephemeral: CryptoVec { p: 0x1, size: 0, capacity: 0 } }
[2023-03-28 15:07:54] debug [tunnels::connections::relay_tunnel_host] established host relay primary session
[2023-03-28 15:07:54] debug Connected to tunnel endpoint: TunnelRelayTunnelEndpoint { base: TunnelEndpoint { connection_mode: TunnelRelay, host_id: "8b543939-df0a-4623-8e0d-9638b9673ac5", host_public_keys: [], port_uri_format: Some("https://wfzscf15-{port}.usw3.devtunnels.ms/"), tunnel_uri: Some("https://wfzscf15.usw3.devtunnels.ms/"), port_ssh_command_format: Some("ssh wfzscf15-{port}@ssh.usw3.devtunnels.ms"), tunnel_ssh_command: Some("ssh wfzscf15@ssh.usw3.devtunnels.ms"), ssh_gateway_public_key: None }, host_relay_uri: Some("wss://usw3-data.rel.tunnels.api.visualstudio.com/api/v1/Host/Connect/wfzscf15"), client_relay_uri: Some("wss://usw3-data.rel.tunnels.api.visualstudio.com/api/v1/Client/Connect/wfzscf15") }
[2023-03-28 15:07:54] trace Found token in keyring
[2023-03-28 15:07:55] debug Visual Studio Code Server is listening for incoming connections
Open this link in your browser https://vscode.dev/tunnel/scg
[2023-03-28 15:08:54] debug [tunnels::connections::ws] sent liveness ping
[2023-03-28 15:08:54] debug [tunnels::connections::ws] received liveness pong
[2023-03-28 15:09:54] debug [tunnels::connections::ws] received liveness pong
[2023-03-28 15:10:54] debug [tunnels::connections::ws] sent liveness ping
[2023-03-28 15:10:54] debug [tunnels::connections::ws] received liveness pong
In that output, it looks like the tunnel was started up successfully.
The VS Code server version must match the version of the client on the other end, so the VS Code server isn't downloaded until someone connects to the tunnel and tells the code tunnel process which version it needs.
Thanks, Connor. Yes, in that case it was working fine; I included it as a comparison with the case that failed.
The difference is what happens after the following:
[2023-03-28 15:07:53] trace Found token in keyring
[2023-03-28 15:07:53] debug [reqwest::connect] starting new connection: https://api.github.com/
[2023-03-28 15:07:53] debug [reqwest::connect] starting new connection: https://usw3.rel.tunnels.api.visualstudio.com/
In the failing case, it just gets stuck here. As you pointed out, there may be a DNS resolution issue. If I understand correctly, the cluster I am on sends a connection request to https://usw3.rel.tunnels.api.visualstudio.com/ but can't connect to it. How do I troubleshoot this?
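One way to make the troubleshooting concrete is to list every host the CLI tried to contact and then test each of them from the failing node. A minimal sketch, assuming the --verbose log keeps the "[reqwest::connect] starting new connection: URL" format seen in this thread; `hosts_from_log` is a hypothetical helper, not a CLI feature:

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Matches the verbose log entries shown in this thread, e.g.
# "[reqwest::connect] starting new connection: https://api.github.com/"
CONNECT_LINE = re.compile(r"\[reqwest::connect\] starting new connection: (\S+)")


def hosts_from_log(log_text: str) -> list[tuple[str, int]]:
    """Return the unique (host, port) pairs the CLI tried to reach, in first-seen order."""
    hosts: list[tuple[str, int]] = []
    for url in CONNECT_LINE.findall(log_text):
        parts = urlsplit(url)
        if parts.hostname is None:
            continue
        # Use the scheme's default port when the URL does not name one.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        if (parts.hostname, port) not in hosts:
            hosts.append((parts.hostname, port))
    return hosts
```

Applied to the successful log, this yields api.github.com and usw3.rel.tunnels.api.visualstudio.com on port 443; applied to the failing log in the original report, it yields api.github.com, github.com and global.rel.tunnels.api.visualstudio.com. Repeated connections to the same host are listed once.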
You aren't expected to open that URL directly. The logs you posted show the link to use for connecting:
Open this link in your browser https://vscode.dev/tunnel/scg
What happens when you run the tunnel and go to that URL?
When I open that link, I get the VS Code web interface for the tunnel created on the scg machine.
That is the case where it works. On the other cluster, where it doesn't work, the last line I see is:
[2023-03-28 15:07:53] debug [reqwest::connect] starting new connection: https://usw3.rel.tunnels.api.visualstudio.com/
and it makes no further progress from there until it eventually fails with an error.
@connor4312 You mentioned that the logs point to a network issue on the machine, or possibly DNS not resolving the global.rel.tunnels.api.visualstudio.com hostname correctly. Is this what we should focus on to debug it? Is there any way to test the connection or the DNS in this situation?
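One way to test the TCP connection on its own, separately from TLS and HTTP, is a bare socket connect with a short timeout. The sketch below uses a hypothetical helper, `tcp_probe` (not part of the VS Code CLI); port 443 and the 10-second default are assumptions. The distinction it draws matters here: a timeout usually means packets are being silently dropped (for example by a firewall), whereas "refused" means something actively rejected the connection. The original report's log is consistent with the first case: the connect started at 09:24:01 and failed at 09:26:08 with os error 110 (ETIMEDOUT on Linux), about 127 seconds, which matches Linux's default SYN retry limit.

```python
import errno
import socket


def tcp_probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a bare TCP connection and classify the result.

    Returns "connected", "dns-failure", "timeout", "refused", or "error: ...".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        # The hostname could not be resolved at all.
        return "dns-failure"
    except socket.timeout:
        # No answer within `timeout`: packets are probably being dropped silently.
        return "timeout"
    except ConnectionRefusedError:
        # A reset came back: nothing listens on that port, or a firewall rejects
        # (rather than drops) the traffic.
        return "refused"
    except OSError as err:
        if err.errno == errno.ETIMEDOUT:  # 110 on Linux, i.e. the CLI's "os error 110"
            return "timeout"
        return f"error: {err}"
```

For example, compare tcp_probe("global.rel.tunnels.api.visualstudio.com") with tcp_probe("api.github.com") on the failing node. GitHub authentication succeeded in this thread, so "connected" for GitHub alongside "timeout" for the tunnels host would point at a network rule that blocks only some destinations.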
Yeah, I would start by checking whether you can curl that URL from the affected machine. If it's accessible, it'll return a 401 because the request has no auth, but the CLI is failing before it gets to that step.
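The same check can be scripted when curl output is hard to read. A rough Python equivalent, as a sketch: `check_endpoint` is a hypothetical helper and the 15-second timeout is an arbitrary assumption. Any HTTP status, including the expected 401, means DNS, TCP and TLS all succeeded; only the absence of a response points back at the network.

```python
import socket
import urllib.error
import urllib.request

# The URL from the CLI's error message in this thread.
TUNNELS_URL = (
    "https://global.rel.tunnels.api.visualstudio.com/api/v1/tunnels"
    "?global=true&tags=vscode-server-launcher&allTags=true"
)


def check_endpoint(url: str = TUNNELS_URL, timeout: float = 15.0) -> str:
    """Send one unauthenticated GET and report how far it got."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return f"reachable: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # Any HTTP status means DNS, TCP and (for https) TLS all worked.
        if err.code == 401:
            return "reachable: HTTP 401 (expected without credentials)"
        return f"reachable: HTTP {err.code}"
    except urllib.error.URLError as err:
        # No HTTP response: DNS failure, refused or timed-out connect, TLS error.
        return f"unreachable: {err.reason}"
    except socket.timeout:
        # Connected, but the response did not arrive within `timeout`.
        return "unreachable: timed out waiting for a response"
```

On the failing node, a result starting with "unreachable:" would match what the CLI reports, while "reachable: HTTP 401" would mean the endpoint is fine from that machine and the problem lies elsewhere.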
Thanks, Connor.
I got the following:
It keeps hanging until the tunnel times out.
What other tests can I do at this point?
Here is what I have on the cluster where it failed:
[zyzhang@sh02-ln02 login ~/vscode-test]$ curl -isv 'https://global.rel.tunnels.api.visualstudio.com/api/v1/tunnels?global=true&tags=vscode-server-launcher&allTags=true'
On the cluster where it worked, the first few lines are:
I'm not a networking expert, and definitely not one for the environment you're running in. I would probably start by checking any firewall rules on the machine, or policies that might be applied to your network, e.g. in your cloud provider's console if you use one.
It looks like this is not an issue on the VS Code side of things, so I will close this issue.
Greetings,
I am running code tunnel on one of our clusters.
[zyzhang@sh02-ln02 login ~]$ ./code --version
code-cli 1.75.1 (commit 441438abd1ac652551dbe4d408dfcec8a499b8bf)
When I run code tunnel, I can authenticate with GitHub, but then no VS Code server seems to start, and after a while it times out. When I follow the link https://global.rel.tunnels.api.visualstudio.com/, I get this error:
404 Not Found nginx
[zyzhang@sh02-ln02 login ~]$ ./code tunnel --verbose *
[2023-03-28 09:23:35] debug No code server tunnel found, creating new one
[2023-03-28 09:23:35] trace Found token in keyring
[2023-03-28 09:23:35] debug [reqwest::connect] starting new connection: https://api.github.com/
[2023-03-28 09:23:35] debug github token looks expired: Ok(StatusError { url: "https://api.github.com/user", status_code: 401, body: "{\"message\":\"Bad credentials\",\"documentation_url\":\"https://docs.github.com/rest\"}" })
[2023-03-28 09:23:35] info error refreshing token: Refresh token not available, authentication is required
[2023-03-28 09:23:35] debug [reqwest::connect] starting new connection: https://github.com/
To grant access to the server, please log into https://github.com/login/device and use code E9DD-75B1
[2023-03-28 09:23:40] trace refresh poll failed, retrying: Error getting authorization: authorization_pending The authorization request is still pending.
[2023-03-28 09:23:45] trace refresh poll failed, retrying: Error getting authorization: authorization_pending The authorization request is still pending.
[2023-03-28 09:23:51] trace refresh poll failed, retrying: Error getting authorization: authorization_pending The authorization request is still pending.
[2023-03-28 09:23:56] trace refresh poll failed, retrying: Error getting authorization: authorization_pending The authorization request is still pending.
[2023-03-28 09:24:01] debug [reqwest::connect] starting new connection: https://global.rel.tunnels.api.visualstudio.com/
[2023-03-28 09:23:56] trace refresh poll failed, retrying: Error getting authorization: authorization_pending The authorization request is still pending.
[2023-03-28 09:24:01] debug [reqwest::connect] starting new connection: https://global.rel.tunnels.api.visualstudio.com/
[2023-03-28 09:26:08] error error listing current tunnels: connection error: error sending request for url (https://global.rel.tunnels.api.visualstudio.com/api/v1/tunnels?global=true&tags=vscode-server-launcher&allTags=true): error trying to connect: tcp connect error: Operation timed out (os error 110)
[zyzhang@sh02-ln02 login ~]$
The same procedure worked just fine on two other clusters I have tested. Where should I look for hints about the possible issue?
Thanks!
@bamurtaugh