microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

SSH timeouts since switching to WSL2 #5787

Open mattyindustries opened 3 years ago

mattyindustries commented 3 years ago

Environment

Microsoft Windows [Version 10.0.19041.450]
Release:        9.13
Linux version 4.19.104-microsoft-standard (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Wed Feb 19 06:37:35 UTC 2020

Steps to reproduce

Since upgrading my existing WSL instance to WSL2, I have been getting regular SSH timeouts when connecting to external Linux servers. They often occur right after I enter a command such as ls, cat, or vi that returns a large amount of data.

The terminal is unresponsive for a number of minutes before it finally returns the following error.

[id@servername modules] (mat_demo)$ ls
Timeout, server servername not responding.

I am able to open up a second WSL2 terminal and immediately connect to the same external linux server without issue.

I'm not able to consistently replicate the behaviour but I'll try my best to capture some WSL logs.

WSL logs:

Expected behavior

SSH connections operate without issue

Actual behavior

SSH connections timeout

ec-max commented 3 years ago

This looks like the same kind of error I've experienced with conflicting routes caused by a VPN client.

https://github.com/microsoft/WSL/issues/5764

mattyindustries commented 3 years ago

Hi, #5764 seems like it would prevent all network traffic, whereas in my case the connection works initially but then times out. I'm not sure whether this affects other WSL2 traffic, because SSH is my primary use case for WSL: jumping onto my work's server fleet.

As soon as an SSH session starts to time out, I'm able to immediately establish another SSH session from a second WSL2 window.

I've had to switch back to WSL1 and have not had a repeat of the issue since.

Please let me know what you need me to do to assist with the need-repro label.

Mat

ikvirsingh commented 3 years ago

I'm getting the same issue after upgrading to WSL2 yesterday on one of my machines. I'm on Windows 10 Version 2004, Build 19041.508. So far I've tried the following on Ubuntu 18.04, Ubuntu 20.04 and also Pengwin:

  1. SSH into an existing external linux server
  2. Once the connection has been established, as soon as I start typing a command like ls or grep, the terminal window freezes and I cannot type anymore until it times out. This never happened on WSL1.

For me this is not an intermittent problem. It happens every time.

lvpx commented 3 years ago

I am facing a similar issue since upgrading to WSL2.


Edition        Windows 10 Pro
Version        20H2
Installed on   09-07-2020
OS build       19042.610
Experience     Windows Feature Experience Pack 120.2212.31.0

The SSH window just freezes. I tried both MobaXterm and the PowerShell SSH client.

commaaander commented 3 years ago

same here

dpiow commented 3 years ago

Same here: WSL1 works, but WSL2 freezes at "debug1: pledge: network" after entering credentials. This is on 20H2; it was the same on a backported WSL on 1909.

wbattou commented 3 years ago

Hello, I have the same issue on debian wsl2:

Windows 10 Pro 1909 18363.1256

Thank you for your help.

menih commented 3 years ago

Same here. I really don't like this idle timeout feature, and there is no way to control it! I've added a timeout in .wslconfig, but it had no effect!

pnunn commented 3 years ago

I'm getting this on all my SSH connections from Ubuntu on WSL2. The same connections on a "real Linux" laptop show no issues at all.

daelmaak commented 3 years ago

I have a somewhat similar issue with WSL2 Ubuntu 18. It happens almost every time I run a certain Node.js script that fires many HTTP requests to a remote server. The first 1-2 runs are fine, but after that timeouts begin to occur.

gradinarot commented 3 years ago

Any news on this?

pnunn commented 3 years ago

Nothing has changed.

gradinarot commented 3 years ago

While we are waiting for a fix you can use this https://github.com/gradinarot/wsl-vpnkit

Another solution is to change your ISP. For example, I live in Austria and this only happens with T-home, all other providers are ok. Even when I use the hotspot from my phone the connection is totally fine.

dpiow commented 2 years ago

For those who use HP laptops: disable "Live QoS NDIS 6 Filter Driver" on all physical adapters (WiFi, Ethernet, etc.). This finally helped me sort out these SSH hangs.

Penbuga commented 2 years ago

We were able to fix the problem by adjusting the MTU size. There was a mismatch between the MTU size of the WSL and that of the VPN client.
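
For anyone who wants to check for this kind of mismatch, here is a minimal sketch (not the exact steps used above, which were not posted). It reads the MTU out of `ip a`-style output and computes the largest ping payload that fits in one unfragmented IPv4 packet, assuming a 20-byte IPv4 header without options and an 8-byte ICMP header. The function names are made up for illustration, and the sample line is the eth0 line posted later in this thread.

```python
import re
from typing import Dict

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header

# Matches lines like "2: eth0: <FLAGS> mtu 1400 ..." in `ip a` / `ip link` output.
IFACE_LINE = re.compile(
    r"^\d+:\s+([^:@\s]+)(?:@\S+)?:\s+<[^>]*>.*?\bmtu\s+(\d+)", re.MULTILINE
)

def parse_mtus(ip_output: str) -> Dict[str, int]:
    """Map interface name -> MTU from `ip a` style text output."""
    return {name: int(mtu) for name, mtu in IFACE_LINE.findall(ip_output)}

def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

sample = "2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP group default qlen 1000"
mtus = parse_mtus(sample)
print(mtus)                            # {'eth0': 1400}
print(max_ping_payload(mtus["eth0"]))  # 1372
```

With that number, `ping -M do -s 1372 <server>` (Linux iputils) sends a full-size packet with the don't-fragment bit set. If that fails over the VPN while smaller sizes succeed, the path MTU is probably lower than the WSL2 interface MTU, and it can be lowered for a test with `sudo ip link set dev eth0 mtu <value>`. The right value depends on the VPN client and is not stated in this thread.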

tanjakantola commented 1 year ago

I have the same issue. Have tried many fixes suggested in this and other chains. Nothing has worked.

claywd commented 1 year ago

having this same issue when running k3d clusters with docker-desktop that is configured with wsl

ScientificProgrammer commented 10 months ago

> For those who use HP laptops - disable "Live QoS NDIS 6 Filter Driver" on all of physical adapters (WiFi, Ethernet etc). This finally helped me to sort out this SSH hangs.

FWIW, for several years now on my work laptop I had been unable to use SSH from any WSL2 terminal to connect to any of my AWS EC2 instances. The problem was that the Linux terminal would freeze at some point after connecting to my remote EC2 instance. Most of the time, the terminal would freeze about 10 seconds after connecting, which was infuriating. I knew that the problem had something to do with the WSL2 because I could connect from git-scm bash with no issues whatsoever.

I'm glad that I found this thread because I didn't suspect that the problem was also related to my HP laptop, which is a ZBook series. After disabling LiveQoS NDIS 6 Filter for both my wireless adapter and the vEthernet (Default Switch) adapter, the problem has not occurred. Being able to use SSH from within the WSL2 is a critical capability. I suggest that Microsoft try to work with HP to fix this problem.

richardurban commented 2 months ago

All my outbound SSH connections from WSL2 also terminate after about 10 minutes of idle time.
The same connections from the same Debian 12 flavour in WSL1 stay alive forever.
A related bug has been closed, see https://github.com/microsoft/WSL/issues/8797

[prurban@de12w2 ~]$ /mnt/c/windows/system32/wsl.exe --version
WSL version: 2.0.14.0
Kernel version: 5.15.133.1-1
Windows version: 10.0.19045.4291
[prurban@de12w2 ~]$ wslinfo --networking-mode
nat
[prurban@de12w2 ~]$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:ce:e7:a3 brd ff:ff:ff:ff:ff:ff
    inet 172.31.30.46/20 brd 172.31.31.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fece:e7a3/64 scope link 
       valid_lft forever preferred_lft forever

ssh -G excerpt

tcpkeepalive no
serveralivecountmax 3
serveraliveinterval 290
connectionattempts 1
serveralivecountmax 3
serveraliveinterval 290
forwardagent no
connecttimeout none
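
To make the effect of these settings concrete, here is a small sketch that parses `ssh -G` style output and computes roughly how long the client keeps an unresponsive connection before giving up: ServerAliveInterval multiplied by ServerAliveCountMax, per the ssh_config documentation. The function names are hypothetical, and the defaults used (interval 0, count 3) are the documented OpenSSH defaults.

```python
from typing import Dict, Optional

def parse_ssh_g(text: str) -> Dict[str, str]:
    """Parse `ssh -G` style output ("keyword value" per line) into a dict."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            options[key.lower()] = value.strip()  # a repeated keyword overwrites the earlier one
    return options

def keepalive_budget_seconds(options: Dict[str, str]) -> Optional[int]:
    """Approximate seconds until ssh drops an unresponsive server, or None if disabled."""
    interval = int(options.get("serveraliveinterval", "0"))
    count = int(options.get("serveralivecountmax", "3"))
    if interval <= 0:
        return None  # server alive messages are not sent at all
    return interval * count

excerpt = """\
tcpkeepalive no
serveralivecountmax 3
serveraliveinterval 290
connectionattempts 1
forwardagent no
connecttimeout none
"""
budget = keepalive_budget_seconds(parse_ssh_g(excerpt))
minutes, seconds = divmod(budget, 60)
print(f"{budget} s = {minutes} min {seconds} s")  # 870 s = 14 min 30 s
```

That 870 s lines up with the tcpdump trace below: the first packet that never gets an answer goes out at 14:14:00, and the client closes the connection at 14:28:30. The earlier probes at 14:09:10 and 14:14:00 are also 290 s apart, so the keepalives themselves are being sent as configured.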

ssh -vvv shows

debug3: receive packet: type 82
debug3: send packet: type 80
debug3: receive packet: type 82
debug3: send packet: type 80
debug3: send packet: type 80
debug3: send packet: type 80
Timeout, server xxx not responding.

tcpdump on the client side shows packets going back and forth at first, and then only packets to the server. Note the incorrect cksum on the outgoing packets, present from the beginning.

14:04:20.471309 IP (tos 0x0, ttl 49, id 34190, offset 0, flags [DF], proto TCP (6), length 80)
    10.64.176.98.ssh > 172.31.30.46.45854: Flags [P.], cksum 0xd762 (correct), seq 1109:1137, ack 1060, win 501, options [nop,nop,TS val 389610089 ecr 3180047163], length 28
E..P..@.1.?*
@.b...........r.9?G.....b.....
.8.i...;...vJP-PrYC..9..j..*..  ]....
14:04:20.471334 IP (tos 0x10, ttl 64, id 40647, offset 0, flags [DF], proto TCP (6), length 52)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [.], cksum 0x8516 (incorrect -> 0x592e), ack 1137, win 502, options [nop,nop,TS val 3180047242 ecr 389610089], length 0
E..4..@.@.......
@.b.....9?G...............
.....8.i
14:09:10.412038 IP (tos 0x10, ttl 64, id 40648, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0x3312), seq 1060:1112, ack 1137, win 502, options [nop,nop,TS val 3180337182 ecr 389610089], length 52
E..h..@.@.......
@.b.....9?G.........J.....
.....8.iT..6..'........e.>.....I...|jodz..      ..q..N....:^1p...
14:09:10.487498 IP (tos 0x0, ttl 49, id 34191, offset 0, flags [DF], proto TCP (6), length 80)
    10.64.176.98.ssh > 172.31.30.46.45854: Flags [P.], cksum 0xfe8e (correct), seq 1137:1165, ack 1112, win 501, options [nop,nop,TS val 389900108 ecr 3180337182], length 28
E..P..@.1.?)
@.b.............9?{...........
.=gL.....B...v..8.Ob.c.%f.l..A.am.<d
14:09:10.487526 IP (tos 0x10, ttl 64, id 40649, offset 0, flags [DF], proto TCP (6), length 52)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [.], cksum 0x8516 (incorrect -> 0x7f12), ack 1165, win 502, options [nop,nop,TS val 3180337258 ecr 389900108], length 0
E..4..@.@.......
@.b.....9?{...............
...j.=gL
14:14:00.412291 IP (tos 0x10, ttl 64, id 40650, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xbc02), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180627182 ecr 389900108], length 52
E..h..@.@.......
@.b.....9?{.........J.....
.....=gL....sPH.3....Q....5.>vx..?..`c....f&.......S....A..7
14:14:00.712451 IP (tos 0x10, ttl 64, id 40651, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xbad5), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180627483 ecr 389900108], length 52
E..h..@.@.......
@.b.....9?{.........J.....
.....=gL....sPH.3....Q....5.>vx..?..`c....f&.......S....A..7
14:14:01.572400 IP (tos 0x10, ttl 64, id 40653, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xb779), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180628343 ecr 389900108], length 52
E..h..@.@.......
@.b.....9?{.........J.....
...w.=gL....sPH.3....Q....5.>vx..?..`c....f&.......S....A..7
14:14:02.732438 IP (tos 0x10, ttl 64, id 40654, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xb2f1), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180629503 ecr 389900108], length 52
E..h..@.@.......
@.b.....9?{.........J.....
.....=gL....sPH.3....Q....5.>vx..?..`c....f&.......S....A..7
14:14:05.052473 IP (tos 0x10, ttl 64, id 40655, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xa9e1), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180631823 ecr 389900108], length 52
E..h..@.@.......
@.b.....9?{.........J.....
.....=gL....sPH.3....Q....5.>vx..?..`c....f&.......S....A..7
...
14:26:42.892697 IP (tos 0x10, ttl 64, id 40665, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0x1986), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3181389663 ecr 389900108], length 52
E..h..@.@.......
@.b.....9?{.........J.....
..'_.=gL....sPH.3....Q....5.>vx..?..`c....f&.......S....A..7
14:28:30.402727 IP (tos 0x10, ttl 64, id 40666, offset 0, flags [DF], proto TCP (6), length 156)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [FP.], cksum 0x857e (incorrect -> 0x2ac6), seq 1164:1268, ack 1165, win 502, options [nop,nop,TS val 3181497173 ecr 389900108], length 104
E.....@.@.......
@.b.....9?..........~.....
...U.=gLW.^I..D..       ..dL.8....J..q....O.......-.?.+.6..R.O...7+.....%..|.....+..`+...c..2..>I......B8
*........9?.
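
To pick the retransmissions out of a trace like this without reading it by eye, here is a rough sketch that counts how often each data segment (same source, destination and seq range) appears in tcpdump text output. The regular expression is an assumption based on the line format shown above, not a general tcpdump parser, and the function names are made up. The sample uses three packet lines copied from the trace.

```python
import re
from collections import Counter
from typing import Dict, Tuple

Segment = Tuple[str, str, int, int]  # (src, dst, seq_start, seq_end)

# Matches the detail line of a packet that carries data (has "seq a:b").
SEGMENT = re.compile(r"(\S+) > (\S+): Flags \[[^\]]*\],.*?\bseq (\d+):(\d+)")

def count_segments(dump: str) -> Counter:
    """Count occurrences of each data segment in tcpdump text output."""
    seen: Counter = Counter()
    for match in SEGMENT.finditer(dump):
        src, dst, start, end = match.groups()
        seen[(src, dst, int(start), int(end))] += 1
    return seen

def retransmitted(dump: str) -> Dict[Segment, int]:
    """Segments that were sent more than once, with their send counts."""
    return {seg: n for seg, n in count_segments(dump).items() if n > 1}

trace = """\
14:09:10.412038 IP (tos 0x10, ttl 64, id 40648, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0x3312), seq 1060:1112, ack 1137, win 502, options [nop,nop,TS val 3180337182 ecr 389610089], length 52
14:14:00.412291 IP (tos 0x10, ttl 64, id 40650, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xbc02), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180627182 ecr 389900108], length 52
14:14:00.712451 IP (tos 0x10, ttl 64, id 40651, offset 0, flags [DF], proto TCP (6), length 104)
    172.31.30.46.45854 > 10.64.176.98.ssh: Flags [P.], cksum 0x854a (incorrect -> 0xbad5), seq 1112:1164, ack 1165, win 502, options [nop,nop,TS val 3180627483 ecr 389900108], length 52
"""
for segment, sends in retransmitted(trace).items():
    print(segment, "sent", sends, "times")
```

On the full trace, the segment seq 1112:1164 repeats with growing gaps and is never acknowledged, which is what ordinary TCP retransmission backoff looks like when nothing comes back from the server. As for the checksums: "incorrect" on outgoing packets is commonly just a side effect of checksum offload, since tcpdump sees the packet before the checksum is filled in. That is an assumption here and has not been confirmed for this setup.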