microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

SSH connections hanging from WSL2 #4690

Open dombegin opened 4 years ago

dombegin commented 4 years ago

Current Version 10.0.19025.1

I have this weird issue where I can no longer use SSH connections to remote servers from WSL2. I remember that it was working ok in early builds but I am not sure at which point it started to fail.

Any idea on how to resolve this would be appreciated.

What happens

When connected to a remote server, the SSH connection hangs after a very short time. I am sometimes able to type a few letters, but then it hangs and I have to close WSL. It does this with every SSH connection to every server.

For instance, on the following screenshot, you can see that I was able to type a few numbers but the connection froze at the last "1".

image

Probably related, SSH git cloning is not working either. It starts receiving objects but stops shortly after. I have to CTRL-C to stop. Here's an example hanging at 46%.

> GIT_SSH_COMMAND="ssh -vvv" git clone --verbose git@github.com:microsoft/dotnet.git

... image

In WSL1, everything works smoothly and I have no issues. It's only in WSL2 that this happens.

Note that git cloning through HTTPS works fine as well.

Just let me know if there is additional trace I can run to help since I know this is probably going to be hard to repro.

derritter88 commented 2 years ago

I have "solved" this issue: Removed WSL + all Hyper-V stuff and moved over to VirtualBox with a dedicated Linux VM.

stela2502 commented 2 years ago

I have 'solved' this problem by simply using the power shell with ssh and scp ... I really wonder why Microsoft is not able to FIX this problem!

trailstrider commented 2 years ago

I've been playing with this issue over the past couple weeks. Here is what I've adjusted (and automated for new VMs), done to cope, and what else I've observed:

Manifestation conditions

For me, I have not experienced trying to connect and not being able to do anything at all with SSH. I suspect that issue and how it is manifesting for me are different underlying issues. As pointed out by @andreasmarkussen , this thread is probably capturing multiple different underlying issues from different people. It also captures them over time, as WSL and the OSes involved have shifted. For instance, the MTU aspect seems to have zero implications for me, and I can't help but think that was applicable for the earlier manifestations only and Microsoft has since fixed that aspect - or maybe I'm just lucky.

  1. I am using Windows 11 Enterprise (Build 22000) with WSL2 (Kernel version: 5.10.102.1), Ubuntu 20.04.4 LTS
  2. The latest Docker Desktop (4.8.2 (79419)) is running in the background, but not running any containers.
  3. Connections to Linux VMs in the cloud (Azure) are what I'm primarily seeing affected. [NOTE: AWS connections get random multi-second lags, but I don't lose connections the same way - though sometimes they get forcibly disconnected by the remote - not the same behavior though, as that is explicitly stated on disconnect.]
  4. I'd be in the middle of typing into the terminal (bash or vim), and the connection would freeze - and never come back.
  5. Mostly in the terminal from VS Code (v1.67.2 as of this writing, using the WSL2 Ubuntu stated above), but it would also happen in Windows Terminal. I don't know why the VS Code terminal instances are affected more, or perhaps it just seems that way because of how I work?

Coping with disconnections

I've always been a fan of using screen for remote connections so that I can detach and disconnect, then later reconnect and re-attach to the session, which keeps running in the background until I get back. Generally, I hadn't bothered to use it when first setting up VMs, but I became more aggressive in its use because of this issue, running screen -Rad upon login in order to have persistent sessions and not lose work when my connection got borked.

If you've not used screen or a similar terminal multiplexer, I highly recommend it for remote connections in general.
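As a rough sketch of the habit described above, a small wrapper can open the SSH session and drop straight into screen -Rad in one step. The function name ssh_persistent and the SSH_CMD override (there only so the command can be previewed without connecting) are hypothetical, and the remote host is assumed to have screen installed.

```shell
# ssh_persistent HOST
# Open an SSH session that lands in a persistent screen session.
#   ssh -t       forces a TTY, which screen needs
#   screen -Rad  is the invocation from the comment above: reattach to an
#                existing session (detaching it elsewhere) or create one
# SSH_CMD is a hypothetical override used for dry runs; it defaults to ssh.
ssh_persistent() {
    ${SSH_CMD:-ssh} -t "$1" screen -Rad
}

# Usage (hypothetical host):
#   ssh_persistent user@vm.example.com
# Preview the command without connecting:
#   SSH_CMD=echo ssh_persistent user@vm.example.com
```

If the connection freezes, running the same command again from a fresh terminal should reattach to the session that is still alive on the remote side.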

SSH Configuration

Below is an automation function (written with bash in mind) that I use with cloud-init. It can also be used on established systems for a quick change (just change the backup extension, since it isn't being run by cloud-init in that case). The main changes focus on maintaining a connection. Since using this on my VMs, I've been able to keep SSH connections alive overnight, and I am confident that the settings in the code below made a tremendous difference. I've not tested which setting was ultimately responsible, or whether it is indeed the entire combination of TCPKeepAlive yes, ClientAliveInterval 30, and ClientAliveCountMax 10000 that made the difference. Note that I also change the default port and force the use of SSH keys by disabling password authentication; take those lines out if you don't want to do that.

_Note, in terms of sequence it is important to take care when you run this while launching a new VM. There are two primary considerations: 1) not getting locked out of your system, 2) making a mess of things during system updates. For the first concern, I've tested it multiple times, and know that it works reliably for me now. I can either add port 22 to my security group temporarily during startup, or just use the new port assignment after I've given the VM enough time to do its thing. For the second, I've noticed that upgrade often impacts /etc/ssh/sshd_config, so I've found it easier to just make sure I've done the update/upgrade first. For whatever reason, before the upgrade I've found the file to be empty as well. I actually like having the other commented out defaults for reference when looking at the file, so doing the upgrade first is useful in that regard as well._

configure_ssh() {
    # Expects SSH_PORT to be set by the caller; must run as root.
    echo "Changing SSH port to ${SSH_PORT}, as well as a few connectivity settings..."

    # Desired sshd_config directives: new port, keepalives, key-only logins.
    declare -A ssh_settings
    ssh_settings[Port]="${SSH_PORT}"
    ssh_settings[TCPKeepAlive]="yes"
    ssh_settings[ClientAliveInterval]="30"
    ssh_settings[ClientAliveCountMax]="10000"
    ssh_settings[PasswordAuthentication]="no"
    ssh_settings[ChallengeResponseAuthentication]="no"

    SSHD_CONFIG=/etc/ssh/sshd_config
    SED_EXP=""
    for setting in "${!ssh_settings[@]}"; do
        # -q: only test for a match, do not echo the matched line.
        if grep -q -e "^#*${setting} " "${SSHD_CONFIG}"; then
            # Directive present (possibly commented out): rewrite it in place.
            echo "Changed ${setting} --> ${ssh_settings[${setting}]}"
            SWAP_TEXT="s/^#*${setting} .*/${setting} ${ssh_settings[${setting}]}/;"
            SED_EXP="${SED_EXP} ${SWAP_TEXT}"
        else
            # Directive absent: append it after the last line (GNU sed expands \n).
            ADD_TEXT="\$s/\$/\n${setting} ${ssh_settings[${setting}]}/;"
            SED_EXP="${SED_EXP} ${ADD_TEXT}"
        fi
    done

    # Edit in place, keeping a timestamped backup of the original file.
    SFX=$(date +%Y%h%d_%H.%M)
    sed -i".cloud-init.bak.${SFX^^}" -e "${SED_EXP}" "${SSHD_CONFIG}"

    # Restarting SSH immediately since we'll want to connect to it in short order for monitoring...
    systemctl restart ssh
}
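
The replace-or-append idea in the function above can be tried safely on a scratch file before touching the real /etc/ssh/sshd_config. The following is only a minimal sketch of that technique, not the function itself: set_option is a hypothetical helper, the scratch file stands in for sshd_config, and GNU sed (as found on Ubuntu) is assumed for the -i flag.

```shell
# set_option FILE KEY VALUE
# Rewrite an existing "KEY ..." line (commented out or not), or append
# "KEY VALUE" if the file has no such line. Assumes KEY is a plain word.
set_option() {
    if grep -q -E "^#*$2 " "$1"; then
        sed -i "s/^#*$2 .*/$2 $3/" "$1"
    else
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# Scratch file that mimics two commented-out defaults.
scratch=$(mktemp)
trap 'rm -f "$scratch"' EXIT
printf '%s\n' '#Port 22' '#TCPKeepAlive yes' > "$scratch"

set_option "$scratch" Port 2222               # rewritten in place
set_option "$scratch" TCPKeepAlive yes        # uncommented in place
set_option "$scratch" ClientAliveInterval 30  # appended, it was missing

cat "$scratch"
# Port 2222
# TCPKeepAlive yes
# ClientAliveInterval 30
```

Before restarting the real service, running sshd -t as root checks the edited configuration for syntax errors, which helps with the lock-out concern mentioned above.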

facboy commented 2 years ago

I had similar issues with VPN traffic, have you tried enabling tcp_mtu_probing in WSL2? I set it to 1, fixed it for me.
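For anyone who wants to try this, here is a small sketch that reports the current value inside the WSL2 distro and notes how to change it. The describe_mtu_probing helper is hypothetical; the value meanings follow the Linux kernel's ip-sysctl documentation, and the commands in the comments need root.

```shell
# Translate a tcp_mtu_probing value into words.
describe_mtu_probing() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "enabled when an ICMP black hole is detected" ;;
        2) echo "always enabled" ;;
        *) echo "unknown" ;;
    esac
}

# Read the current setting; fall back to "unknown" if the file is not readable.
current=$(cat /proc/sys/net/ipv4/tcp_mtu_probing 2>/dev/null || echo unknown)
echo "tcp_mtu_probing=${current} ($(describe_mtu_probing "$current"))"

# To enable it for the running system (needs root):
#   sudo sysctl -w net.ipv4.tcp_mtu_probing=1
# To make it persistent, the usual approach is a line in /etc/sysctl.conf:
#   net.ipv4.tcp_mtu_probing = 1
```

Whether a sysctl.conf entry is applied automatically at WSL2 startup depends on the distro setup, so it may be worth re-checking the value after a restart.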

ktpx commented 2 years ago

I had similar issues with VPN traffic, have you tried enabling tcp_mtu_probing in WSL2? I set it to 1, fixed it for me.

Thanks for the suggestion, did nothing here.

Deadmansshoe commented 1 year ago

I have also had problems with my SSH connections on my desktop PC for a long time now. For me, changing the MTU did not help at all, but it now seems that the problem was being connected through Wi-Fi and Ethernet at the same time. Since I switched off my Wi-Fi antenna, the connection freezes and loop breakdowns seem to have stopped (for at least about half an hour now...). Maybe this can help someone else as well (or help Microsoft fix these problems...).

sxlijin commented 1 year ago

Sharing my anecdata: in a WSL2 client, when I crank up the SSH client verbosity, this is what I get:

debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

at which point it hangs until eventually I get Connection closed by $REMOTE.

Server logs (seen via journalctl -u ssh) show:

$timestamp $hostname sshd[11151]: fatal: Timeout before authentication for $client_ip port $client_port

which suggests to me that packets in the key exchange are getting dropped somewhere in the response path in a non-deterministic fashion. The fact that some folks can get around this by twiddling random network settings about packet size I think corroborates this; if the non-determinism was intrinsically the result of, say, some kind of byte truncation for large packets, then I could see that happening.
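One way to check the packet-size theory is to send pings with fragmentation prohibited and see which sizes get through. This is only a sketch under assumptions: icmp_payload_for_mtu and probe_mtu are hypothetical helper names, IPv4 and the iputils ping shipped with most Linux distros are assumed, and the remote host must answer ICMP echo requests at all.

```shell
# ICMP payload size that produces an IPv4 packet of exactly MTU bytes:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU
# Succeeds if one ping of that total size reaches HOST without fragmentation.
#   -M do  prohibit fragmentation    -s  payload size
#   -c 1   send a single packet      -W 2  wait at most 2 seconds
probe_mtu() {
    ping -c 1 -W 2 -M do -s "$(icmp_payload_for_mtu "$2")" "$1" >/dev/null 2>&1
}

icmp_payload_for_mtu 1500   # → 1472
icmp_payload_for_mtu 1350   # → 1322

# Usage (hypothetical host):
#   probe_mtu vm.example.com 1500 || echo "1500-byte packets do not get through"
```

If large sizes fail while small ones pass, that would be consistent with the large key-exchange reply being dropped on the response path.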

benjaesq commented 1 year ago

Debian 11 (bullseye) in WSL2 shows the ssh hanging as well.

rodonal commented 1 year ago

From time to time this happens to me too. I had changed the MTU to 1350 and that had fixed the issue. Now when it occurs I just restart my laptop, unfortunately.

fysmd commented 1 year ago

Time for yet another bump:

wsl -v
WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22000.2057

$ uname -a
Linux blahh 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"

SSH fails after random periods. Even with continual traffic, keepalives etc.

itakatz commented 9 months ago

Same issue: SSH does not connect at all (while putty or ssh.exe in terminal does work). Windows 11 (version 10.0.22621) and WSL2 with Ubuntu 22.04. None of the suggested fixes worked.

grapheon commented 6 months ago

With Debian 12 (bookworm) on WSL2 I don't see any problems, unlike with Ubuntu 22.04.

pedrohgmacedo commented 2 weeks ago

In WSL1, everything works smoothly and have no issues. It's only in WSL2 that this happens.

In my experience, WSL2 is 💩.

FPintoCircontrol commented 2 weeks ago

MTU of the eth0 adapter is the cause

In case anyone has the same issue: I cannot find the logic behind it. I tried all the options and none worked. My setup is:

One solution I read about is to match the MTU of your VPN adapter and the WSL eth0 adapter. That did not work for me. Honestly, I just started testing random MTU values until it worked with 700. You can try several values and maybe one helps you: sudo ip link set dev eth0 mtu 700
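Instead of guessing values, the search could be made systematic. The sketch below binary-searches a range for the largest MTU that passes a check; it assumes that if a given MTU works then every smaller one works too, which may not hold on every network. Both largest_working_mtu and the example check function are hypothetical names, and the check shown in the comments needs sudo and a reachable SSH host.

```shell
# largest_working_mtu LOW HIGH CHECK
# Print the largest value in [LOW, HIGH] for which the command CHECK succeeds.
# Returns non-zero (and prints nothing) if no value in the range works.
largest_working_mtu() {
    local low=$1 high=$2 check=$3 best=""
    while [ "$low" -le "$high" ]; do
        local mid=$(( (low + high) / 2 ))
        if "$check" "$mid"; then
            best=$mid               # mid works: look for something larger
            low=$(( mid + 1 ))
        else
            high=$(( mid - 1 ))     # mid fails: look below it
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# Example check (hypothetical host): set the MTU, then see whether a
# non-interactive SSH command completes within 10 seconds.
#   ssh_works_at() {
#       sudo ip link set dev eth0 mtu "$1" &&
#           ssh -o BatchMode=yes -o ConnectTimeout=10 user@vm.example.com true
#   }
#   largest_working_mtu 576 1500 ssh_works_at
```

Because the hang in this thread is intermittent, a single successful connection at a given MTU is weak evidence; a check that transfers a larger amount of data, such as a git clone over SSH, is probably more telling.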