Open dombegin opened 4 years ago
I have "solved" this issue: Removed WSL + all Hyper-V stuff and moved over to VirtualBox with a dedicated Linux VM.
I have 'solved' this problem by simply using PowerShell with ssh and scp ... I really wonder why Microsoft is not able to FIX this problem!
I've been playing with this issue over the past couple weeks. Here is what I've adjusted (and automated for new VMs), done to cope, and what else I've observed:
I have not experienced the case where you try to connect and cannot do anything at all with SSH. I suspect that case and what I am seeing have different underlying causes. As pointed out by @andreasmarkussen, this thread is probably capturing multiple different underlying issues from different people. It also captures them over time, as WSL and the OSes involved have shifted. For instance, the MTU aspect seems to have zero implications for me, and I can't help but think that it applied only to the earlier manifestations and that Microsoft has since fixed that aspect - or maybe I'm just lucky.
I've always been a fan of using screen for remote connections so that I can detach and disconnect, then later reconnect and re-attach to the session, which keeps running in the background until I get back. Generally, I'd not bothered to use it when first setting up VMs, but I became more aggressive about using it with this issue, running screen -Rad upon login in order to have persistent sessions and not lose work when my connection got borked.
If you've not used screen or a similar terminal multiplexer, I highly recommend it for remote connections in general.
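As a rough illustration of running screen -Rad automatically at login, the snippet below could go at the end of ~/.bashrc on the remote machine. The function name should_attach_screen is made up for this sketch. It assumes SSH_CONNECTION is set by sshd for SSH logins and STY is set by screen inside a session.

```shell
# Hypothetical helper: decide whether this shell should attach to screen.
# $1 is the shell's option flags (pass "$-").
should_attach_screen() {
    case "$1" in
        *i*) ;;                 # interactive shell: keep checking
        *) return 1 ;;          # scripts, scp, etc.: never attach
    esac
    [ -n "${SSH_CONNECTION:-}" ] || return 1   # not an SSH login
    [ -z "${STY:-}" ] || return 1              # already inside screen
    command -v screen >/dev/null 2>&1          # screen must be installed
}

# Re-attach to an existing session, or create one if none exists.
if should_attach_screen "$-"; then
    screen -Rad
fi
```

The interactive-shell check matters because non-interactive logins such as scp or rsync also read shell startup files on some setups, and starting screen there would break them.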
Below is an automation function (written with bash in mind) that I use with cloud-init. It can also be used on established systems for a quick change (just change the backup extension, since it isn't being run by cloud-init in that instance). The main things being changed focus on maintaining a connection. Since using this on my VMs, I've been able to keep SSH connections alive overnight, and I am confident that the settings made in the code below made a tremendous difference. I've not tested which setting was ultimately responsible, or whether it is indeed the entire combination of TCPKeepAlive yes, ClientAliveInterval 30, and ClientAliveCountMax 10000 that made the difference. Note that I also change the default port and force the use of SSH keys by disabling password authentication - take those lines out if you don't want to do that.
_Note: in terms of sequence, it is important to take care when you run this while launching a new VM. There are two primary considerations: 1) not getting locked out of your system, and 2) not making a mess of things during system updates.
For the first concern, I've tested it multiple times and know that it works reliably for me now. I can either add port 22 to my security group temporarily during startup, or just use the new port assignment after I've given the VM enough time to do its thing. For the second, I've noticed that an upgrade often touches /etc/ssh/sshd_config, so I've found it easier to just make sure I've done the update/upgrade first. For whatever reason, I've also found the file to be empty before the upgrade. I actually like having the other commented-out defaults for reference when looking at the file, so doing the upgrade first is useful in that regard as well._
configure_ssh() {
    echo "Changing SSH port to ${SSH_PORT}, as well as a few connectivity settings..."
    # Desired sshd_config settings, keyed by option name
    declare -A ssh_settings
    ssh_settings[Port]="${SSH_PORT}"
    ssh_settings[TCPKeepAlive]="yes"
    ssh_settings[ClientAliveInterval]="30"
    ssh_settings[ClientAliveCountMax]="10000"
    ssh_settings[PasswordAuthentication]="no"
    ssh_settings[ChallengeResponseAuthentication]="no"
    SSHD_CONFIG=/etc/ssh/sshd_config
    SED_EXP=""
    # Build one sed expression covering every setting
    for setting in "${!ssh_settings[@]}"; do
        if grep -e "^#*${setting} " "${SSHD_CONFIG}"; then
            # Option already present (possibly commented out): rewrite that line
            echo "Changed --> ${ssh_settings[${setting}]}"
            SWAP_TEXT="s/^#*${setting}.*/${setting} ${ssh_settings[${setting}]}/;"
            SED_EXP="${SED_EXP} ${SWAP_TEXT}"
        else
            # Option missing: append it after the last line of the file
            ADD_TEXT=\$"s/"\$"/\n${setting} ${ssh_settings[${setting}]}/;"
            SED_EXP="${SED_EXP} ${ADD_TEXT}"
        fi
    done
    # Timestamped, upper-cased suffix for the backup copy that sed -i leaves behind
    SFX=$(date +%Y%h%d_%H.%M)
    sed -i.cloud-init.bak.${SFX^^} -e "${SED_EXP}" "${SSHD_CONFIG}"
    # Restarting SSH immediately since we'll want to connect to it in short order for monitoring...
    systemctl restart ssh
}
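To see what the replace-or-append logic does before pointing it at a real server, a cut-down version can be run against a scratch copy of the file. This is a minimal sketch, not the function above: the name set_sshd_option is hypothetical, it handles one option per call, and it assumes GNU sed (for -i without a suffix).

```shell
# Hypothetical helper: set option $2 to value $3 in file $1.
# An existing line for the option (commented out or not) is rewritten;
# otherwise the option is appended at the end of the file.
set_sshd_option() {
    if grep -q -e "^#*$2 " "$1"; then
        sed -i -e "s/^#*$2 .*/$2 $3/" "$1"
    else
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# Try it on a throwaway file rather than /etc/ssh/sshd_config.
scratch=$(mktemp)
printf '#Port 22\n#TCPKeepAlive yes\n' > "$scratch"
set_sshd_option "$scratch" Port 2222
set_sshd_option "$scratch" ClientAliveInterval 30
cat "$scratch"
```

Appending with >> also works when the file is empty, whereas a sed expression has no line to act on in that case. On a real system, running sshd -t before restarting the service checks the edited file for syntax errors.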
I had similar issues with VPN traffic. Have you tried enabling tcp_mtu_probing in WSL2? I set it to 1, which fixed it for me.
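For anyone who wants to try this, the setting is the Linux sysctl net.ipv4.tcp_mtu_probing. The sketch below only reads and explains the current value; describe_mtu_probing is a made-up helper name, and the commented command is a runtime change that needs root and does not survive a restart of the WSL VM.

```shell
# Hypothetical helper: explain a tcp_mtu_probing value.
describe_mtu_probing() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "enabled only after an ICMP black hole is detected" ;;
        2) echo "always enabled" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Show the current setting inside the WSL2 distro:
#   describe_mtu_probing "$(cat /proc/sys/net/ipv4/tcp_mtu_probing)"
# Set it to 1 for the running system:
#   sudo sysctl -w net.ipv4.tcp_mtu_probing=1
```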
Thanks for the suggestion, but it did nothing here.
I have also had problems with my SSH connections on my desktop PC for a long time now. For me, changing the MTU did not help at all, but it now seems that my problem was being connected through Wi-Fi and Ethernet at the same time. Since I switched off my Wi-Fi antenna, the connection freezes and loop breakdowns seem to have stopped (for at least about half an hour now...). Maybe this can help someone as well (or help Microsoft fix these problems...).
Sharing my anecdata: in a WSL2 client, when I crank up the SSH client verbosity, this is what I get:
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
at which point it hangs until eventually I get Connection closed by $REMOTE.
Server logs (seen via journalctl -u ssh) show:
$timestamp $hostname sshd[11151]: fatal: Timeout before authentication for $client_ip port $client_port
which suggests to me that packets in the key exchange are getting dropped somewhere in the response path in a non-deterministic fashion. The fact that some folks can get around this by tweaking network settings related to packet size corroborates this, I think: if the apparent non-determinism is really the result of, say, some kind of byte truncation for large packets, then I could see that happening.
Debian 11 (bullseye) in WSL2 shows the ssh hanging as well.
From time to time this happens to me too. I had changed the MTU to 1350, and that had fixed the issue. Now when it occurs I just restart my laptop, unfortunately.
Time for yet another bump:
wsl -v
WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22000.2057
$ uname -a
Linux blahh 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
SSH fails after random periods, even with continual traffic, keepalives, etc.
Same issue: SSH does not connect at all (while putty or ssh.exe in terminal does work). Windows 11 (version 10.0.22621) and WSL2 with Ubuntu 22.04. None of the suggested fixes worked.
With Debian 12 (bookworm) on WSL2 I don't see any problems, unlike with Ubuntu 22.04.
In WSL1, everything works smoothly and I have no issues. It's only in WSL2 that this happens.
In my experience, WSL2 is 💩.
In case anyone has the same issue: I cannot find any logic in it. I tried all the options and none worked. My setup is:
One solution I read about is to match the MTU of your VPN adapter and the WSL eth0 adapter. That did not work for me. Honestly, I just started testing random MTU values until it worked with 700. You can try several values and maybe it helps you: sudo ip link set dev eth0 mtu 700
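A less random way to pick a value is to search for the largest ping payload that gets through with fragmentation prohibited. The sketch below assumes Linux iputils ping (-M do sets the don't-fragment bit), that ICMP is not blocked along the path, and IPv4. The function names are hypothetical and example.com is only a placeholder target.

```shell
# probe SIZE: succeed if a SIZE-byte ICMP payload reaches the target unfragmented.
# TARGET is a placeholder; set it to a host on the far side of the VPN.
probe() {
    ping -c 1 -W 2 -M do -s "$1" "${TARGET:-example.com}" >/dev/null 2>&1
}

# find_max_payload LO HI: binary search for the largest payload size in
# [LO, HI] for which probe succeeds. Prints 0 if none does.
find_max_payload() {
    lo=$1
    hi=$2
    best=0
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if probe "$mid"; then
            best=$mid
            lo=$(( mid + 1 ))
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$best"
}

# Usage (IPv4 adds 20 bytes of IP header and 8 bytes of ICMP header):
#   payload=$(TARGET=my.server find_max_payload 500 1472)
#   sudo ip link set dev eth0 mtu $(( payload + 28 ))
```

Whether the value found this way actually cures the hang is a separate question. It only narrows down what the path accepts for ICMP.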
Current Version 10.0.19025.1
I have this weird issue where I can no longer use SSH connections to remote servers from WSL2. I remember that it was working ok in early builds but I am not sure at which point it started to fail.
Any idea on how to resolve this would be appreciated.
What happens
When connected to a remote server, the SSH connection hangs after a very short time. I am sometimes able to type a few letters, but then it hangs and I have to close WSL. It does this with every SSH connection to every server.
For instance, on the following screenshot, you can see that I was able to type a few numbers but the connection froze at the last "1".
Probably related, SSH git cloning is not working either. It starts receiving objects but stops shortly after. I have to CTRL-C to stop. Here's an example hanging at 46%.
> GIT_SSH_COMMAND="ssh -vvv" git clone --verbose git@github.com:microsoft/dotnet.git
...![image](https://user-images.githubusercontent.com/967871/69064429-31c2c000-09ec-11ea-89a5-614bc02078b0.png)
In WSL1, everything works smoothly and I have no issues. It's only in WSL2 that this happens.
Note that git cloning through HTTPS works fine as well.
Just let me know if there is additional trace I can run to help since I know this is probably going to be hard to repro.