microsoft / WSL

WSL network crash under constant load #10817

Open sec opened 9 months ago

sec commented 9 months ago

Windows Version

Microsoft Windows [Version 10.0.19045.3693]

WSL Version

2.0.11.0

Are you using WSL 1 or WSL 2?

Kernel Version

5.15.133.1-1

Distro Version

Ubuntu-20.04

Other Software

Docker version 23.0.2, build 569dd73 (run inside WSL)

Repro Steps

I've created a sample repo with the repro steps and required software (https://github.com/sec/wsl-network-crash-test). In short:

  1. Run some service under WSL/Docker
  2. Access that service from Windows host in a loop
  3. Wait. Sometimes it takes 1 minute to fail, sometimes 10, but it will crash sooner or later
  4. Connections from the Windows host to WSL/Docker stop working
  5. Once that happens, even a normal app running under plain WSL can't be accessed from the Windows host
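
Steps 2 to 4 can be sketched as a small Python loop run on the Windows host. This is only an illustration, not the script from the linked repo: the `hammer` and `fetch` names are hypothetical, and the target URL is whatever address the service is published on.

```python
import time
import urllib.request


def fetch(url, timeout=5.0):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def hammer(url, max_requests, fetch_once=fetch, delay=0.0):
    """Request `url` up to `max_requests` times, stopping at the first failure.

    Returns (successful_requests, error); error is None if every request
    succeeded.
    """
    for done in range(max_requests):
        try:
            fetch_once(url)
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            return done, exc
        if delay:
            time.sleep(delay)  # optional pause between requests
    return max_requests, None
```

Calling something like `hammer("http://127.0.0.1:8080/", 1_000_000)` (the port is an assumption) and printing the returned count and error shows how many requests succeeded before the connection from Windows stopped working.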

Running wsl --shutdown and launching WSL/Docker again fixes the issue, but that's not an acceptable solution.

I've had this issue for many versions. I've tried downgrading to almost every 2.x release and the problem is present in all of them. IIRC this worked fine under 1.x (I work on a big project that runs containers inside Docker, which are accessed from the Windows host).

Logs are attached. I started with everything working, ran the repro steps until the network crashed, then collected the logs. I hope there's something in them that helps fix this.

Expected Behavior

Network connection from Windows to WSL should work.

Actual Behavior

WSL network cannot be accessed from Windows host.

Diagnostic Logs

WslLogs-2023-11-23_10-48-17.zip WslNetworkingLogs-2023-11-23_11-17-11.zip

pocesar commented 9 months ago

happening here too, but it's only the networking that breaks, you can still run commands using wsl --exec ps -ax for example and see all processes

sec commented 9 months ago

> happening here too, but it's only the networking that breaks, you can still run commands using wsl --exec ps -ax for example and see all processes

Exactly. After the networking breaks, everything else appears to work, but to get the network back I need to shut down WSL and relaunch it.

keith-horton commented 9 months ago

How are you accessing the WSL container from the Windows Host? Is it through the Docker bridge? From the WSL-side, I didn't find any issues.

Docker has a new release you may try. https://docs.docker.com/desktop/release-notes/#4260

sec commented 9 months ago

I don't think it's related to Docker. The same thing happens with podman (with its machine running inside WSL): when the WSL network crashes, podman can't connect to its machine either. The same goes for using podman under WSL: the same repro steps crash the WSL network.

WSL is using default networking settings, I didn't change anything.

Have you tried to reproduce the error using the repo and steps I provided? It shouldn't take more than a few minutes to show the problem.

pocesar commented 9 months ago

just happened again, but this time it was completely unresponsive to wsl --shutdown (using 2.0.14). vmcompute was using 100% memory and 100% CPU (of 6 cores)

can't even stop the service

KILLME56k commented 9 months ago

My network fails when I open Explorer.

ademyankov commented 7 months ago

WSL 2 constantly crashes for me too under heavy load, though I'm not even sure it can be called heavy.

I have an SDK that supports multiple platforms, so I run a script that creates build directories for all of those (about 10) platforms and runs cmake in the background to configure each of them.

Something like so:

cd build/x86_release && cmake ../.. &
cd build/x86_debug && cmake -DCMAKE_BUILD_TYPE=Debug ../.. &
cd build/rpi4_release && cmake ../.. &
cd build/rpi4_debug && cmake -DCMAKE_BUILD_TYPE=Debug ../.. &
cd build/rpi5_release && cmake ../.. &
cd build/rpi5_debug && cmake -DCMAKE_BUILD_TYPE=Debug ../.. &
cd build/esp32s3_release && cmake ../.. &
cd build/esp32s3_debug && cmake -DCMAKE_BUILD_TYPE=Debug ../.. &

Never got a single successful run, ever! It crashes every time and closes all open WSL windows.

And sometimes I cannot restart it, I have to do this first:

c:\>wsl --shutdown

It is extremely annoying and disappointing!

WSL version: 2.0.9.0
Kernel version: 5.15.133.1-1
WSLg version: 1.0.59
MSRDC version: 1.2.4677
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22631.3007

sec commented 6 months ago

Any update on this? This makes WSL useless for any real usage. I just checked the newest 2.0.14.0 and the network still crashes under any real load. It doesn't matter whether I use Docker or podman as the distro; this is a WSL core issue.

keith-horton commented 5 months ago

@OneBlue, it looks like wslrelay is timing out trying to talk to the container (lots of WSAETIMEDOUT errors on the relay sockets), and we can see an HvSocketConnectionDisconnected event that precedes it. Are there any known hvsocket issues?

hyan23 commented 3 months ago

I have encountered a similar problem, and I suspect it is related to IPv6 (because localhost resolves to ::1). If I use 127.0.0.1 instead of localhost, access works normally.
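
One way to check this on the Windows host is a short Python sketch that lists what a name resolves to and then tries each literal address separately. The helper names `resolve` and `can_connect` are hypothetical, and the port is whatever the service is published on.

```python
import socket


def resolve(host, port):
    """Return the (family, address) pairs that `host` resolves to, in order."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        results.append((names.get(family, str(family)), sockaddr[0]))
    return results


def can_connect(address, port, timeout=3.0):
    """Try a TCP connection to a literal address; report success as a bool."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Comparing `resolve("localhost", port)` with `can_connect("127.0.0.1", port)` and `can_connect("::1", port)` would show whether only the IPv6 loopback path is failing.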

keith-horton commented 3 months ago

Right, accessing the container from the host is supported only through 127.0.0.1 in Mirrored Mode (the Linux option we use to enable routing loopback traffic exists only for IPv4, not IPv6, unfortunately).

Does crashing under load happen only in NAT Mode, or also in Mirrored Mode? NAT mode uses a relay that moves traffic over an hvSocket (see the HvSocketConnectionDisconnected event reference above). Mirrored Mode does not need a relay: it's routed through the vswitch connecting the container.

sec commented 3 months ago

Mirrored mode is not supported here: when I try to enable it, WSL switches back to NAT. Where can I find the requirements for this mode, or check why it's not supported?

keith-horton commented 3 months ago

Hi there. Mirrored Mode is supported on Windows 11 22H2 or later. https://learn.microsoft.com/en-us/windows/wsl/wsl-config
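
For anyone checking which mode their configuration requests, here is a rough Python sketch that reads the `networkingMode` key from the `[wsl2]` section of the per-user `.wslconfig`. The helper names are hypothetical, and treating a missing key as NAT is an assumption based on the default-settings discussion in this thread.

```python
import configparser
from pathlib import Path


def networking_mode(wslconfig_text):
    """Return the networkingMode value from .wslconfig text, lowercased.

    Falls back to "nat" when the key or section is absent (assumed default).
    """
    # Disable interpolation so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(wslconfig_text)
    return parser.get("wsl2", "networkingMode", fallback="nat").strip().lower()


def read_wslconfig(path=Path.home() / ".wslconfig"):
    """Read the per-user .wslconfig if it exists, else return empty text."""
    return path.read_text(encoding="utf-8") if path.exists() else ""
```

`networking_mode(read_wslconfig())` only reports what the file asks for. As noted above, WSL can fall back to NAT when Mirrored Mode is not supported, so the effective mode may differ.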

sec commented 3 months ago

As I wrote, I'm on Windows 10 and can't use that mode. Can't this be fixed? It worked fine in earlier WSL versions and started breaking in recent ones (for almost a year now).