microsoft / WSL

Issues found on WSL
https://docs.microsoft.com/windows/wsl

Systemd breaks mirrored networking #11672

Open withinboredom opened 3 weeks ago

withinboredom commented 3 weeks ago

Windows Version

Microsoft Windows [Version 10.0.22631.3672]

WSL Version

WSL version: 2.1.5.0

Are you using WSL 1 or WSL 2?

Kernel Version

5.15.146.1-2

Distro Version

Ubuntu 24.04

Other Software

curl 8.5.0 (x86_64-pc-linux-gnu) libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7
Release-Date: 2023-12-06, security patched: 8.5.0-2ubuntu10.1
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd

Repro Steps

I've tried nearly everything to get mirrored networking mode working again, but for some reason it has stopped working correctly in the last week.

At first, it was similar to other reported issues (#11369) where mirrored mode would work for about 10-15 minutes and then mysteriously fail. Eventually, it just stopped working altogether. At least that is what I thought.

I am still able to ping and I see responses. However, although UDP and TCP packets leave the interface, I never see the replies arrive in WSL (though I can see the responses and retransmissions in Wireshark on the Windows side).
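For anyone trying to confirm the same symptom without curl, here is a minimal sketch of a TCP probe. The function name `tcp_probe` and the 3-second default timeout are my own choices, not anything from the WSL tooling. The idea is that replies being dropped on the way back should show up as a timeout, whereas a reachable host with a closed port shows up as a refusal.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome.

    Hypothetical helper for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # SYN went out but no SYN/ACK came back within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The peer answered with a reset, so replies do reach us.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"
```

Calling `tcp_probe("1.1.1.1", 80)` inside WSL with and without systemd should make the difference visible, assuming the failure mode is the one described above.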

I then went on an adventure to uninstall/reinstall network adapters, WSL, etc. None of these things seemed to resolve my issue. It wasn't until I stumbled upon #10842 that I got a crazy idea. My simple idea was to manually set the source port of curl and then use the iperf trick to see if that was a related issue.

To my surprise, this worked exactly once: curl -v google.com --local-port 12345 produced the expected output! When I ran it again, I got curl: (45) bind failed with errno 98: Address already in use, which is weird because there is no longer any process listening on that port. Changing to a different source port again works exactly once.
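On Linux, errno 98 is EADDRINUSE. To check whether the kernel itself considers the port taken, independent of curl, a bare bind can be tried directly. This is only a sketch: `try_bind` is a hypothetical helper, and it approximates the bind step that --local-port implies rather than reproducing curl's exact socket setup.

```python
import errno
import socket


def try_bind(port, host="0.0.0.0"):
    """Bind a TCP socket to a fixed local port without connecting.

    Returns (socket, None) on success or (None, errno_name) on failure.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        return None, errno.errorcode.get(exc.errno, str(exc.errno))
    return sock, None
```

If `try_bind(12345)` returns `(None, "EADDRINUSE")` right after the first successful curl run, the port is still held at the kernel level even though no process appears to own it.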

This leads me to believe that this might be a kernel issue, or some other software doing something weird. So I disabled systemd and, lo and behold, things work again!
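When comparing runs, it helps to record which combination of settings was in effect. The sketch below parses the two relevant files as INI text: networkingMode lives under [wsl2] in %UserProfile%\.wslconfig on the Windows side, and systemd lives under [boot] in /etc/wsl.conf inside the distro. The function name and the fallback values ("nat" and False when a key is absent) are assumptions for illustration.

```python
import configparser


def wsl_settings(wslconfig_text, wslconf_text):
    """Extract the networking mode and systemd flag from config file text.

    Hypothetical helper; fallbacks are assumed defaults for missing keys.
    """
    host = configparser.ConfigParser(interpolation=None)
    host.read_string(wslconfig_text)    # contents of .wslconfig
    distro = configparser.ConfigParser(interpolation=None)
    distro.read_string(wslconf_text)    # contents of /etc/wsl.conf
    return {
        "networkingMode": host.get("wsl2", "networkingMode", fallback="nat"),
        "systemd": distro.getboolean("boot", "systemd", fallback=False),
    }
```

Feeding it the contents of both files before each test run gives a small record to attach next to the logs.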

I do note that specifying the source port via curl still only works exactly once and I don't see it in ss output, which is a bit unusual.
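As a cross-check on ss, the kernel's IPv4 TCP table can be read from /proc/net/tcp, which also lists sockets in TIME_WAIT. A lingering TIME_WAIT entry is one ordinary reason a fixed source port can be refused on reuse, but whether that is what happens here is not established in this thread. The sketch below is a hypothetical parser that takes the file's text and reports the state of every socket using a given local port.

```python
# State codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def states_for_port(proc_net_tcp_text, port):
    """Return the TCP states of all sockets whose local port matches.

    Hypothetical helper; IPv6 sockets live in /proc/net/tcp6 instead.
    """
    states = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is HEXIP:HEXPORT, e.g. 0100007F:3039
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            code = fields[3].upper()
            states.append(TCP_STATES.get(code, code))
    return states
```

For example, `states_for_port(open("/proc/net/tcp").read(), 12345)` run right after the failing curl would show whether anything still references the port.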

I'm kinda stumped at the moment with what systemd might be doing, so any tips would be very much appreciated.

Note that #11143 appears to potentially be a duplicate.

Expected Behavior

Networking to work.

Actual Behavior

Networking does not work.

Diagnostic Logs

Steps followed:

  1. wsl --shutdown
  2. collect logs with .\collect-wsl-logs.ps1
  3. Start up WSL
  4. run curl -v 1.1.1.1 (DNS works via tunneling, but let's remove as many variables as possible)
  5. run curl -v 1.1.1.1 --local-port 12345
  6. run curl -v 1.1.1.1 --local-port 12345
  7. stop collecting logs
  8. collect dmesg logs
  9. collect journalctl logs (if applicable)
  10. add logs from 8 & 9 to log zip file
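Steps 4 to 6 can be scripted so that each run records the same thing. This is a rough sketch, not part of the official log collection: `run_steps` and `CURL_STEPS` are names made up here, and the --max-time 10 flag is an addition so that a hung connection ends the step instead of blocking it.

```python
import subprocess

# Steps 4-6 from the list above; --max-time is an added assumption.
CURL_STEPS = [
    ["curl", "-v", "--max-time", "10", "1.1.1.1"],
    ["curl", "-v", "--max-time", "10", "1.1.1.1", "--local-port", "12345"],
    ["curl", "-v", "--max-time", "10", "1.1.1.1", "--local-port", "12345"],
]


def run_steps(commands):
    """Run each command in order and collect (command line, exit code)."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode))
    return results
```

Running `run_steps(CURL_STEPS)` inside WSL once with systemd enabled and once without gives two short lists of exit codes to compare; exit code 45 corresponds to the bind failure quoted earlier.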

WSL startup with systemd: WslLogs-2024-06-09_12-20-00 (2).zip

WSL startup without systemd: WslLogs-2024-06-09_12-27-08 (2).zip

github-actions[bot] commented 3 weeks ago

View similar issues

Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!

Open similar issues:

Closed similar issues:

Note: You can give me feedback by thumbs upping or thumbs downing this comment.

Diagnostic information
Multiple log files found, using: https://github.com/user-attachments/files/15751492/WslLogs-2024-06-09_12-20-00.2.zip
appxpackage.txt not found
optional-components.txt not found
Error while parsing the logs. See action page for details

withinboredom commented 3 weeks ago

It's also worth pointing out that cloud-config and snapd were disabled for those logs (to save anyone else any troubleshooting). Enabling or disabling them doesn't seem to have any effect.

chanpreetdhanjal commented 2 weeks ago

Hi. Can you please collect networking logs by following the instructions below? https://github.com/microsoft/WSL/blob/master/CONTRIBUTING.md#collect-wsl-logs-for-networking-issues

withinboredom commented 2 weeks ago

Here are the logs with networking working correctly: WslNetworkingLogs-2024-06-13_21-10-41.zip

And here they are with systemd preventing networking from working: WslNetworkingLogs-2024-06-13_21-13-10.zip

I performed the same steps as before.

github-actions[bot] commented 2 weeks ago

Diagnostic information
```
Multiple log files found, using: https://github.com/user-attachments/files/15827620/WslNetworkingLogs-2024-06-13_21-10-41.zip
.wslconfig found
Detected appx version: 2.1.5.0
optional-components.txt not found
```

dcasota commented 1 week ago

With the new networkingMode=mirrored I had similar issues in WSL 2.1.5 and 2.2.4, so I left it set to nat, which works flawlessly.

VMware Photon OS uses systemd as well, and it works in WSL when a rootless user is configured to match the logged-in Windows user. See https://github.com/dcasota/photonos-scripts/wiki/Photon-OS-on-WSL2, step 4.

withinboredom commented 2 days ago

Bump. Anything?