nnathan opened 3 months ago
Failed to parse logs. Unexpected file: eth1.2024-07-21.pcap The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.
Please view the issues below to see if they solve your problem, and if the issue describes your problem please consider closing this one and thumbs upping the other issue to help us prioritize it!
Note: You can give me feedback by thumbs upping or thumbs downing this comment.
Multiple log files found, using: https://github.com/user-attachments/files/16324291/pcaps.zip
Found no WSL traces in the logs
I've tried again, maybe the bot doesn't like the order of the attachments:
WslNetworkingLogs-2024-07-21_22-30-01.zip WslLogs-2024-07-21_22-30-14.zip wget-strace.txt pcaps.zip
I've made another attempt, this time capturing all affected interfaces, including the Wi-Fi adapter on the Windows machine:
WslNetworkingLogs-2024-07-21_23-04-48.zip WslLogs-2024-07-21_23-04-51.zip pcaps.zip wget-strace.txt
It appears the problem isn't specific to Docker but can be reproduced just using a bridge interface and network namespaces.
sudo systemctl stop docker.socket
sudo service docker stop
sudo ip link add name test0 type bridge
sudo ip addr add 172.18.0.1/24 dev test0
sudo ip link set test0 up
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward
sudo iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth0 master test0
sudo ip link set veth0 up
sudo ip netns add ns1
sudo ip link set veth1 netns ns1
sudo ip netns exec ns1 ip addr add 172.18.0.2/24 dev veth1
sudo ip netns exec ns1 ip link set veth1 up
sudo ip netns exec ns1 ip route add default via 172.18.0.1
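For completeness, the test setup above can be torn down afterwards with something like this (a sketch; interface and namespace names match the commands above):

```shell
# Deleting the namespace removes veth1, which also destroys its peer veth0
sudo ip netns del ns1
# Remove the test bridge
sudo ip link del test0
# Remove the masquerade rule added above
sudo iptables -t nat -D POSTROUTING -o eth1 -j MASQUERADE
# Optionally restore ip_forward (only if it was 0 before the test)
echo 0 | sudo tee /proc/sys/net/ipv4/ip_forward
```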
This results in:
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 04:42:1a:2d:71:2e brd ff:ff:ff:ff:ff:ff
3: loopback0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:15:5d:5f:95:b8 brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 14:85:7f:0a:d4:49 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.161/24 brd 192.168.0.255 scope global noprefixroute eth1
valid_lft forever preferred_lft forever
inet6 fe80::6608:3a69:3d9a:bb58/64 scope link nodad noprefixroute
valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:04:e0:2a:46 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
28: test0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 7e:ef:c9:da:2f:bd brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/24 scope global test0
valid_lft forever preferred_lft forever
30: veth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master test0 state UP group default qlen 1000
link/ether b2:d4:94:e2:a1:b5 brd ff:ff:ff:ff:ff:ff link-netns ns1
$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.024204e02a46 no
test0 8000.7eefc9da2fbd no veth0
$ while true; do sudo ip netns exec ns1 wget -O/dev/null --max-redirect 0 http://angry.lastninja.net/test.dat; done
WslNetworkingLogs-2024-07-22_04-02-24.zip WslLogs-2024-07-22_04-02-27.zip bridgetest-pcaps.zip
These look similar to the Docker ones. While capturing on the Ubuntu 24.04 host interface eth1 and the Windows host Wi-Fi interface, I noticed the following packet leak, triggered when I Ctrl-C (SIGINT) wget after it stalls:
3469 45.396933 172.18.0.2 149.248.1.168 TCP 66 46762 → 80 [FIN, ACK] Seq=1 Ack=1 Win=24568 Len=0 TSval=2410143952 TSecr=1912227884
This shows the bridged IP address on the Internet-facing interfaces. What probably happened: after the connection stalled, the connection tracking entry was removed; SIGINT then made wget hang up the connection, but with no flow left to match in the connection tracking state, the packet was probably just forwarded as-is (just a guess). This is unlikely to help in investigating the actual stalling issue, though. The captures show essentially the same problem as the Docker containers.
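One way to test the connection-tracking theory would be to watch whether the flow entry for the namespaced client disappears when the download stalls (a sketch; assumes conntrack-tools is installed, and 172.18.0.2 is the namespace address from the bridge setup above):

```shell
# Assumption: conntrack-tools is installed (e.g. sudo apt-get install -y conntrack)

# List the current tracking entries for the namespaced client
sudo conntrack -L --src 172.18.0.2

# Or follow events live to see whether the entry is destroyed around the stall
sudo conntrack -E --src 172.18.0.2
```

If the entry for the stalled TCP flow vanishes before the FIN is sent, that would be consistent with the leaked packet above.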
Failed to parse logs. Unexpected file: veth0.pcap The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.
Failed to parse logs. Unexpected file: vethcdca12b.docker0.2024-07-21.pcap The log file doesn't contain any WSL traces. Please make sure that you reproduced the issue while the log collection was running.
Issue persists even if I disable the firewall on the Windows Host (Windows Defender Firewall). (Which actually makes sense because this is not an issue with the WSL2 Ubuntu host - just network namespaces/docker containers connected via a bridge).
To appease the bot:
WslNetworkingLogs-2024-07-22_04-02-24.zip WslLogs-2024-07-22_04-02-27.zip
@nnathan does the network stalling happen if you run the same test outside of a container? I tried using an ubuntu-20.04 docker container and didn't observe issues. The test was stopped after 100 iterations.
WSL networking was mirrored mode. Didn't try NAT.
@nnathan does the network stalling happen if you run the same test outside of a container? I tried using an ubuntu-20.04 docker container and didn't observe issues. The test was stopped after 100 iterations.
Sorry, I failed to clarify this. In mirrored mode, wget stalls when run from a network namespace behind a veth adapter, NATing out via eth1 (see the test output above).
WSL networking was mirrored mode. Didn't try NAT.
Thanks for testing things out. In your case you're NATing since this is how Docker works by default.
This problem persists on the 6.6.36.3-microsoft-standard-WSL2 kernel. I would really appreciate some help with this; it's very frustrating, as Docker is not usable in mirrored mode.
Windows Version
Microsoft Windows [Version 10.0.22631.3880]
WSL Version
2.0.14.0 & 2.2.4.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.133.1-1 & 5.15.153.1-microsoft-standard-WSL2
Distro Version
Ubuntu 24.04
Other Software
Repro Steps
Install wget:
Then do download tests:
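The code blocks for these steps did not survive here; based on the commands used elsewhere in this report, they were presumably along these lines (a sketch; apt-get assumes a Debian/Ubuntu userland):

```shell
# Install wget
sudo apt-get update && sudo apt-get install -y wget

# Repeat the download until it stalls; the URL is the test file used
# throughout this report
while true; do
  wget -O/dev/null --max-redirect 0 http://angry.lastninja.net/test.dat
done
```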
At some point the connection will stall, with progress output like this; note the --.-KB/s, which indicates the connection has stalled:

Expected Behavior
It should download without issue; here's the same wget on the WSL2 host (Ubuntu 24.04):
Actual Behavior
This is the wget stalling in the container:
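The stall is easy to spot mechanically: wget's progress line shows the --.-KB/s rate marker when no data is flowing. A minimal sketch (the progress line itself is a hypothetical example, not taken from the attachments):

```shell
# A wget progress line as it might look while the download is stalled
# (hypothetical example; the --.-KB/s marker is the one noted in this report)
line='test.dat     4%[=>              ]   1.17M  --.-KB/s    eta 61s'

# wget prints --.-KB/s when the transfer rate is unknown, i.e. stalled
if printf '%s\n' "$line" | grep -q -- '--\.-KB/s'; then
  status=stalled
else
  status=ok
fi
echo "$status"
```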
Strace output of
strace -s4 wget --quiet -O/dev/null --max-redirect 0 http://angry.lastninja.net/test.dat 2>&1 | tee /tmp/wget-strace.txt
shows this:

In the eth1 pcap, after the HTTP download stalls there is a TCP reset sent from the client to the server.

In the capture of the docker container interface vethcdca12b.docker0, after the connection stalls I see the following. The destination unreachable packet (25584) looks like this:
The Ack packet (25583) that is sent by the client just prior to receiving that connection reset has the following in the TCP headers:
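To compare the raw and relative acknowledgement numbers directly, Wireshark's relative-sequence analysis can be disabled when reading the capture (a sketch; the pcap filename here is illustrative):

```shell
# Print frame number and raw ACK number for ACK packets;
# tcp.relative_sequence_numbers:FALSE makes tshark report raw seq/ack values
tshark -r eth1.pcap -o tcp.relative_sequence_numbers:FALSE \
       -T fields -e frame.number -e tcp.ack -Y 'tcp.flags.ack == 1'
```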
The discrepancy I see is that the raw acknowledgement number is correct, while the relative acknowledgement number appears wrong. I'm not sure if this is the root cause, and I'm not sure why an ICMP message is even being sent by the docker gateway IP 172.17.0.1.

Diagnostic Logs
pcaps.zip WslNetworkingLogs-2024-07-21_21-56-30.zip wget-strace.txt