GerardGarcia opened 2 days ago
cc @kevinGC
This is a gVisor dump with --gso=false
The gVisor sandbox appears to hang when the error occurs. When executing curl with a large payload, it blocks and it is impossible to execute any other command that uses the network (commands that do not access the network work fine). I attach a dump of the goroutines while blocked.
runsc version release-20241118.0-15-gb15656de596e spec: 1.1.0-rc.1
dlv.log
Description
It appears that ACKs are not processed by the gVisor netstack when a packet is large enough to be fragmented somewhere down the network stack. This makes the TCP connection misbehave: the client retransmits and the server sends duplicate ACKs. If GSO is disabled (--gso=false) or the whole gVisor network stack is disabled (--network=host), the connection works as expected. I attach a few network dumps:
At the gVisor sandboxed container veth:
At gVisor (--pcap-log):
Outside the gVisor sandboxed container veth:
My interpretation is that the ACKs in packets 11 and 12 are not seen by netstack, which causes the retransmissions and duplicate ACKs.
Steps to reproduce
In our environment it is straightforward to reproduce: just send a request with a large payload using, for example, curl:
curl -XPOST http://httpbin.org/post -d @req_large.json
If the request is smaller (payload under 1420 B), everything works as expected.
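The contents of req_large.json are not part of this report, so the script below is only a minimal sketch for bracketing the reported ~1420 B threshold. It assumes it is run inside the gVisor-sandboxed container, and the file names (req_small.json) and payload sizes are hypothetical choices. It writes one body below and one well above the threshold, then POSTs each to the same endpoint used above. curl's --max-time option is used so that a stalled transfer returns control to the shell instead of blocking it indefinitely.

```shell
#!/bin/sh
# Hypothetical sketch: bracket the ~1420 B threshold with a small and a large
# JSON body. Assumes it runs inside the gVisor-sandboxed container; the file
# names and sizes are illustrative, not the reporter's real req_large.json.

# make_payload SIZE FILE: write {"data":"aaa..."} with SIZE filler bytes to FILE.
# The JSON wrapper adds 11 bytes, so the body is SIZE + 11 bytes long.
make_payload() {
  filler=$(head -c "$1" /dev/zero | tr '\0' 'a')
  printf '{"data":"%s"}' "$filler" > "$2"
}

make_payload 1000 req_small.json   # 1011-byte body, below 1420 B
make_payload 4000 req_large.json   # 4011-byte body, well above 1420 B

for f in req_small.json req_large.json; do
  echo "$f: $(wc -c < "$f") bytes"
  # --max-time stops curl after 20 s so a stalled transfer does not block forever.
  curl -sS --max-time 20 -XPOST http://httpbin.org/post -d @"$f" > /dev/null \
    && echo "$f: OK" || echo "$f: failed or timed out"
done
```

If the behaviour matches the report, the small request should print OK, while the large one should fail or time out unless --gso=false is set.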
runsc version
docker version (if using docker)
uname
Linux (...) 5.15.166-111.163.amzn2.x86_64 #1 SMP Fri Sep 6 21:31:40 UTC 2024 x86_64 GNU/Linux
kubectl (if using Kubernetes)
We are running gVisor sandboxes inside a pod, rather than using gVisor sandboxes to wrap k8s pods.
repo state (if built from source)
No response
runsc debug logs (if available)