private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License

Reports of abrupt disconnection under packet loss #1581

Closed huitema closed 10 months ago

huitema commented 10 months ago

The reported behavior happens when running applications under challenging conditions, with a high rate of packet loss. In theory, according to RACK, the stack should retry a packet a number of times, using exponential backoff. But in practice the connection just breaks randomly.

huitema commented 10 months ago

Tried a "heavy loss" test, with loss rate set to 50% for 30 seconds. The connection completed.

TimEvens commented 10 months ago

The problem we see is reproducible with an air gap of loss: 100% packet loss (TX) for a short period of time. We see this with WiFi when moving through areas of seriously degraded signal. Often the loss is in one direction, but the effect is the same: a disconnect due to retransmits. We have noticed that the number of streams and amount of data being transmitted per second affects how quickly picoquic will abandon the path (connection).

Reproducing scenarios are platform/network specific. One way to reproduce on all platforms is to start an echo server and connect a client that sends data over one or more streams continuously. Then kill the client so that it cannot send any packets anymore. This can be a sigkill/sigterm, or it can be done using network filters to block the UDP packets the client transmits.

Test setup: Our test has idle timeout set to 30 seconds with keepalive set to 3 seconds. We are using loopback networking with zero latency and BW is not a factor. We are sending 1 packet every 1 millisecond per stream. We do this for 3 streams, so it becomes 3000 packets per second.

The above test results in picoquic abandoning the path due to retransmits in less than 4 seconds. See the log below; timestamps are in sync between client and server since they are on the same machine.

CLIENT

2023-11-22T11:05:09.592262 [INFO] [QUIC] Connection established to server 127.0.0.1
2023-11-22T11:05:09.592344 [INFO] [CLIENT] Connection state change context: 5813421568, 0
2023-11-22T11:05:10.266095 <KILLED PROCESS> -- stopped all packets from client

SERVER


2023-11-22T11:05:09.590238 [INFO] [QUIC] New Connection 127.0.0.1 port: 53762 context_id: 4873816576
2023-11-22T11:05:09.590335 [INFO] [SERVER] New connection cid: 4873816576 from 127.0.0.1:53762

... data is being received and echoed back to client

2023-11-22T11:05:10.556454 [DEBUG] [QUIC] remote: 127.0.0.1 port: 53762 context_id: 4873816576 retransmits increased, delta: 1 total: 1
2023-11-22T11:05:11.557023 [DEBUG] [QUIC] remote: 127.0.0.1 port: 53762 context_id: 4873816576 retransmits increased, delta: 14 total: 15
2023-11-22T11:05:12.560368 [DEBUG] [QUIC] remote: 127.0.0.1 port: 53762 context_id: 4873816576 retransmits increased, delta: 14 total: 29
2023-11-22T11:05:13.560375 [DEBUG] [QUIC] remote: 127.0.0.1 port: 53762 context_id: 4873816576 retransmits increased, delta: 15 total: 44
2023-11-22T11:05:13.779095 [INFO] [QUIC] [PQIC] icoquic/loss_recovery.c:497 [picoquic_retransmit_needed_packet]: Too many data retransmits, abandon path

2023-11-22T11:05:13.779385 [INFO] [QUIC] [PQIC] icoquic/loss_recovery.c:520 [picoquic_retransmit_needed_packet]: Too many retransmits of packet number 4894, disconnect
2023-11-22T11:05:13.779508 [INFO] [QUIC] Closing connection stream_id: 0
2023-11-22T11:05:13.779620 [INFO] [QUIC] Delete stream context for stream 0
2023-11-22T11:05:13.779814 [INFO] [SERVER] Connection state change context: 4873816576, 3
2023-11-22T11:05:13.779880 [INFO] [QUIC] [PQIC] oquic/picoquic/sender.c:3385 [picoquic_prepare_packet_ready]: Retransmission check caused a disconnect

huitema commented 10 months ago

Thanks for the details. The issue is indeed easy to reproduce by simulating total loss for a few seconds.

huitema commented 10 months ago

@TimEvens please check PR #1582 -- I believe it solves this issue.

TimEvens commented 10 months ago

Tested and it's working as expected. Thank you for fixing so quickly.