private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License
527 stars 156 forks source link

Packet repeat issue casued a hole in stream #1540

Closed dongsf closed 11 months ago

dongsf commented 11 months ago

I have a stream (number 0) that is active throughout the entire connection lifecycle. However, during the transmission, a hole appeared in stream 0. I think there was a problem with the server‘s retransmitting the packet. It may be caused by an incorrect confirmation of the frame. The log is too large, I only intercepted the portion of the server's log where the error occurred, packet_number is 2974364.

dongsf commented 11 months ago

a6c707c70ba6eb56.server.qlog.zip

dongsf commented 11 months ago

I have enabled preemptive mode, and the packet number 2974364 was sent in picoquic_preemptive_retransmit_packet. However, the frame data of the stream was skipped during the retransmission process.

dongsf commented 11 months ago

The packet's "was_preemptively_repeated" flag was set by picoquic_preemptive_retransmit_packet, but the frame data was not repeated. This is because we do not want to repeat stream data in preemptive mode. However, in a later process, the entire packet was skipped during the retransmission process.

huitema commented 11 months ago

I ill investigate. Preemptive mode could use more testing...

huitema commented 11 months ago

The logic is that stream data frames are only repeated if the stream FIN has been sent. The reasoning was that before that point, it is more efficient to use the bandwidth to send other streams, or more data on the same stream. These other streams will create packets that will trigger acks, etc. But there is indeed a bug.

If the packet contains multiple frames, say from stream 1 and 2, and we have only sent the FIN for stream 1, the frames from stream 1 will be repeated but the frames from stream 2 will not be. The same happens if the packet contains a stream frame for a stream that is not finished, and another frame that should be repeated -- could be for example MAX_DATA, or another control frame. In that case, setting the "was_preemptively_repeated" for the whole packet is wrong, causing the bug that you see.

I need to redesign this code, probably in two passes. The simplest would be to only perform preemptive repeat if all the frames in the packet can be repeated, make it all or nothing, only skipping frames that are already acked, or that are pure acks. So first check whether all frames could be repeated, and only then send the preemptive repeat.

huitema commented 11 months ago

@dongsf Please check PR #1541. I am out for the weekend, do not have time to write a specific text case that reproduces the hole in stream issue, but I think this PR fixes it.