Closed honzasp closed 7 months ago
Thanks for the report! Packets are removed from sent_packets
when they are acknowledged or deemed lost. This mechanism is not sensitive to the contents of packets, including whether they contain stream or datagram data. Have you verified that sent_packets.len()
grows without bound?
Thanks for your prompt response! I added a debug print of self.sent_packets.len()
to PacketSpace::sent()
and it grows without bound in the client (but not in the server).
I also tried using Quinn from the latest main
branch, with the same result.
Interesting; thanks for confirming. I'll try to have a look this weekend.
The SentPacket
entries accumulating on the client side, i.e. on the receiver (I missed that at first), are ACKs. Because the client never sends anything ACK-eliciting (i.e. application data) itself, the sender is never technically required to transmit its own ACKs, without which the client does not discard state. We should transmit ACKs occasionally even when not required to. We used to, but broke it in https://github.com/quinn-rs/quinn/pull/1415. We should also defensively forget ACK-only packets that accumulate beyond a threshold.
As an immediate mitigation, you can arrange for an endpoint which is only receiving data at the application layer for long periods to occasionally send something. For example, in the repro, having the client send one datagram for every 1000 it receives prevents state accumulation.
Endpoints acknowledge all packets they receive and process. However, only ack-eliciting packets cause an ACK frame to be sent within the maximum ack delay. Packets that are not ack-eliciting are only acknowledged when an ACK frame is sent for other reasons.
When sending a packet for any reason, an endpoint SHOULD attempt to include an ACK frame if one has not been sent recently. Doing so helps with timely loss detection at the peer.
An endpoint that is only sending ACK frames will not receive acknowledgments from its peer unless those acknowledgments are included in packets with ack-eliciting frames. An endpoint SHOULD send an ACK frame with other frames when there are new ack-eliciting packets to acknowledge. When only non-ack-eliciting packets need to be acknowledged, an endpoint MAY choose not to send an ACK frame with outgoing frames until an ack-eliciting packet has been received.
An endpoint that is only sending non-ack-eliciting packets might choose to occasionally add an ack-eliciting frame to those packets to ensure that it receives an acknowledgment; see Section 13.2.4
A receiver that sends only non-ack-eliciting packets, such as ACK frames, might not receive an acknowledgment for a long period of time. This could cause the receiver to maintain state for a large number of ACK frames for a long period of time, and ACK frames it sends could be unnecessarily large. In such a case, a receiver could send a PING or other small ack-eliciting frame occasionally, such as once per round trip, to elicit an ACK from the peer.
Thanks for the minimal repro! That made diagnosing this very straightforward.
Fix drafted in https://github.com/quinn-rs/quinn/pull/1761.
Thank you very much for the quick diagnosis and the suggested workaround :pray:
A big part of why we took so long to notice this is that if you're ever receiving stream data, you automatically send flow control updates occasionally that are themselves ACK-eliciting. State accumulation as demonstrated here can only occur when a connection is exclusively receiving application datagrams.
This was fixed in #1761.
I have a Quinn client and server, where the server sends a stream of datagrams to the client. I found that the client is leaking memory, and memory profiling with Heaptrack has shown that the culprit is the
BTreeMap::insert()
call inquinn_proto::connection::spaces::PacketSpace::sent()
, called fromPacketBuilder::finish_and_track()
. Here is a screenshot from Heaptrack:Here is a minimal reproducing example: https://github.com/honzasp/quinn_leak
It seems that packets are added to
PacketSpace::sent_packets
and then never removed. Perhaps because the data is sent using datagrams and not via streams?