We should not require explicit MP_RETIRE_CID after PATH_ABANDON

huitema commented 1 month ago

A PATH_ABANDON frame is a commitment to stop sending data on the specified path. Not sending data on the path implicitly means that the endpoint will not use any of the CIDs that the peer provided in MP_NEW_CID frames for the abandoned path. De facto, that is equivalent to retiring these CIDs: they will not be used anymore. Sending explicit MP_RETIRE_CID frames is just overhead, or maybe worse than overhead.

In single-path QUIC, sending a RETIRE_CID is useful because it can trigger the peer to produce a NEW_CID. But in the PATH_ABANDON scenario there is no point: if the peer produced more MP_NEW_CID frames, the new CIDs would just go unused. Since there is no point, some implementations may be tempted to skip the MP_RETIRE_CID frames. The endpoint could silently drop the unused CIDs without suffering any consequence. The peer, on the other hand, would be stuck with a list of "zombie CIDs" that have no utility and just consume memory. So the endpoint that receives the PATH_ABANDON has an interest in freeing resources as soon as possible, whether or not MP_RETIRE_CID frames arrive after the PATH_ABANDON.

The main synchronization issue is the "stateless reset" risk. When an endpoint sends a PATH_ABANDON, there may be some packets in transit that will be delivered out of order. If the peer has deleted all knowledge of the path-associated CIDs, the CID in the out-of-order packets will not be recognized, and the peer will send a "stateless reset" packet.

We could specify a complex solution to eliminate the stateless reset risk, such as asking the peer to retain knowledge of the old CIDs for 3*PTO after receiving the PATH_ABANDON. That would reduce the risk of generating stateless reset packets, but it would not eliminate it. But then, stateless reset packets only have an effect if they can be verified, that is, if the last 16 bytes match the "Stateless Reset Token" of a valid CID for the connection. If the sender of the PATH_ABANDON has freed all resources associated with the CIDs for the abandoned path ID, it will also have deleted the corresponding stateless reset tokens. The stateless reset packet sent by the peer thus will not be recognized as a valid stateless reset, and will have no effect on the state of the connection.
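The token argument above can be illustrated with a minimal Python sketch. The class `AbandoningEndpoint` and its methods are hypothetical names chosen for this illustration, not part of any QUIC implementation or of the draft. The sketch only models the final check, comparing the last 16 bytes of a datagram against the tokens still held, and ignores the other stateless reset rules of RFC 9000.

```python
import hmac

TOKEN_LEN = 16  # a Stateless Reset Token is 16 bytes


class AbandoningEndpoint:
    """Hypothetical model of the endpoint that sends PATH_ABANDON."""

    def __init__(self):
        # path_id -> stateless reset tokens of the CIDs the peer issued for that path
        self.reset_tokens = {}

    def add_peer_cid(self, path_id, token):
        # Remember the token that came with a CID received from the peer.
        self.reset_tokens.setdefault(path_id, set()).add(token)

    def abandon_path(self, path_id):
        # Freeing the CIDs of the path also drops their stateless reset tokens.
        self.reset_tokens.pop(path_id, None)

    def is_stateless_reset(self, datagram):
        # A datagram counts as a stateless reset only if its last 16 bytes
        # match a token that this endpoint still holds.
        if len(datagram) < TOKEN_LEN:
            return False
        tail = datagram[-TOKEN_LEN:]
        return any(
            hmac.compare_digest(tail, token)
            for tokens in self.reset_tokens.values()
            for token in tokens
        )
```

In this model, a datagram ending with a token of the abandoned path is accepted as a stateless reset before `abandon_path` is called and ignored afterwards, which is the property the argument relies on.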

Thus, I think that the proper behavior should be:

1) When sending a PATH_ABANDON frame, immediately free the resources associated with the CIDs received from the peer for that path.

2) When receiving a PATH_ABANDON frame, assume that it carries an implicit "retire CID" for all the CIDs sent to the peer for that path, and free the corresponding resources after a short delay, maybe 3*PTO.
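The two rules above could be sketched as follows in Python. This is an illustration under stated assumptions, not a reference implementation: `PathCidTable`, its fields, and the explicit `now` argument are hypothetical, and the PTO is treated as a fixed value supplied by the caller.

```python
class PathCidTable:
    """Hypothetical per-connection CID bookkeeping, keyed by path ID."""

    def __init__(self, pto):
        self.pto = pto          # current PTO estimate in seconds (assumed fixed here)
        self.remote_cids = {}   # path_id -> CIDs received from the peer
        self.local_cids = {}    # path_id -> CIDs this endpoint issued to the peer
        self.pending_free = {}  # path_id -> time at which local CIDs are freed

    def send_path_abandon(self, path_id):
        # Rule 1: free the peer-issued CIDs for the path immediately.
        self.remote_cids.pop(path_id, None)

    def recv_path_abandon(self, path_id, now):
        # Rule 2: treat the frame as an implicit retire of every CID issued
        # for the path, but keep them for 3*PTO to absorb reordered packets.
        if path_id in self.local_cids:
            self.pending_free.setdefault(path_id, now + 3 * self.pto)

    def on_timer(self, now):
        # Free the local CIDs of every path whose grace period has expired.
        for path_id, deadline in list(self.pending_free.items()):
            if now >= deadline:
                del self.pending_free[path_id]
                self.local_cids.pop(path_id, None)

    def recognizes(self, path_id, cid):
        # True while a packet carrying this CID would still be matched.
        return cid in self.local_cids.get(path_id, ())
```

Passing the clock in as `now` keeps the sketch deterministic. A real stack would more likely hook the deadline into its existing loss-recovery timers.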

mirjak commented 1 month ago

@huitema how is this issue different from #313?

mirjak commented 1 month ago

@huitema if this is the same, can you maybe copy your comment there and close this issue?

huitema commented 1 month ago

Copied.

quicwg / multipath

We should not require explicit MP_RETIRE_CID after PATH_ABANDON #367