private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License
564 stars 166 forks source link

Reorganize retransmission list to handle paths #1235

Closed huitema closed 3 years ago

huitema commented 3 years ago

There are two potential issues with the "simple multipath" option: longish ACK carrying lots of ranges; and, lower efficiency of loss recovery because RACK does not work so well. The ACK size is relatively easy to fix, but the we also need work on loss recovery.

The root of the issue is that loss recovery assumes that packets are received in sequence with minimal reordering. This is used for example in the RACK algorithm. But this is not so easy to use with multipath, because the current data structure checks for losses using a list of packets ordered by sequence numbers. Having a sequence number space per path fixes the issue, but adds other issues such as more complex encryption -- with a need to get different high order bits of IV for each packet sequence number.

Possible solution is to maintain two organize two sets of chains of packets "waiting for ack":

To check for packet losses, one would check the oldest packet on each path. To run the exact equivalent of RACK, one would need a per path "sent counter", repeated in each packet context. Comparing that counter to the last received allows for RACK style decisions; managing timers using the path values allows for classic timer behavior.

Doing that can unify "monopath" and "simple multipath".

Bonus points if we can unify per-path and global sequence handling. Maybe an indication of the number space attached to the path?

Extra bonus if we can unify probe handling and loss detection.

huitema commented 3 years ago

The current code has several transmission lists:

1- Initial number space 2- Handshake number space 3- 1RTT number space 4- In the case of complex multipath, one per path or rather per connection ID

For each number space, there is a transmitted list, to manage acknowledgements, and a retransmitted list, to manage spurious re-transmissions.

Could this be simplified? It would be plausible to have just one number space overall, but this would break the optimization of acknowledgement ranges in the case of multipath. It would certainly be possible to allocate sequence numbers for Initial, handshake and 1RTT from the same number space.

It would also be possible to have a single list with a key encoding the number space and the sequence number, possibly using something like a splay for direct access. This would probably solve an efficiency problem when handling out of order acquittals, e.g., when managing multiple paths with a single sequence, but also dealing with out of order arrivals in the network.

If re-transmissions are managed per path, e.g. with a double chain, the dependency on sequence numbers diminishes. We just need to use the sequence number to find encryption contexts and ack contexts.

What if a path is deleted? Currently, re-transmission use the RTT of the default context. Should the packets be spliced into the default path? Should they be kept in a separate list? Should the ack-only packets just be deleted? Should the other packets be considered lost?

huitema commented 3 years ago

Can we unify the transmitted list and the retransmitted list? If the actual retransmission are managed via the per path lists, that should be possible. Is there a reason to manage a retransmitted list if the path is closed, and timers and congestion state don't need updates?

huitema commented 3 years ago

Coming back to the "single number space". What would be the consequence of picoquic using a single number space for all 1 RTT packets?

This will definitely enable some code simplification:

Overall, this removes dozens of branches from the code, making it more reliable. If there is no impact on multipath tests, this looks like a good simplification.

huitema commented 3 years ago

Remaining issue with the "single sequence": what if one link was used very rarely? For example, suppose we have already sent 2^33 packets on link 0, and then we start multipath with a new link. The packet header will have just 4 bytes of sequence number. The recipient has not received any packet yet, so whatever is sent becomes the initial number. This 32 bit number wont match the continuous sequence 33 bit number. AEAD decryption will fail.

Bottom line: better keep the code as is.

huitema commented 3 years ago

In fact, there is a much simpler solution than maintaining lists. The main issue is the need to apply the packet loss tests defined in QUIC Recovery, in which a packet is declared lost if:

In single path transmission, there is only one path and the packet number describes well the order in which packets were sent on that path. In multipath transmission, we cannot use the packet number directly, but we can instead use a "path packet number" that represents the order in which the packets are sent on the path. When a packet is acknowledged, we use the information and keep track of the latest path packet number acknowledged on the path. When checking whether an old packet is lost, we can compare the path packet number of the old packet and of the last acknowledged packet on the path, and consider the packet lost if the difference is larger than a threshold.

This almost solves the problem, except for the need to check whether there is a packet lost on all validated paths. If we have a single path implementation, we need a single queue of packets waiting to be acknowledged, and we can start examining that queue from the tail. But the tail will only inform us of the path with the highest latency...

huitema commented 3 years ago

What's the lowest complexity for maintaining the non-acked list? Do we really need a linked list per path?

What would change if we had a splay of non acked packets? Would that simplify management of acks? We could maintain for each path the "oldest non-ack", but we would have to update that number each time a packet is deleted. We could also maintain a global splay ordered by path and path sequence number, but is that really simpler?

huitema commented 3 years ago

The issue was caused by out of order transmission of ack ranges, which itself derived from two other issues:

1) Each range is sent only a limited number of times 2) Ranges may be packed in ACKs sent on different paths

If an ack range for recent packets was received before and ack range for earlier packets, RACK would kick in and declare the earlier packets lost, causing a large number of spurious loss detections.

This was fixed by a series of changes:

1) acknowledge older ranges first 2) send a copy of all acknowledgements on the low latency path 3) also send a copy on other paths, in case the low latency path go bad 4) check that loss of acknowledgements are detected, even on paths that only send acknowledgements

PR #1251 incorporates these changes, and results in significant performance gains for the multipath scenarios.