quicwg / multipath

In-progress version of draft-ietf-quic-multipath

Choosing between a single packet number space vs. multiple packet number spaces #96

Closed qdeconinck closed 2 years ago

qdeconinck commented 2 years ago

This draft initially originates from a merging effort of previous Multipath proposals. While many points were found to be common between them, there remains one design point that still requires consensus: the number of packet number spaces that a Multipath QUIC connection should support (i.e., one for the whole connection vs. one per path).

The current draft enables experimentation with both variants, but for the final version we will certainly need to choose one of them.

huitema commented 2 years ago

The main issues mentioned so far:

The picoquic implementation shows that "efficiency" and "ack size" issues of single space implementations can be mitigated. However, that required significant improvements in the code:

I think these improvements are good in general, and I will keep them in the implementation whether we go for single space or not. The virtual sequence number is for example useful if the CID changes for reasons not related to path changes in multiple number space variants. It is also useful in unipath variants to avoid interference between sequence numbers used in probes and the RACK logic. The ACK size improvements do reduce the size of ACKs in the presence of out-of-order delivery, e.g., if the network is doing some kind of internal load balancing. On the other hand, the improvements are somewhat complex, would need to be described in separate drafts, and pretty much contradict the "simplicity of code" argument.

So we are left with the "Null length CID" issue. I see four cases:

| Client CID | Sender CID | Support | Priority |
|---|---|---|---|
| long | long | Supported in both variants | Used by many implementations |
| NULL | long | Requires special support in multiple spaces case, but could work | Preferred configuration of many big deployments |
| long | NULL | Requires special support in multiple spaces case, but could work | Rarely used, server load balancing does not work |
| NULL | NULL | Does not work for multiple spaces | Only mentioned in some P2P deployments |

The point here is that it is somewhat hard to deploy a large server with NULL CID and use server load balancing. This configuration is mostly favored by planned P2P deployments.

huitema commented 2 years ago

The big debate is about the configuration with NULL CID on the client and long CID on the server. The packets from server to client do not carry a CID, and carry only the last bytes of the sequence number. The client will need some logic to infer the full sequence number before decrypting the packet. The client could maybe use the incoming 5-tuple as part of the logic, but it is not obvious. It is much simpler to assume a direct map from destination CID to number space. That means, if a peer uses a NULL CID, all packets sent to that peer are in the same number space.
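As an illustration of the "direct map from destination CID to number space" idea, here is a minimal sketch built on the packet number decoding algorithm from RFC 9000, Appendix A.3. The `NumberSpaces` class and its method names are hypothetical and exist only for this example; a real stack would update its state only after the packet has been decrypted successfully.

```python
def decode_packet_number(largest_pn: int, truncated_pn: int, pn_nbits: int) -> int:
    """Recover a full packet number from its truncated form (RFC 9000, A.3)."""
    expected_pn = largest_pn + 1
    pn_win = 1 << pn_nbits
    pn_hwin = pn_win // 2
    pn_mask = pn_win - 1
    # Take the high bits from the expected value, the low bits from the wire.
    candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
    if candidate_pn <= expected_pn - pn_hwin and candidate_pn < (1 << 62) - pn_win:
        return candidate_pn + pn_win
    if candidate_pn > expected_pn + pn_hwin and candidate_pn >= pn_win:
        return candidate_pn - pn_win
    return candidate_pn


class NumberSpaces:
    """Hypothetical receiver state: one number space per destination CID."""

    def __init__(self) -> None:
        self.largest = {}  # destination CID (bytes) -> largest packet number seen

    def decode(self, dcid: bytes, truncated_pn: int, pn_nbits: int) -> int:
        largest = self.largest.get(dcid, -1)
        full_pn = decode_packet_number(largest, truncated_pn, pn_nbits)
        # Simplification: state is updated before any decryption check.
        self.largest[dcid] = max(largest, full_pn)
        return full_pn
```

With a NULL CID every packet is looked up under the empty key `b""`, so all paths necessarily share one number space, which is the constraint described above.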

huitema commented 2 years ago

Revised table:

| Client CID | Sender CID | What |
|---|---|---|
| long | long | Multiple number space |
| NULL | long | Multiple number spaces on client side (one per CID), single space on server side |
| long | NULL | Multiple number spaces on server side (one per CID), single space on client side |
| NULL | NULL | single number space on each side |
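The revised table can be read as one rule applied in each direction. The sketch below assumes that "on client side" refers to the packets the client sends, which are addressed to the server's CIDs. The function names are hypothetical.

```python
def send_number_spaces(peer_cid_len: int) -> str:
    """Number-space layout for packets sent to a peer with the given CID length."""
    # Packets are numbered per destination CID; with a NULL (zero-length)
    # CID there is nothing to key on, so a single space is used.
    return "one per CID" if peer_cid_len > 0 else "single"


def layout(client_cid_len: int, server_cid_len: int) -> dict:
    """Apply the rule in both directions for one row of the table."""
    return {
        "client sends with": send_number_spaces(server_cid_len),
        "server sends with": send_number_spaces(client_cid_len),
    }


# Row "NULL / long": client uses a NULL CID, server uses 8-byte CIDs.
print(layout(0, 8))
```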

If a node advertises both NULL CID and multipath support, it SHOULD have logic to contain the size of ACKs. If a node engages in multipath with a NULL CID peer, it SHOULD have special logic to make loss recovery work well.

huitema commented 2 years ago

I think the above points to a possible "unified solution":

yfmascgy commented 2 years ago

I like the proposal of the "unified solution". I think the elegance lies in the fact that it allows us to automatically cover all four cases listed above. The previous dilemma for me was that on one hand we have some use cases where we need to support more than two paths and separate PN makes the job easier, but on the other hand, I think we should not ignore the NULL CID use cases, as they are also important. Now, with this proposal, a big part of the problem is solved. The remaining challenge is to make sure single PN remains efficient in terms of ACK and loss recovery. On that part, we plan to do an A/B test and would love to share the results when we get them.

There is one more problem, as pointed out in issue #25, when we want to take hardware offloads into account. In such a case, we may still need single PN for long server CID. However, if hardware supports nonce modification, this problem can be addressed with the proposed "unified solution".

huitema commented 2 years ago

If the API does not support 96-bit sequence numbers, it should always be possible to create an encryption context per number space, using Key = current key and ID = current-ID + CID sequence number. Of course, that forces the creation of multiple contexts, and that makes key rotation a bit harder to manage. But still, that should be possible.
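A rough sketch of why a per-number-space context can stand in for a 96-bit sequence number. It rests on several assumptions that are not stated in this thread: that "ID" plays the role of the per-key IV, that "+" means XOR into the top 32 bits, and that the 96-bit value is the 32-bit CID sequence number followed by a 64-bit packet number field. Only `nonce_rfc9001` follows a published rule (RFC 9001, Section 5.3); the other function names are hypothetical.

```python
IV_LEN = 12  # 96-bit IV, as used by the AEADs in RFC 9001


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def nonce_96bit(iv: bytes, cid_seq: int, pn: int) -> bytes:
    """Assumed layout: 32-bit CID sequence number, then 64-bit packet number."""
    combined = (cid_seq << 64) | pn
    return xor_bytes(iv, combined.to_bytes(IV_LEN, "big"))


def per_space_iv(iv: bytes, cid_seq: int) -> bytes:
    """Assumed: fold the CID sequence number into the top 32 bits of the IV."""
    return xor_bytes(iv, (cid_seq << 64).to_bytes(IV_LEN, "big"))


def nonce_rfc9001(iv: bytes, pn: int) -> bytes:
    """Standard QUIC nonce: IV XOR left-padded packet number."""
    return xor_bytes(iv, pn.to_bytes(IV_LEN, "big"))


iv = bytes(range(IV_LEN))
# One context per number space, each with its own derived IV ...
context_iv = per_space_iv(iv, cid_seq=7)
# ... yields the same nonce as the combined 96-bit form.
print(nonce_rfc9001(context_iv, 1234) == nonce_96bit(iv, 7, 1234))
```

Under these assumptions an AEAD API that only accepts a packet number still works, at the cost of one context per CID sequence number, each of which has to be rebuilt on key update.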

mirjak commented 2 years ago

Thanks for the summary @huitema. I think one point is missing in your list which is related to issue #87. Use of a single packet number space might not support ECN.

Regarding the unified solution: I think what you actually say is that we would specify both solutions and everybody would need to implement both logics. At least regarding the "simplicity of code" argument, that would be the worst choice.

If we can make the multiple packet number spaces solution work with one-sided CIDs, I'm tending toward such an approach. Use of multiple packet number spaces avoids ambiguity and the need for "smartness" in ACK handling and packet scheduling which, as you say above, can make the implementation more complex. Moreover, wrong decisions may have a large impact on efficiency (both local processing and on-the-wire). I don't think we want to leave these things to each implementation individually.

qdeconinck commented 2 years ago

The summary Christian made above about the design comparison sounds quite accurate indeed. Besides ECN, my other concern about a single packet number space is that it requires cleverness on the receiver side if you want to be performant in some scenarios. On the sender side, you need to consider a path-relative packet number threshold instead of an absolute one to avoid spurious losses.
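To make the sender-side point concrete, here is a small sketch comparing an absolute packet threshold with a path-relative one in a single number space. The threshold of 3 is the value recommended in RFC 9002; the function names and the per-path renumbering are illustrative assumptions, not taken from any implementation.

```python
K_PACKET_THRESHOLD = 3  # recommended value in RFC 9002


def lost_absolute(sent, acked):
    """sent: list of (pn, path) in send order; acked: set of acked pns."""
    largest = max(acked)
    return [pn for pn, _ in sent
            if pn not in acked and largest - pn >= K_PACKET_THRESHOLD]


def lost_path_relative(sent, acked):
    """Same rule, but counted in per-path send order."""
    seq_of = {}    # pn -> (path, position of the packet on that path)
    counters = {}  # path -> number of packets sent on it so far
    for pn, path in sent:
        counters[path] = counters.get(path, 0) + 1
        seq_of[pn] = (path, counters[path])
    largest = {}   # path -> largest acked per-path position
    for pn in acked:
        path, seq = seq_of[pn]
        largest[path] = max(largest.get(path, 0), seq)
    return [pn for pn, path in sent
            if pn not in acked
            and path in largest
            and largest[path] - seq_of[pn][1] >= K_PACKET_THRESHOLD]


# Packets alternate between a slow and a fast path; only the fast path
# has been acknowledged so far.
sent = [(pn, "slow" if pn % 2 == 0 else "fast") for pn in range(8)]
acked = {1, 3, 5, 7}
print(lost_absolute(sent, acked))       # [0, 2, 4]
print(lost_path_relative(sent, acked))  # []
```

In this example the absolute threshold declares three packets lost that are merely still in flight on the slow path, while the path-relative threshold waits for evidence from the same path.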

One point I think we have not mentioned yet is that there can be some interactions between Multipath out-of-order delivery and incoming packet number duplicate detection. This requires maintaining some state on the receiver side, as described by https://www.rfc-editor.org/rfc/rfc9000.html#section-12.3-12. With a single packet number space, the receiver should take extra care when updating the "minimum packet number below which all packets are immediately dropped". Otherwise, in the presence of paths with very different latencies, the receiver might end up discarding packets from a (slower) path.

I also prefer the multiple packet number spaces solution for the above reasons. I'm not against thinking about a "unified" solution (the proposal sounds interesting), but I wonder how much complexity this would add compared to requiring end hosts to use one-byte CIDs.
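The following sketch illustrates that receiver-side hazard with a deliberately naive duplicate filter whose floor simply trails the largest packet number by a fixed window. The `DuplicateFilter` class and its window policy are assumptions made for the example; RFC 9000 does not mandate a particular data structure.

```python
class DuplicateFilter:
    """Naive duplicate detection with a floor trailing the largest pn."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.min_pn = 0     # packets below this are dropped unexamined
        self.seen = set()   # packet numbers received at or above min_pn

    def accept(self, pn: int) -> bool:
        if pn < self.min_pn or pn in self.seen:
            return False
        self.seen.add(pn)
        floor = max(self.seen) - self.window + 1
        if floor > self.min_pn:
            # Advance the floor and forget state below it.
            self.min_pn = floor
            self.seen = {p for p in self.seen if p >= floor}
        return True


fast_path = [1, 3, 5, 7, 9]  # arrives first
slow_path = [0, 2, 4]        # same number space, arrives later

small = DuplicateFilter(window=4)
print([small.accept(pn) for pn in fast_path + slow_path])
# [True, True, True, True, True, False, False, False]

large = DuplicateFilter(window=16)
print([large.accept(pn) for pn in fast_path + slow_path])
# [True, True, True, True, True, True, True, True]
```

With the small window, three packets that were never received are rejected as if they were duplicates. In a single number space the window therefore has to cover the latency difference between paths, whereas with one space per path each filter only sees the reordering of its own path.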

huitema commented 2 years ago

I think the issue is not really so much "single vs multiple number space" as "support for multipath and NULL CID". As noted by Mirja, there is an implementation cost there. My take is:

Then add sections on what it means to deal with the size of acknowledgements, out-of-order arrivals, and congestion control.

I think this approach ends up with the correct compromises:

If we do that, we can get rid of the single vs multiple space discussion, and end up with a single solution addressing all use cases.

obonaventure commented 2 years ago

Looking at all the discussions here and in other issues such as ECN, I think that we should try to write two different versions of section 7:

There would be some overlap between these two sections and also some differences that would become clear as we specify them. At the end of this writing, we'll know whether it is possible to support/unify both or we need to recommend a single one. The other parts of the document are almost independent of that and can evolve in parallel with these two sections.

However, I don't think that such a change would be possible by Monday.

huitema commented 2 years ago

I don't think we should rush changes before we have agreed on the final vision.

mirjak commented 2 years ago

It might be helpful to have these options as PRs (without merging them), so people can understand all the details of each approach.

yfmascgy commented 2 years ago

I agree with @huitema that supporting multipath with NULL CIDs is a more fundamental issue than the efficiency comparison between single PN and separate PN, as it would ultimately impact the application scope of multipath QUIC. But indeed, we might want to implement the proposed solution first and then decide if we want to adopt such a unified approach.

huitema commented 2 years ago

Since @mirjak prodded me, we now have a PR for the "unified" proposal.

Yanmei-Liu commented 2 years ago

I totally agree with Christian that the issue is about "support for multipath and NULL CID", and the solution that Christian suggested looks really great! It both takes advantage of multiple spaces and supports NULL CID users without affecting the efficiency of ACK arrangements.

Besides, the solution is more convenient for implementations: if both endpoints use non-zero-length CIDs, they only need to support multiple spaces, and if one of the endpoints uses a NULL CID, it can use a single PN space in one direction and support NULL CID and multipath at the same time.

mirjak commented 2 years ago

Very high-level summary of the IETF-113 discussion: there seems to be interest in, and likely support for, the unified solution (review the minutes for further details).