quicwg / multipath

In-progress version of draft-ietf-quic-multipath

Should the seq number space of MP_NEW_CONNECTION_ID be per-path or global? #371

Closed mirjak closed 2 weeks ago

mirjak commented 1 month ago

Currently the sequence number space for connection IDs is specified per path. However, this is not strictly necessary. We could also use one sequence number space for all CIDs, which would be a smaller change compared to RFC 9000. Specifically, this would mean that we don't need MP_RETIRE_CONNECTION_ID but could use RETIRE_CONNECTION_ID instead.

The only problem is that "retire prior to" in the MP_NEW_CONNECTION_ID frame would not work that easily anymore. However, we could simply define that this value still only applies per path. E.g. if I issue CIDs 2, 3, 5, and 8 for path 1 and CIDs 4, 6, and 7 for path 2, "retire prior to"=5 for path 1 would retire 2 and 3 (the path 1 CIDs with a sequence number below 5) but not 4. Alternatively we could change the "retire prior to" field in the MP_NEW_CONNECTION_ID frame and make it a list instead.
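To make the first option concrete, here is a minimal Python sketch of a global sequence number space in which "retire prior to" is still scoped to one path. The `CidTable` class and its method names are hypothetical and not from the draft. The sketch assumes RFC 9000 semantics, where Retire Prior To retires sequence numbers strictly smaller than the given value.

```python
from collections import defaultdict


class CidTable:
    """Hypothetical CID bookkeeping: global sequence space, per-path retirement."""

    def __init__(self):
        self.issued = set()              # every sequence number ever issued
        self.active = defaultdict(set)   # path_id -> active sequence numbers

    def issue(self, path_id, seq):
        # Sequence numbers are unique across the whole connection.
        if seq in self.issued:
            raise ValueError(f"sequence number {seq} already issued")
        self.issued.add(seq)
        self.active[path_id].add(seq)

    def retire_prior_to(self, path_id, value):
        # Only CIDs of the named path with seq < value are retired.
        retired = sorted(s for s in self.active[path_id] if s < value)
        self.active[path_id] -= set(retired)
        return retired


table = CidTable()
for seq in (2, 3, 5, 8):
    table.issue(1, seq)
for seq in (4, 6, 7):
    table.issue(2, seq)

print(table.retire_prior_to(1, 5))  # [2, 3]; CID 4 on path 2 is untouched
```

The list variant mentioned above would replace the single `value` with an explicit set of sequence numbers to retire, at the cost of a larger frame.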

huitema commented 1 month ago

It can be one or the other. If CID sequence numbers are global, then "retire prior to" should also be global. Which would make sense for the main use case of "retire prior to", which is tied to encrypted CID.

In server farms, the allocation of CIDs must be synchronized with the load balancer. One simple way to do that is to include the server ID in the content of the CID. Doing that in clear text is risky, because leaking the server ID allows some tracking of connections across CIDs, and also enables DDoS attacks on specific servers in the farm. The norm is to encrypt it, but encryption keys will need to be rotated, and when we do that whole batches of CIDs have to be retired. This is what "retire prior to" was designed to handle, and that works better with global CID sequence numbers than with per-path CID sequence numbers. So, +1 for global numbers, and global "retire prior to".
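The key rotation case can be sketched as follows. This is an illustration under assumptions: the `key_epoch` tag stands in for "which load balancer key encrypted this CID", no real encryption is performed, and the `Issuer` class is hypothetical. With one global sequence space, a single "retire prior to" value covers every CID minted under the old key, whichever path it was issued for.

```python
class Issuer:
    """Hypothetical CID issuer with one connection-wide sequence space."""

    def __init__(self):
        self.next_seq = 0
        self.key_epoch = 0
        self.issued = []          # records of (seq, path_id, key_epoch)
        self.retire_prior_to = 0  # global threshold announced to the peer

    def new_cid(self, path_id):
        record = (self.next_seq, path_id, self.key_epoch)
        self.issued.append(record)
        self.next_seq += 1
        return record

    def rotate_key(self):
        # Everything issued so far used the old key, so one global
        # threshold retires the whole batch across all paths.
        self.key_epoch += 1
        self.retire_prior_to = self.next_seq

    def active(self):
        return [r for r in self.issued if r[0] >= self.retire_prior_to]


issuer = Issuer()
issuer.new_cid(path_id=1)
issuer.new_cid(path_id=2)
issuer.rotate_key()
issuer.new_cid(path_id=1)
print(issuer.active())  # [(2, 1, 1)]: only the CID minted under the new key
```

With per-path sequence numbers, the same rotation would need one threshold per path, since sequence number 1 on path 1 and sequence number 1 on path 2 would be unrelated.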

I don't think we have a specific use case for "per path" "retire prior to". The main reason to clear a batch of CIDs in a farm is the closing of the path, but if we want fewer messages we can just make that implicit with "path abandon".

We already have global constraints for connection IDs: they must be unique for the whole connection, in fact for the whole set of connections sharing a UDP port. Having the sequence number global will probably help. However, we will have to address one error mode. What happens if the peer sends 2 new CID frames with the same CID value, the same sequence number, and different path IDs? I think this would have to be discussed, and explicitly forbidden.
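A receiver-side check for this error mode might look like the sketch below. It assumes a global sequence number space. The `CidRegistry` class and the `ProtocolViolation` exception are hypothetical names, and treating a conflicting frame as a connection error is an assumption, since the thread only says the case would have to be discussed and forbidden. An exact repeat (same CID, sequence number, and path ID) is accepted as a retransmission, in line with how RFC 9000 treats repeated NEW_CONNECTION_ID frames.

```python
class ProtocolViolation(Exception):
    """Hypothetical error raised for a conflicting new-CID frame."""


class CidRegistry:
    def __init__(self):
        self.by_seq = {}  # seq -> (cid, path_id)
        self.by_cid = {}  # cid -> (seq, path_id)

    def on_new_cid(self, path_id, seq, cid):
        """Return True if the CID is new, False for an exact retransmission."""
        seen_seq = self.by_seq.get(seq)
        seen_cid = self.by_cid.get(cid)
        if seen_seq == (cid, path_id) and seen_cid == (seq, path_id):
            return False  # same frame received twice: harmless
        if seen_seq is not None or seen_cid is not None:
            # Reused sequence number or CID value with different details,
            # including the "same CID, same seq, other path ID" case.
            raise ProtocolViolation(
                f"conflicting frame: path={path_id} seq={seq} cid={cid.hex()}"
            )
        self.by_seq[seq] = (cid, path_id)
        self.by_cid[cid] = (seq, path_id)
        return True


registry = CidRegistry()
registry.on_new_cid(path_id=1, seq=7, cid=b"\xaa\xbb")
try:
    registry.on_new_cid(path_id=2, seq=7, cid=b"\xaa\xbb")
except ProtocolViolation as err:
    print("rejected:", err)
```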

michael-eriksson commented 1 month ago

It should be per-path sequence number spaces for connection IDs.

Reasons:

Having per-path sequence number spaces for CIDs is a smaller change from RFC 9000, since all CID handling remains local per path. Global CID handling would be added complexity for no gain at all.

Furthermore, per-path sequence number spaces have already been implemented and interop tested at IETF 119. I see very little reason to change that now...

mirjak commented 1 month ago

It's actually a good point that with the explicit path ID, it is no longer possible to have a situation where the other end retires a CID and you don't have a valid CID to send on an open path. That was previously the case when NEW_CONNECTION_ID and one CID space for all paths were used, and we had some text to cover that case. I recreated PR #374 to remove the respective text.

That said, if one sequence number space were used, we would of course need to find a solution that addresses both of the points @michael-eriksson mentions above, but I think it would be possible to find a different design for the MP_NEW_CIDS frame that would support that. I'm not saying we should be doing one or the other, just saying both should work.

On @huitema's point: I think we should anyway require that CIDs are unique per connection (and not only per path). This rules out any risk of linkability or confusion in case of migration.

mirjak commented 1 month ago

As a side note: I came up with this question because we currently are using a global sequence number space for PATH_AVAILABLE/PATH_STANDBY. We need to clarify this in the draft (see issue #312), but we could also change that to use a per-path space, which would simply mean slightly more state on the sender side but slightly less state on the receiver side. Not sure if there is value in using the same approach (either per connection or per path) in both cases...?

michael-eriksson commented 1 month ago

The draft says:

The receiver of the PATH_{STANDBY,AVAILABLE} frame needs to use and compare the sequence numbers separately for each Path ID.

The receiver won't really care how the monotonically increasing sequence numbers are generated, but since the sequence numbers are only compared per path, it would be most natural with per path sequence number spaces.

Looking at the Rask implementation, the sender side is simpler if the sequence number space is per path. In particular, it's easier to know if a lost path status frame should be retransmitted: just check if the sequence number is still the highest.
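The retransmission rule can be sketched as below, assuming per-path sequence number spaces for path status frames. The `PathStatusSender` class and its method names are hypothetical and are not taken from any implementation mentioned in this thread.

```python
class PathStatusSender:
    """Hypothetical sender state: one status sequence counter per path."""

    def __init__(self):
        self.highest = {}  # path_id -> highest sequence number sent

    def send_status(self, path_id, status):
        seq = self.highest.get(path_id, -1) + 1
        self.highest[path_id] = seq
        return (path_id, seq, status)  # stands in for the serialized frame

    def should_retransmit(self, lost_frame):
        path_id, seq, _status = lost_frame
        # A lost frame only matters if nothing newer was sent for that path.
        return self.highest.get(path_id) == seq


sender = PathStatusSender()
old = sender.send_status(1, "standby")
new = sender.send_status(1, "available")
print(sender.should_retransmit(old))  # False: superseded by a newer status
print(sender.should_retransmit(new))  # True: still the latest for path 1
```

With a global space the sender would still need to remember the latest sequence number used for each path to make the same decision, so the per-path counter doubles as that record.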

Yanmei-Liu commented 1 month ago

This issue is mainly related to issue #332. I agree with Michael that the sequence number space of CIDs needs to be per path:

huitema commented 2 weeks ago

The current text is actually safer. If we had a connection-wide sequence number space for connection IDs, then a single MP_NEW_CONNECTION_ID could retire all CIDs used by other paths, leading to "zombie paths" that are defined but have no connection ID. The current text forces doing this per path, which ensures there is always at least one CID per path after the MP_RETIRE_CONNECTION_ID.
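The "zombie path" hazard can be illustrated with a small sketch. It models the active CIDs of each path as a set of sequence numbers. The function names and the example numbers are hypothetical, and the connection-wide variant is the design being argued against here, not anything in the draft.

```python
def retire_connection_wide(active, retire_prior_to):
    """Hypothetical global rule: the threshold applies to every path."""
    return {
        path: {s for s in seqs if s >= retire_prior_to}
        for path, seqs in active.items()
    }


def retire_per_path(active, path_id, retire_prior_to):
    """Per-path rule: the threshold applies only to the named path."""
    result = {path: set(seqs) for path, seqs in active.items()}
    result[path_id] = {s for s in result[path_id] if s >= retire_prior_to}
    return result


def zombie_paths(active):
    """Paths that still exist but have no CID left."""
    return sorted(path for path, seqs in active.items() if not seqs)


# Global numbering: path 1 holds 0 and 1; path 2 holds 2, 3 and the
# newly issued 4, which arrives with "retire prior to" = 4.
active = {1: {0, 1}, 2: {2, 3, 4}}

print(zombie_paths(retire_connection_wide(active, 4)))  # [1]
print(zombie_paths(retire_per_path(active, 2, 4)))      # []
```

In the per-path case the frame that raises the threshold also delivers a new CID for that same path, so the path it touches keeps at least one CID and other paths are unaffected.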

mirjak commented 2 weeks ago

The authors discussed this and decided not to change it.