quicwg / multipath

In-progress version of draft-ietf-quic-multipath

Should we disable CID rotation on opened paths when multipath is negotiated? #273

Closed yangfurong closed 9 months ago

yangfurong commented 10 months ago

The current draft (05) allows endpoints to change the CIDs of paths at any time. However, because each CID has its own packet number space (PNS), this complicates MPQUIC implementations and leads to sub-optimal performance in some cases.

In the following examples (Figure-1 and Figure-2), when NAT rebinding happens on path-1, we have to create new PNSs associated with the new CIDs, associate these new PNSs with path-1, and maintain both the old and new PNSs for some time. For performance reasons, the old PNSs should not be released immediately, as there could still be unacked or in-flight packets.
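
The bookkeeping described above can be sketched roughly as follows. This is a minimal, hypothetical Python model (the class and method names are illustrative and do not come from the draft or any implementation) that tracks only in-flight packet numbers, assuming each CID owns its own packet number space as in draft-05.

```python
from dataclasses import dataclass, field


@dataclass
class PacketNumberSpace:
    """One packet number space, tied to a single connection ID (draft-05 model)."""
    cid: bytes
    next_pn: int = 0
    in_flight: dict[int, float] = field(default_factory=dict)  # pn -> send time

    def send(self, now: float) -> int:
        pn = self.next_pn
        self.next_pn += 1
        self.in_flight[pn] = now
        return pn

    def on_acked_or_lost(self, pn: int) -> None:
        # A packet stops being outstanding once it is acked or declared lost.
        self.in_flight.pop(pn, None)


@dataclass
class Path:
    """A path that may own several number spaces after its CID changes."""
    active: PacketNumberSpace
    retiring: list[PacketNumberSpace] = field(default_factory=list)

    def rotate_cid(self, new_cid: bytes) -> None:
        # New packets use a fresh space bound to the new CID, but the old
        # space must be kept: packets sent in it may still be acknowledged.
        self.retiring.append(self.active)
        self.active = PacketNumberSpace(new_cid)

    def release_drained(self) -> None:
        # Only drop old spaces that have nothing left in flight.
        self.retiring = [s for s in self.retiring if s.in_flight]
```

Every rotation adds another space to `retiring` that the path must keep servicing until it drains, which is the extra state the original post objects to.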

Even worse, if the client rotates the CID of path-1 after an idle period and NAT rebinding happens at the same time, path-1 will hit a long timeout and be closed (Figure-3). To resume transmission on path-1, the client has to reopen it.

The above issues are solved if we disable CID rotation on opened paths (Figure-4, Figure-5, and Figure-6):

  1. There is no need for complicated management of multiple PNSs per path.
  2. CID rotation and NAT rebinding will never happen at the same time.

This actually simplifies the implementation of MPQUIC in practice. However, I am not sure whether it would introduce other problems.

Figure-1

Figure-2

Figure-3

Figure-4

Figure-5

Figure-6

mirjak commented 10 months ago

The ability to change CIDs is a privacy feature. I think we want to retain that characteristic from QUIC. Unfortunately, it makes things complicated, but I don't think disabling it is an option.

qdeconinck commented 10 months ago

+1 to Mirja.

yfmascgy commented 10 months ago

@yangfurong Thanks for pointing this out. Performance-wise, the CID and port changing at the same time could be a bummer. But I think the privacy concern, and thus the ability to change CIDs, should come first. Can we think about how we can quickly detect a simultaneous CID and port change, so that we can quickly close the path in your Figure-3?

yangfurong commented 10 months ago

Quoting @yfmascgy: "Can we think about how we can quickly detect a simultaneous CID and port change, so that we can quickly close the path in your Figure-3?"

Perhaps the client could send a PATH_CHALLENGE (PC) as the first packet after rotating the CID of an existing path. If the client's address has not changed, challenging the server's address has no side effect. But if the client's address has changed, the server will take this PC as a signal to open a new path.

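In this way, the client and server can keep communicating over path-1 immediately after NAT rebinding. However, they would have different views of path-1: the client considers it the same path with a rotated CID, while the server treats it as a newly opened path.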
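
The server-side decision in this proposal might look like the following Python sketch. It assumes the server compares the 4-tuple of a PATH_CHALLENGE that arrives on a newly used CID against the addresses it has recorded for path-1; `ServerPath`, `Action`, and `classify_challenge` are hypothetical names, not from the draft.

```python
from dataclasses import dataclass
from enum import Enum, auto

Addr = tuple[str, int]  # (IP address, UDP port)


class Action(Enum):
    # In both cases the server answers the challenge with a PATH_RESPONSE.
    CONTINUE_PATH = auto()  # same 4-tuple: CID rotation only, keep path-1
    OPEN_NEW_PATH = auto()  # new 4-tuple: treat the PC as opening a new path


@dataclass
class ServerPath:
    peer_addr: Addr   # client address last seen on this path
    local_addr: Addr  # server address used by this path


def classify_challenge(path: ServerPath, src: Addr, dst: Addr) -> Action:
    """Decide how to treat a PATH_CHALLENGE received with a new CID."""
    if (src, dst) == (path.peer_addr, path.local_addr):
        return Action.CONTINUE_PATH
    # The client's address changed (e.g. NAT rebinding): from here on the
    # server sees a new path, while the client still thinks of it as path-1.
    return Action.OPEN_NEW_PATH
```

The mismatch noted above shows up directly in the `OPEN_NEW_PATH` branch: nothing in the packet tells the server that the client regards this as the same path.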

huitema commented 10 months ago

The performance issues can easily be alleviated if the implementation is a little smart. The main issue concerns loss recovery. A naive implementation will suffer because once a packet number space is "closed", it can no longer use the packet-number logic of RACK to discover packet losses or send a new frame on PTO, and has to fall back to the less efficient "timeout". But it is not hard to tie the new path to the old one and trigger the packet-number logic based on acknowledgements of packets on the new path. It is also not hard to use the old path's congestion data to seed the values on the new path. If the implementation does both, rotating the CID has no practical impact.
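
A rough Python sketch of the two tricks described above, under stated assumptions: the new number space keeps a hypothetical `predecessor` link to the retired one, and congestion state is copied at rotation. Because packet numbers from two different spaces cannot be compared directly, this sketch substitutes send timestamps for packet-number ordering and applies a time threshold (RFC 9002's kTimeThreshold of 9/8, simplified to use only the smoothed RTT and no timer granularity). All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional

K_TIME_THRESHOLD = 9 / 8  # kTimeThreshold from RFC 9002, Section 6.1.2


@dataclass
class CongestionState:
    cwnd: int            # congestion window in bytes
    smoothed_rtt: float  # seconds


@dataclass
class LinkedSpace:
    """A packet number space that remembers the space it replaced."""
    cc: CongestionState
    sent: dict[int, float] = field(default_factory=dict)  # unacked pn -> send time
    predecessor: Optional["LinkedSpace"] = None


def rotate(old: LinkedSpace) -> LinkedSpace:
    # Same physical path, new CID: seed the new space with a copy of the old
    # congestion state instead of restarting from the initial window.
    seeded = CongestionState(old.cc.cwnd, old.cc.smoothed_rtt)
    return LinkedSpace(cc=seeded, predecessor=old)


def on_ack(space: LinkedSpace, pn: int, now: float) -> list[int]:
    """Handle an ACK for `pn` in `space`; return old-space packets declared lost."""
    acked_sent_time = space.sent.pop(pn)
    old = space.predecessor
    if old is None:
        return []
    loss_delay = K_TIME_THRESHOLD * space.cc.smoothed_rtt
    # An old-space packet sent before the newly acked one, and long enough
    # ago, is declared lost without waiting for a retransmission timeout.
    lost = sorted(p for p, t in old.sent.items()
                  if t < acked_sent_time and now - t >= loss_delay)
    for p in lost:
        del old.sent[p]
    return lost
```

Old-space packets that fail the time check would still need a loss timer, which this sketch omits.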

kazuho commented 10 months ago

What @huitema says.

It would be rare for endpoints to rotate a CID without changing the egress path, as that is required only when the key used to encrypt the CID is renewed. And if an endpoint needs to rotate frequently, there are tricks like the ones @huitema points out.

IIRC, multipath QUIC has inherited not only the security properties of QUIC v1 but also how the protocol meets those security requirements; e.g., when to use new DCIDs, or when to send path probes. I think we should stick to what we already have and know works.

mirjak commented 10 months ago

Can we close this issue, or is there anything we can do editorially to better explain how to "transfer" the old state to the new path?

michael-eriksson commented 10 months ago

I think that #214 is a good solution to this problem. The current specification is unclear, complex and (as described in the initial message above) implies:

  1. multiple packet number spaces over the same path with complicated, suboptimal special-case loss detection
  2. difficulty detecting and interpreting a simultaneous NAT rebinding and CID update, which is likely after an idle period

The specification should be updated with a simple and clear path model (one single stable logical path per physical path) and explicit signalling (not even more semantic overloading of the PATH_CHALLENGE frame!) to enable clean, reliable and efficient implementations.
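
For contrast, the "one single stable logical path per physical path" model can be sketched as below. This is a hypothetical illustration, not the design in #214: a path keeps a single packet number space identified by an assumed `path_id`, and a CID change only swaps the identifier written on the wire.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalPath:
    path_id: int  # hypothetical stable path identifier
    dcid: bytes   # connection ID currently used on the wire
    next_pn: int = 0
    unacked: set[int] = field(default_factory=set)

    def send(self) -> tuple[bytes, int]:
        """Return the (DCID, packet number) pair for the next packet."""
        pn = self.next_pn
        self.next_pn += 1
        self.unacked.add(pn)
        return self.dcid, pn

    def rotate_cid(self, new_dcid: bytes) -> None:
        # Only the wire identifier changes; packet numbers continue in the
        # same space, so ordinary packet-threshold loss detection still works.
        self.dcid = new_dcid
```

With this model there is never more than one number space per path, so neither the retiring-space bookkeeping nor cross-space loss detection is needed.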

yangfurong commented 10 months ago

+1 to @michael-eriksson's comment above.

mirjak commented 9 months ago

Please continue the discussion in issue #214. Closing this issue now.