Programmatic probing of new paths

levaitamas commented 3 months ago

@nemethf and I have been trying to migrate a client connection. Based on the example code, it seems one just needs to rebind the client endpoint to a new UDP socket. Is this the recommended approach?

Furthermore, RFC 9000 allows the client to validate a path even before migrating the connection. In quinn, is it possible to validate a path before rebinding the endpoint to the new address? This would allow us to migrate only when the server can be reached from the new address.

Thanks!

Ralith commented 3 months ago

Migration is typically involuntary, e.g. due to a user switching networks. We don't currently have an API to spontaneously probe a new path, but it could probably be implemented. Can you elaborate on your use case for programmatic migration?

Ralith commented 3 months ago

Elaborating a little: typically you'd bind your client endpoint to a wildcard IP, which will cause involuntary migration whenever the host's default route changes, e.g. due to moving between wifi networks and/or a cellular modem.
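For illustration, the difference between a wildcard bind and a bind to one specific local address can be sketched with the standard library alone. The helper names `bind_wildcard` and `bind_to_local` are made up for this example and are not part of quinn.

```rust
use std::io;
use std::net::{IpAddr, Ipv4Addr, SocketAddr, UdpSocket};

/// Bind a UDP socket to the IPv4 wildcard address with an OS-assigned port.
/// Datagrams sent from such a socket follow the host's current default route.
fn bind_wildcard() -> io::Result<UdpSocket> {
    UdpSocket::bind(SocketAddr::from((Ipv4Addr::UNSPECIFIED, 0)))
}

/// Bind a UDP socket to one specific local address (for example the address
/// of a particular interface), again with an OS-assigned port.
fn bind_to_local(ip: IpAddr) -> io::Result<UdpSocket> {
    UdpSocket::bind(SocketAddr::new(ip, 0))
}
```

A socket created this way is the kind of value one would hand to the endpoint when rebinding, as described in the original question.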

nemethf commented 3 months ago

There is time for pre-validation when the user switches to a preferred network. For example, a wifi connection might be cheaper than a cellular one, or a wired connection might be faster and more reliable than a wifi connection. Even switching from wifi to cellular can be scheduled if the application has access to the wifi signal strength.

We are experimenting with real-time traffic where delay jitter should be kept low. Hopefully, pre-validation results in a less noticeable connection migration.

Thanks again.

Ralith commented 3 months ago

I think this would be a reasonable feature for us to have. Maybe something on Connection like `async fn probe_path(&self, source: &UdpSocket) -> Result<Something, TimeoutError>`? Are you interested in building this feature?

nemethf commented 3 months ago

This feature seems quite complicated for us since we are new to quinn (and to some extent to Rust as well). Nevertheless, what do you think about the following high-level plan?

We should add a new dst_cid: ConnectionId field to PathData, and a new paths: Vec<PathData> field to Connection. Maybe the CidQueue should be modified as well, since it assumes that only one CID is in use at a given time.

probe_path should first take one connection ID from rem_cids and assign it to a newly created path. Then it should somehow send a PATH_CHALLENGE frame via source and then wait for a response. If connection.process_payload() receives a PATH_RESPONSE, it should unblock probe_path, which can then return with a reference to the new path.

If a timeout occurs because process_payload does not unblock probe_path, then probe_path should delete the newly created path and retire its connection ID.

It should be the caller's responsibility to switch to the new path, or to delete it.

Ralith commented 3 months ago

> This feature seems quite complicated for us since we are new to quinn (and to some extent to Rust as well).

No worries, and no pressure; it's just likely to be the fastest path to seeing it happen. We're always happy to answer questions and provide feedback.

> a new paths: Vec<PathData> field to Connection.

I'm wary of extending Connection to track multiple concurrently valid paths in the scope of this task. That starts looking like #224, which is likely a much larger undertaking, albeit a desirable one. Off the cuff, I think a simpler approach might be to have a table of paths that we're actively probing (maybe just a Slab<ConnectionId>?), with entries being added on probe_path calls and removed on timeout or successful receipt. The application could then decide to attempt actual migration, or not, based on the results. A further refinement might be to pass through the probe results to the migration attempt (e.g. marking the path as already-validated, providing a better initial RTT guess, choice of CID) but that isn't required for an MVP.
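A minimal sketch of such a table of in-flight probes, assuming a `HashMap` as a stand-in for a `Slab` and an 8-byte PATH_CHALLENGE token per probe. Every name here (`ProbeId`, `Probe`, `ProbeTable` and its methods) is hypothetical and does not exist in quinn; it only illustrates "add on probe_path, remove on timeout or successful receipt".

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical opaque handle for one in-flight probe (a Slab key would play this role).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ProbeId(u64);

/// State kept for a probe that has been sent but not yet answered.
struct Probe {
    /// Payload of the PATH_CHALLENGE frame sent on the probed path.
    token: [u8; 8],
    /// Sequence number of the remote CID reserved for this probe.
    cid_seq: u64,
    sent: Instant,
    deadline: Instant,
}

/// Tracks in-flight probes only; the connection still has a single primary path.
#[derive(Default)]
struct ProbeTable {
    next_id: u64,
    probes: HashMap<ProbeId, Probe>,
}

impl ProbeTable {
    /// Register a new probe (what a `probe_path` call would do).
    fn start(&mut self, token: [u8; 8], cid_seq: u64, now: Instant, timeout: Duration) -> ProbeId {
        let id = ProbeId(self.next_id);
        self.next_id += 1;
        let probe = Probe { token, cid_seq, sent: now, deadline: now + timeout };
        self.probes.insert(id, probe);
        id
    }

    /// Handle a PATH_RESPONSE: remove the matching probe and report its RTT sample.
    fn on_path_response(&mut self, token: [u8; 8], now: Instant) -> Option<(ProbeId, Duration)> {
        let id = *self.probes.iter().find(|(_, p)| p.token == token)?.0;
        let probe = self.probes.remove(&id)?;
        Some((id, now.duration_since(probe.sent)))
    }

    /// Remove expired probes, returning each id with the CID sequence number to retire.
    fn expire(&mut self, now: Instant) -> Vec<(ProbeId, u64)> {
        let expired: Vec<ProbeId> = self
            .probes
            .iter()
            .filter(|(_, p)| p.deadline <= now)
            .map(|(id, _)| *id)
            .collect();
        expired
            .into_iter()
            .filter_map(|id| self.probes.remove(&id).map(|p| (id, p.cid_seq)))
            .collect()
    }
}
```

The RTT returned by `on_path_response` is the kind of result that could later be passed through to a migration attempt as an initial RTT guess.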

I guess that's mostly a cosmetic difference from what you're proposing, but let's try to be clear in the code there's still a single primary path and we're only adding tracking for in-flight probes.

> Maybe the CidQueue should be modified as well, since it assumes that only one CID is in use at a given time.

Yeah. Per RFC 9000 §9.5 we can't half-ass this too much:

> An endpoint MUST NOT reuse a connection ID when sending from more than one local address -- for example, when initiating connection migration as described in Section 9.2 or when probing a new network path as described in Section 9.1.

This may require some care, as a failed probe will permanently consume a CID while we're still using an earlier-sequenced CID for the active path. The existing queue assumes that all earlier CIDs are retired and all later CIDs are fresh, but with failed probes we'll need to support a single(?) arbitrarily large gap between the current CID sequence number and the next fresh one. Further, we'll need to block probe attempts if no fresh CIDs are available. Probing may be impossible if the peer hasn't issued us enough CIDs.
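The bookkeeping described here can be sketched as follows. This is not quinn's `CidQueue`; `RemoteCids`, its fields, and the fixed-size `ConnectionId` alias are assumptions made for the example. The active CID stays where it is, probes consume the lowest fresh CID, and a watermark records how far the fresh region has advanced.

```rust
use std::collections::BTreeMap;

/// Stand-in for a real connection ID type (assumption for this sketch).
type ConnectionId = [u8; 8];

struct RemoteCids {
    /// Sequence number of the CID in use on the primary path.
    active_seq: u64,
    /// Every sequence number below this has been handed out, either to the
    /// primary path or to a probe, and must never be used again.
    next_fresh_seq: u64,
    /// CIDs issued by the peer that have not been used on any path yet.
    fresh: BTreeMap<u64, ConnectionId>,
}

impl RemoteCids {
    fn new(active_seq: u64) -> Self {
        Self { active_seq, next_fresh_seq: active_seq + 1, fresh: BTreeMap::new() }
    }

    /// Record a CID issued by the peer, ignoring sequence numbers already handed out.
    fn insert(&mut self, seq: u64, cid: ConnectionId) {
        if seq >= self.next_fresh_seq {
            self.fresh.insert(seq, cid);
        }
    }

    /// Reserve the lowest fresh CID for a probe. `None` means no fresh CID is
    /// available, so the probe attempt has to be blocked or rejected.
    fn take_for_probe(&mut self) -> Option<(u64, ConnectionId)> {
        let seq = *self.fresh.keys().next()?;
        let cid = self.fresh.remove(&seq)?;
        self.next_fresh_seq = seq + 1;
        Some((seq, cid))
    }

    /// Width, in sequence numbers, of the gap between the active CID and the
    /// next candidate for a fresh one.
    fn gap(&self) -> u64 {
        self.next_fresh_seq - self.active_seq - 1
    }
}
```

In this sketch a failed probe simply never gives its CID back, so the gap only grows until the primary path itself moves to a later CID.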

> probe_path should first take one connection ID from rem_cids and assign it to a newly created path. Then it should somehow send a PATH_CHALLENGE frame via source and then wait for a response. If connection.process_payload() receives a PATH_RESPONSE, it should unblock probe_path, which can then return with a reference to the new path.

This sounds good to me. quinn_proto::Connection::poll_transmit will need to check whether there are any probes pending and send one if so, much like existing logic to send PATH_CHALLENGE on the previous path to a client which has just migrated. poll_transmit will also need to somehow indicate which socket to send on -- perhaps by including an opaque ID allocated by quinn_proto::Connection::probe_path in the Transmit value returned, representing an index in the Slab of probing paths or similar.
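One way the opaque ID idea could look on the I/O side, as a rough sketch: the `Transmit` below is a simplified stand-in rather than quinn_proto's actual type, and `ProbeId`, `Sockets`, and `socket_for` are hypothetical names. The socket type is left generic so the routing logic can be exercised without real I/O; in practice `S` would be a UDP socket.

```rust
use std::collections::HashMap;

/// Hypothetical opaque handle, as would be allocated by a `probe_path` call.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ProbeId(u64);

/// Simplified stand-in for a transmit description (assumption for this sketch).
struct Transmit {
    contents: Vec<u8>,
    /// `Some` when the datagram must leave via the socket of a path being
    /// probed; `None` for ordinary traffic on the primary path.
    probe: Option<ProbeId>,
}

/// Sockets owned by the I/O driver: one for the primary path, one per probe.
struct Sockets<S> {
    primary: S,
    probes: HashMap<ProbeId, S>,
}

impl<S> Sockets<S> {
    /// Pick the socket a transmit should be sent on. Returns `None` if the
    /// probe was already removed (for example after a timeout), in which
    /// case the datagram should be dropped rather than sent on the primary path.
    fn socket_for(&self, transmit: &Transmit) -> Option<&S> {
        match transmit.probe {
            None => Some(&self.primary),
            Some(id) => self.probes.get(&id),
        }
    }
}
```

Dropping a transmit whose probe socket is gone, instead of falling back to the primary socket, keeps the probe's CID from ever being sent from a second local address.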

> If a timeout occurs because process_payload does not unblock probe_path, then probe_path should delete the newly created path and retire its connection ID.
>
> It should be the caller's responsibility to switch to the new path, or to delete it.

Sounds good!

quinn-rs / quinn

Programmatic probing of new paths #1772