quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0

Allow app to accept/reject/retry before handshake begins #1752

Closed: gretchenfrage closed this 5 months ago

gretchenfrage commented 8 months ago

Closes #1661

Please see individual commit messages.

Ralith commented 8 months ago

Thank you for working on this! I've wanted this feature for a long time, and I'm excited about the amount of discretion it gives applications for practically no increase in intrinsic API complexity.

It may be a few days before I have time to dig into reviewing this properly, but I can see it's turned out to be a fairly large change. Any chance you could break it up into a series of smaller incremental, self-contained commits?

gretchenfrage commented 8 months ago

Thanks for the review! Splitting this into more incremental commits and making the other changes you mentioned makes sense to me. Just so you know, though, it might be a little bit until I can dedicate more time to this--but I'll get back to you when it's ready to be looked at again.

gretchenfrage commented 7 months ago

Should be ready for re-review now. I split this up into a series of commits, which should make it a lot easier to see what I'm doing if you go through them one at a time, and I made some of the other changes you requested too.

Why not expose ConnectingState directly?

I don't really see how this would work--at least, I'm not sure how to reconcile that idea with Ralith's request that this functionality be placed into the existing Connecting API (which, although it wasn't my original proposal, is an idea I do like). If you have something in mind, though, I'd be happy to talk about it.

... because this gets pretty messy.

Yeah that's fair--I've refactored that area in general, it should be a lot cleaner now.

gretchenfrage commented 7 months ago

Should we introduce a new public future (perhaps Handshaking) returned by accept, and move this and into_0rtt there?

Pros:

  • Reduced side effects here and in into_0rtt
  • No need for an AcceptError in into_0rtt
  • Less risk of panics in is_validated/may_retry/reject/retry

Cons:

  • Duplication of local_ip/remote_address

Seems like a win to me. We could even get rid of the state enum entirely by implementing IntoFuture rather than Future to support direct await calls.

Alright, I've done a considerable refactor which I think achieves a good compromise between the various desires here, and results in a smaller PR than before. I'm no longer introducing different internal states to Connecting--in fact, quinn::Connecting is completely untouched.

Awaiting accept on a quinn::Endpoint now returns a new quinn::IncomingConnection type. quinn::IncomingConnection has the following methods:

Furthermore, it implements IntoFuture<Output=Result<Connection, ConnectionError>>, the output type being the same as Connecting. This means that code which was just doing some basic endpoint.accept().await.unwrap().await (or similar) logic will still work without modification.

Ralith commented 7 months ago

I'm not sure how to feel about having both Endpoint::accept and IncomingConnection::accept which do rather different things. On the one hand, it's clear enough in context on each type, but in an abstract sense having to accept each connection twice is funny and might cause confusion in informal discussion: "Where did you see that error?" "After I accepted a connection.". Would IncomingConnection::connect be a clearer name? But that wouldn't match the proto layer...

Ralith commented 7 months ago

After sleeping on it I think the current naming is probably the sweet spot.

gretchenfrage commented 7 months ago

After sleeping on it I think the current naming is probably the sweet spot.

Huh alright. I mean I'd be fine renaming it to IncomingConnection::connect on both the quinn and quinn-proto layer but yeah leaving as-is works.

gretchenfrage commented 7 months ago

Could we simplify the API by returning Connecting unconditionally, but constructing it such that it immediately yields a ConnectionError when awaited if the proto-layer accept failed? That seems like it'd reduce the number of distinct error paths in downstream code, and reduce churn. For example, the change to the accept loop in quinn/examples/server.rs, which introduces a regression where accept errors aren't logged, would be avoided.

On second thought, that's just .await again. No harm in being a little more fine-grained with the explicit API, I suppose!

If it helps clarify the situation: when changing server applications, I suspect what might end up being done most naturally in most cases is just changing functions which accept a Connecting to instead accept an IncomingConnection but then awaiting it all the same. See for example the entire diff to perf_server.rs:

diff --git a/perf/src/bin/perf_server.rs b/perf/src/bin/perf_server.rs
index 169ca65e..4b710268 100644
--- a/perf/src/bin/perf_server.rs
+++ b/perf/src/bin/perf_server.rs
@@ -123,7 +123,7 @@ async fn run(opt: Opt) -> Result<()> {
     Ok(())
 }

-async fn handle(handshake: quinn::Connecting, opt: Arc<Opt>) -> Result<()> {
+async fn handle(handshake: quinn::IncomingConnection, opt: Arc<Opt>) -> Result<()> {
     let connection = handshake.await.context("handshake failed")?;
     debug!("{} connected", connection.remote_address());
     tokio::try_join!(

Thus it's the new IncomingConnectionFuture struct that implicitly does the work of acting as a little adapter, which can be used as if it were "returning Connecting unconditionally, but constructing it such that it immediately yields a ConnectionError when awaited if the proto-layer accept failed".

gretchenfrage commented 7 months ago

I re-added the RetryPolicy callback to the quinn-proto testing utilities, because I re-added the quinn-proto reject_remote_address test (now renamed to reject_manually). With regards to your previous comment on this topic:

Could we reduce complexity by just taking an IncomingConnectionResponse directly here? No downstream code seems to need arbitrary logic.

It would not actually be that simple, because RetryPolicy::yes() is not as simple as unconditionally returning IncomingConnectionResponse::Accept--it sends a Retry when the IncomingConnection is not yet validated, and only accepts once it is.

gretchenfrage commented 7 months ago

All comments should be resolved, and I just refactored the commit history and force-pushed again so it should be more clean now.

gretchenfrage commented 7 months ago

Yaaaaay

gretchenfrage commented 6 months ago

Thank you! I will be able to address these points soon.

gretchenfrage commented 6 months ago

Took me a bit longer than expected to find the time but it should be ready for another look now. Also changed:

lijunwangs commented 6 months ago

Hi @djc @Ralith, any plan on getting this into main? I found this capability very useful.

djc commented 6 months ago

Yes, will be merged soon I think.

gretchenfrage commented 6 months ago

Just rebased over main, resolving some non-trivial merge conflicts with 95ac70d6

gretchenfrage commented 6 months ago

There were some non-trivialities to removing concurrent_connections. I took the liberty of refactoring some things a tiny bit. Let me know if you disagree with these decisions.

Updated the branch to remove concurrent_connections from ServerConfig. It adds an open_connections(&self) -> usize method to Endpoint, which the user can use to enforce a concurrent connection limit manually. It also removes the existing reject_new_connections method on Endpoint, since that relied on concurrent_connections and can now be done manually by the user with the new "incoming" API we're adding.

The previous version of this branch added a ConnectionError::ConnectionLimitExceeded variant; the new version instead adds ConnectionError::CidsExhausted. Previously, a check_connection_limit method was factored out and called when accepting connections; it checked both the concurrent connection count and is_full, and produced a ConnectionLimitExceeded error if either was true. In the new version, we rename is_full to cids_exhausted, call it directly when accepting connections, and produce a CidsExhausted error if it returns true.

On main, ConnectError has a TooManyConnections variant, with the comment "The number of active connections on the local endpoint is at the limit." It is not triggered when the concurrent_connections limit is reached (which I think is confusing given the error's name and message), but only when is_full returns true. We rename ConnectError::TooManyConnections to ConnectError::CidsExhausted so that it mirrors ConnectionError::CidsExhausted.


Other changes I made to this branch:

Ralith commented 6 months ago

I went through and s/reject/refuse/ in the API for consistency w/ the spec and general IETF convention, now that we no longer need to worry about not matching fn reject_new_connections.

lijunwangs commented 5 months ago

What are the remaining outstanding issues? Can we get this merged into main if no more issues? @djc @Ralith

gretchenfrage commented 5 months ago

Made requested tweaks and rebased over main.

gretchenfrage commented 5 months ago

Tests passed, so should be ready to merge I think!

lijunwangs commented 5 months ago

Tests passed, so should be ready to merge I think!

@gretchenfrage , @djc there are some unresolved conversations -- I think we just need to resolve them. I do not see any real outstanding issues.

djc commented 5 months ago

@lijunwangs there is no need to keep litigating this -- please assume I'm well aware of the state.

djc commented 5 months ago

@gretchenfrage many thanks for all your work on this, and sorry for my slow feedback cycle(s).

lijunwangs commented 5 months ago

@djc -- sorry for pushing for this -- it is a critical improvement for our use case. Thank you all!