Open FlorianUekermann opened 1 year ago
A PR should probably cover #144 as well.
Thanks for this issue. I can see the Problem.
A + B:
I think it is worth to explore the possibility of just one poll
method to “accept” streams.
And especially the idea of a Close
Enum sounds promising to me.
C. Stream errors It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective Close.
I'm afraid I didn't quite understand what you meant with this. Could you give me an example?
The emission order of streams of same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.
I have not thought about this. To what extent could that be a problem?
There's a bit of redundancy in the OpenStreams and Connection trait, which is unnecessary and has a few awkward side-effects. https://github.com/hyperium/h3/pull/173 deals with that. It is largely orthogonal to this topic, but would be nice to get out of the way, because it makes (Draft) PRs for this issue a bit easier to read.
Thanks for that PR. I hope to have time to review that this weekend.
This are just my thoughts. I'm interested in getting other people's perspectives on this.
C. Stream errors It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective Close.
I'm afraid I didn't quite understand what you meant with this. Could you give me an example?
Stream errors can include info about connection closure (e.g. "read failed because connection was implicitly reset due to receiving close frame"). I'm suggesting that h3 should explicitly document whether it'll pick up on the fact that a connection is terminated from an error returned by a quic stream method. This will presumably be obvious from the types in the new traits, but atm it isn't (and it does matter in my example, where Quic must not return Ready(None)
from certain methods until we know h3 has understood that the connection terminated).
The emission order of streams of same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.
I have not thought about this. To what extent could that be a problem?
Well, confusing the control stream with another one would be really bad for example.
Edit: The RFC says: Each side MUST initiate a single control stream at the beginning of the connection
, whether that implies that the control stream is the first one being initiated isn't 100% clear to me. Either way, we should probably not rely on it.
To send GOAWAY frames which limit the number of requests in flight it is also necessary to know which stream ids we haven't seen yet. They need to come in in ascending order for the bookkeeping to be trivial (remember max id).
It's just something to be documented. No implementation will struggle to solve this, because in Quic this order is enforced. So it's just about the quic implementation not accidentally shuffling them if multiple streams are opened between calls to poll_accept_*
.
Thanks for the quick response. It's good to know that you are in principle open to this. The big step is to start being precise about how termination and close info flows from quic to h3. The details can be iterated easily and decisions will be more obvious with code to look at.
It's been a while. Is there a realistic chance of making progress soonish? I see 3 reasons for urgency:
Adding to this we have run into limitations with the current traits when implementing webtransport #183.
Now Webtransport is merged and there have been proposals for additions to the API which find further blockings on the api and need to extend it and create Ext traits.
The current situation seems to be in a position to get unwieldy if we keep stacking things ontop of the current traits in my opinion.
I implemented A.2.i in #200 to see how it could work out. Is it like you proposed? What do you think?
Big picture problem
I believe there are a few issues with the
h3::quic::Connection
trait, which lead to expected behavior being under-defined and some hidden bugs in h3, which don't surface in tests due to h3-quinn implementation decisions. The topic easily gets out of hand, because the space of possible interpretations of "correct", as well as the solution space is large, so I'll break it up a bit and leave some stuff out.Example
Possibly a bug, subject to interpretation: https://github.com/hyperium/h3/blob/da29aea305d61146664189346b3718458cb9f4d6/h3/src/connection.rs#L244-L257 The
Poll::Ready(None)
arm implies that a connection closure is not expected when this is called. Yet, this snippet is actually almost certainly where connection close is observed for the first time. But h3-quinn treats any close as an error, even non-error ones. Therefore the arm that is selected is actually thePoll::Ready(Err(_))
inside the?
in L245, which is why no tests have caught this.If you use other quic implementations, which return
None
on non-error closes, graceful shutdown tests start failing. I think this is a tip-of-the-iceberg situation, because I observe other shutdown related oddities, depending on similar details, which I don't want to get into here for brevity. Even if you do know what h3 actually expects, I believe there are race conditions, for example: Observing control stream reset is never expected , but implied by a quic connection close.Underlying problem
These methods are the meat of the quic traits. https://github.com/hyperium/h3/blob/da29aea305d61146664189346b3718458cb9f4d6/h3/src/quic.rs#L45-L71 Especially taking into account the comments explicitly stating that
None
is a way to indicate a close. There is a good amount of ambiguity in where connection closes should be communicated first and how. From a naive reading, even a first error surfacing via a RecvStream method seems reasonable, growing the list of affected methods even more. With the current implementation, there doesn't seem to be a clear distinction between non-error closes and other terminations in the quic traits, which should be handled differently. I think the example demonstrates that these issues not only make life hard for implementers of the traits, but also for contributors to h3.Small detour to increase the solution space
Neither Quic nor Http3 provide a way to accept/reject opening streams/requests by the remote. In Http3 the stream can be closed and stopped immediately with an appropriate error code to indicate rejection after the fact, but opening a stream/request is a unilateral decision by the remote. As a result, the differentiated
h3::quic::Connection::poll_accept_*
methods aren't particularly useful. At least with quinn, calling "accept" has no stream concurrency implications. Other Quic implementations may choose a different approach, but without more explicit ways to apply backpressure on remote stream creation I'm not sure that would even be desirable.As a result, it would be an option to merge the
h3::quic::Connection::poll_accept_*
methods to something like this:The
Result
vsOption<Result>
andError
type considerations aren't the point here, see next section for that stuff.Solutions
As mentioned above the solution space is large, but there are some key choices:
A. Connection trait complexity
poll_open_*
), which emits streams, as well as connection errors and close information. This should either:Option
, but require explicit communication of a final close or error and returnPoll::Pending
instead ofNone
:fn poll(&mut self) -> Poll<Result<RecvOrBidiStream, Close>>
fn poll(&mut self) -> Poll<Option<RecvOrBidiStream>>
, withNone
implying connection termination, and an extrafn closed(&mut self) -> Option<Close>
.B. Close type Above a
Close
type is mentioned. Currently the equivalent isBox<dyn Error>
and maybeOption::None
. Note: Application closes are an expected way to lose connection, even if no GOAWAY was sent. Other closes are not and may indicate a problem that the application needs to report or react to. Therefore they should be clearly distinguishable. Options:Close
enum, either custom or justResult<ApplicationClose, ConnectionError>
.ConnectionError
would be constrained likeError
is atm. I'm using this on the quic side and it is very nice. The content of ApplicationClose is fairly well defined by the Quic RFC. Benefit: Meaning is obvious to the user if propagated, and obvious to the implementer of the quic side.C. Stream errors It should be defined whether connection errors are picked up and processed via stream interactions the same way as they are if they surface via Connection methods. I think there is no point in doing so. It just complicates the code. If a stream error implies that the connection terminated, the connection level method(s) should still be required to emit the respective
Close
.Additionally, the RFC does not require any stream error to have connection level effects, but allows implementations to choose to handle them that way. Again, I don't think that would help simplicity.
E. Stream order The emission order of streams of same type should be defined as ascending by id. It keeps the h3 side simple and most Quic implementations guarantee this anyway. If they don't it is trivial to fix in the trait implementation due to underlying guarantees in Quic. There have been examples of streams being emitted in slightly shuffled order by Quic implementations, so this should be explicitly required or handled internally.
E. Redundancy There's a bit of redundancy in the
OpenStreams
andConnection
trait, which is unnecessary and has a few awkward side-effects. #173 deals with that. It is largely orthogonal to this topic, but would be nice to get out of the way, because it makes (Draft) PRs for this issue a bit easier to read.Conclusion
I hope the above makes sense. And I hope my concerns aren't based on a series of misunderstandings on my side. I think there's potential for h3 code to become simpler and easier to reason about. A draft PR should be fairly straightforward and I'm happy to give it a shot. But with a change that is this substantial, I wanted to explain my perspective a bit and check if you are open to such changes.