I'd like to discuss again the interface of the `StreamMuxer` trait, in particular the changes done in #2724. I am sorry for raising this now and not in the PR itself, but I only now started working on adapting the `QuicMuxer` (#2289) to that interface change.
In the QUIC muxer we need a way of driving the inner `quinn_proto::Connection`. Until now, this was done in the `StreamMuxer::poll_event` loop (code), but that method was removed in #2724. We now have the following options:

1. Drive the connection in `StreamMuxer::poll_inbound`, which more or less allows us to do the same as `poll_event` did before. But imo this would be a hack / unintuitive, or at least the name would then not be that fitting anymore.
2. In the `QuicTransport` we drive the `quinn_proto::Endpoint` in a separate task in the background. We could drive the `Connection`s in this same task and use channels or a `Mutex` to allow the `QuicMuxer` to interact with the `Connection`. In the former case (using channels) the `Substream` implementation would then be rather complicated. The latter case (using a `Mutex`) is imo also not ideal because it is easy to lose the overview of what entity accesses it when. Also, some of the events returned when polling the `Connection` are relevant for the `QuicMuxer`.
3. Change the `StreamMuxer` interface to follow the same design as `Swarm` or `Transport`: one `poll` method and synchronous methods `open_outbound` and `close`, similar to how it was initially proposed in #2648.

I am in favor of the 3rd option. The `StreamMuxer` would then look roughly like this:
```rust
pub trait StreamMuxer {
    type Substream: AsyncRead + AsyncWrite;
    type Error: std::error::Error;
    type Info;

    fn close(&mut self) -> Result<(), Self::Error>;
    fn open_outbound(&mut self, open_info: Self::Info) -> Result<SubstreamId, Self::Error>;
    fn poll(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<SubstreamEvent<Self::Substream>, Self::Error>>;
}

pub enum SubstreamEvent<S> {
    Substream {
        stream: S,
        id: SubstreamId,
    },
    AddressChange { .. },
}
```
I am aware that there was already a discussion in https://github.com/libp2p/rust-libp2p/pull/2724#issuecomment-1165320603 about this and that it was a conscious decision to go for the current design (e.g. to also allow backpressuring inbound substreams). Again, sorry for not engaging in that discussion back then.
I do like the new API in theory, but after working on the `QuicMuxer` I think in practice it is rather inconvenient for the implementer. Therefore I'd like to reopen the discussion. The following points are the reasons why I am in favor of having one `poll` method:

- Driving the connection in `poll_inbound` feels like a hack, and would then also mean that we can not just stop polling `poll_inbound` to enforce backpressure.
- I only briefly looked into the WebRTC muxer implementation (https://github.com/libp2p/rust-libp2p/pull/2622), but maybe this would also allow them to get rid of their `async` methods and instead drive futures in the `poll` function? @melekes do you have an opinion on what design would work best for the WebRTC muxer implementation?
- One `poll` method would, as @thomaseizinger already wrote in the original PR #2648, follow the style "mutate state and return result via poll", which I personally think is a good design to enforce throughout rust-libp2p.
- `StreamMuxer::poll_outbound` is imo not that intuitive: we create a future for a new outbound substream and poll that future; if one already exists we just poll the existing one. Implementations (e.g. Yamux or WebRTC) have to solve this by using an `Option`, but in general this logic is rather implicit (see the sketch after this list). Just looking at the API and not at other existing muxers, I would not know how to implement it.

@thomaseizinger @mxinden what do you think? Are there drawbacks that I did not consider, or any other reasons why you prefer the current design? Also happy to hear alternative ideas on how / where to drive our inner connection in QUIC. @kpp since you worked on the `QuicMuxer` before, maybe you could also give some input on what design you think would work best?
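To make that implicit pattern concrete, here is a minimal sketch of the `Option` juggling described above. This is not the actual Yamux or WebRTC code; `Muxer`, the placeholder `Substream` type and `new_open_future` are all assumptions for the sketch.

```rust
use futures::future::BoxFuture;
use futures::FutureExt;
use std::pin::Pin;
use std::task::{Context, Poll};

type Substream = (); // placeholder for the real substream type
type Error = std::io::Error;

struct Muxer {
    // `None` until an outbound substream is requested; holds the
    // in-flight "open" future across calls until it resolves.
    pending_outbound: Option<BoxFuture<'static, Result<Substream, Error>>>,
}

impl Muxer {
    // Placeholder for e.g. `connection.open_stream()`.
    fn new_open_future() -> BoxFuture<'static, Result<Substream, Error>> {
        futures::future::ready(Ok(())).boxed()
    }

    fn poll_outbound(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<Substream, Error>> {
        // Lazily create the future on the first call ...
        let fut = self
            .pending_outbound
            .get_or_insert_with(Self::new_open_future);
        // ... and keep polling the *same* future on every subsequent call.
        let result = futures::ready!(fut.poll_unpin(cx));
        // Reset, so that the next call opens a fresh substream.
        self.pending_outbound = None;
        Poll::Ready(result)
    }
}
```

Nothing in the trait signature tells an implementer that this bookkeeping is expected of them, which is the point being made above.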
Thanks for commenting!
I did think about this problem actually, but perhaps the trade-offs play out differently than what I thought they would. The problem I see with a single `poll` function is that it creates an API that is actually rather inconvenient to use, esp. when we take into account the requirement of wanting to enforce back-pressure on incoming streams.

The current API is IMO rather simple because it is flat. It also simplified things quite a bit in `swarm::Connection`.
Assuming we want to keep the current API, then I think the way to implement this for QUIC is by copying what was done in mplex: drive the inner connection on every `poll` implementation and buffer events (to a certain limit) that are not the ones you "need" in the current poll function.
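A rough sketch of that mplex-style pattern, with assumed placeholder names (the real mplex code differs): each `poll_*` function serves from its own buffer first and otherwise drives the shared inner connection, buffering events destined for the other functions.

```rust
use std::collections::VecDeque;
use std::task::{Context, Poll};

type Substream = (); // placeholder
type Error = std::io::Error;

// Events the (hypothetical) inner connection can produce.
enum ConnEvent {
    InboundStream(Substream),
    Other, // e.g. an address change
}

struct Muxer {
    inbound_buffer: VecDeque<Substream>,
    other_buffer: VecDeque<ConnEvent>,
    max_buffered: usize,
}

impl Muxer {
    // Placeholder for driving e.g. the `quinn_proto::Connection`.
    fn poll_connection(&mut self, _cx: &mut Context<'_>) -> Poll<Result<ConnEvent, Error>> {
        Poll::Pending
    }

    fn poll_inbound(&mut self, cx: &mut Context<'_>) -> Poll<Result<Substream, Error>> {
        loop {
            // Serve from the buffer first: another `poll_*` function may
            // already have driven the connection and buffered a stream for us.
            if let Some(stream) = self.inbound_buffer.pop_front() {
                return Poll::Ready(Ok(stream));
            }
            // Otherwise make progress on the inner connection ourselves.
            match futures::ready!(self.poll_connection(cx))? {
                ConnEvent::InboundStream(stream) => return Poll::Ready(Ok(stream)),
                other => {
                    // Event for somebody else: buffer it, up to a limit.
                    if self.other_buffer.len() < self.max_buffered {
                        self.other_buffer.push_back(other);
                    }
                }
            }
        }
    }
    // `poll_outbound` / `poll_address_change` would mirror this, pushing
    // `InboundStream` events into `inbound_buffer` instead.
}
```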
If we want a different API, then I'd suggest the following:
```rust
pub trait StreamMuxer {
    type Substream: AsyncRead + AsyncWrite;
    type Error: std::error::Error;

    fn close(&mut self) -> Result<(), Self::Error>;
    fn open_outbound(&mut self) -> Result<SubstreamId, Self::Error>;
    fn poll(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<SubstreamEvent<Self::Substream>, Self::Error>>;
}

pub enum SubstreamEvent<S> {
    OutgoingSubstream {
        stream: S,
        id: SubstreamId,
    },
    IncomingSubstream(S),
    AddressChange { .. },
}
```
Not to be underestimated is the boilerplate that comes with this ID management, which has to be duplicated in every muxer implementation.
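As an illustration of that boilerplate, a sketch of the per-muxer bookkeeping this design would require; `IdTracker` and `PendingStream` are names assumed here, not anything from the proposal.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct SubstreamId(u64);

type PendingStream = (); // placeholder for muxer-internal state

// Every muxer implementation would have to mint ids and remember which
// pending outbound streams belong to which id, so that the eventual
// `SubstreamEvent::OutgoingSubstream { id, .. }` can be emitted correctly.
struct IdTracker {
    next_id: u64,
    pending: HashMap<SubstreamId, PendingStream>,
}

impl IdTracker {
    fn new_id(&mut self) -> SubstreamId {
        let id = SubstreamId(self.next_id);
        self.next_id += 1;
        id
    }
}
```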
> I only briefly looked into the webRTC muxer implementation (https://github.com/libp2p/rust-libp2p/pull/2622), but maybe this would also allow them to get rid of their async methods and instead drive futures in the poll function? @melekes do you have an opinion on what design would work best for the WebRTC muxer implementation?

I'm quite happy with the current design of the `StreamMuxer` trait.
I believe I should try to move QuicMuxer to a new API to give it a try.
When discussing the integration of the new interface in quic, @elenaf9 mentioned that one theoretical issue that we currently have is that one of the `poll` functions MUST be polled in order for any of the substreams to make progress.

In practice, `poll_address_change` will always be polled unconditionally, even when we start to enforce back-pressure on `poll_inbound` and no outbound streams currently need to be opened. This is very subtle though, which is why I am proposing the following:

- Rename `poll_address_change` to `poll_event`
- Introduce a `StreamMuxerEvent` that for now only has one variant `AddressChange`

This function can act as a general-purpose "make progress" function and all other `poll` functions are only called on-demand. Thoughts? @elenaf9 @mxinden @kpp @melekes
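Roughly, the proposal amounts to the following shape. This is an editor's sketch based on the two bullet points above; the exact signatures live in the draft PR referenced later in this thread (#2797) and may differ.

```rust
use libp2p_core::Multiaddr;
use std::pin::Pin;
use std::task::{Context, Poll};

/// For now only one variant; more connection-level events could be
/// added later without breaking the `poll_event` signature.
pub enum StreamMuxerEvent {
    AddressChange(Multiaddr),
}

pub trait StreamMuxer {
    type Substream;
    type Error;

    // poll_inbound / poll_outbound / poll_close as before ...

    /// General-purpose "make progress" function: drives the underlying
    /// connection and surfaces events not tied to a particular substream.
    /// Callers poll this unconditionally, while the other `poll_*`
    /// functions are only called on demand.
    fn poll_event(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<StreamMuxerEvent, Self::Error>>;
}
```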
> This is very subtle though, which is why I am proposing the following:
>
> - Rename `poll_address_change` to `poll_event`
> - Introduce a `StreamMuxerEvent` that for now only has one variant `AddressChange`
>
> This function can act as a general-purpose "make progress" function and all other `poll` functions are only called on-demand.
Sounds good to me.
For the record: in an out-of-band discussion with @thomaseizinger we decided to stick with the current interface for now. The main reasons are that this interface allows backpressuring inbound substreams via `poll_inbound`, and that we found a way to adapt the `QuicMuxer` to it: more or less what @thomaseizinger described above, by driving the inner connection on every poll implementation (and we do not even need to buffer any events). Edit: see elenaf9#6.
@kpp

> I believe I should try to move QuicMuxer to a new API to give it a try.

See https://github.com/kpp/rust-libp2p/pull/19#issuecomment-1203998257.
Thank you. I will take a look
> When discussing the integration of the new interface in quic, @elenaf9 mentioned that one theoretical issue that we currently have is that one of the `poll` functions MUST be polled in order for any of the substreams to make progress.
>
> In practice, `poll_address_change` will always be polled unconditionally, even when we start to enforce back-pressure on `poll_inbound` and no outbound streams currently need to be opened. This is very subtle though, which is why I am proposing the following:
>
> - Rename `poll_address_change` to `poll_event`
> - Introduce a `StreamMuxerEvent` that for now only has one variant `AddressChange`
>
> This function can act as a general-purpose "make progress" function and all other `poll` functions are only called on-demand. Thoughts? @elenaf9 @mxinden @kpp @melekes
There is now a draft PR of the above idea here: https://github.com/libp2p/rust-libp2p/pull/2797
Something I stumbled across just now: in the swarm, we never poll `StreamMuxer::poll_close` when closing a connection. Instead we simply drop the future returned from `StreamMuxerExt::close`, and then the connection (including the muxer) itself:

https://github.com/libp2p/rust-libp2p/blob/8931860b2f35e55316860a5fc0affb264c447c61/swarm/src/connection/pool/task.rs#L237-L249

(The above match block is always called when the connection closes, including graceful closing e.g. because of `KeepAlive::No`.)
Before #2248 the returned future was driven to completion before dropping the connection (code). @mxinden since you were the author of that PR: was this an intentional change? If so, I don't think having (an async) `StreamMuxer::poll_close` makes any sense.
I think there is a confusion here. We close a connection either due to (1) the `command_receiver` returning a `Command::Close`, or (2) due to the connection returning an error.

1. `command_receiver` returning a `Command::Close`: in this case we drive the `closing_muxer` returned by `Connection::close` to completion.
2. The connection returning an error: in this case we don't drive the `closing_muxer` to completion. The connection returned an error, thus there is no need to gracefully close the connection.

@elenaf9 does the above make sense?
Ah thanks, I did overlook case (1). Though I do wonder:

> (2) The connection returning an error
>
> In this case we don't drive the `closing_muxer` to completion. The connection returned an error, thus there is no need to gracefully close the connection.

The connection also returns an error `Error::KeepAlive` if all network behaviours reported that they don't need the connection anymore (code). I would say 99% of the time that's the reason why we close a connection in rust-libp2p. I would argue that in that case it still makes sense to gracefully close the connection.
We could at least try to close it and ignore any `io::Error` returned from `poll_close`. We may have to be careful though, as this may trigger panics in the form of "polled after completion", depending on the muxer implementation. An example of such a "polled after completion" panic: https://github.com/libp2p/rust-libp2p/issues/2598
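One way to avoid that poll-after-completion hazard while still attempting a best-effort graceful close would be to hold the close future in an `Option` and clear it once resolved. This is a sketch only, not the actual swarm code; `ClosingConnection` and its fields are assumed names.

```rust
use futures::future::BoxFuture;
use futures::FutureExt;
use std::task::{Context, Poll};

struct ClosingConnection {
    // `Some` while the graceful close is in progress; `None` once it has
    // resolved, so the future can never be polled after completion.
    closing_muxer: Option<BoxFuture<'static, Result<(), std::io::Error>>>,
}

impl ClosingConnection {
    fn poll_close(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if let Some(fut) = self.closing_muxer.as_mut() {
            match fut.poll_unpin(cx) {
                Poll::Pending => return Poll::Pending,
                // Best effort: an `io::Error` from closing is ignored,
                // the connection is dropped afterwards either way.
                Poll::Ready(_result) => self.closing_muxer = None,
            }
        }
        Poll::Ready(())
    }
}
```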
That said, I think you are right @elenaf9: in the case of the handler, not the actual I/O resource (connection), returning an error (e.g. `Error::KeepAlive`), we should try to close the connection. Good catch, I have not thought about that.

I wonder whether we should fix this earlier, or first combine `HandlerWrapper` and `Connection` and then fix this issue.
> I wonder whether we should fix this earlier, or first combine HandlerWrapper and Connection and then fix this issue.

Imo it's sufficient if we do it after / within the PR that combines `HandlerWrapper` and `Connection`. From #2796 I assume that this is planned for the near future.
I am closing this as completed :partying_face:
Very good work! The amount of small pull requests working towards a greater goal is stunning. Thanks for making it easy to review.
This issue tracks the redesign of the `StreamMuxer` trait which was originally started in #2648. The primary goals are:

- `AsyncRead` + `AsyncWrite` substreams
- `&mut self` for the `StreamMuxer` (this may extend to `Pin<&mut Self>`)

PRs related to this issue:

- `StreamMuxerEvent` to `Event`