Diggsey opened 3 years ago
I agree that this is a problem. The design of tide and async-h1 requires having multiple clones of the `AsyncRead + AsyncWrite` around, because both the request and response need access to it, in an indeterminate order.
We tried implementing async-h1 without the `Clone` bound, because we indeed never concurrently access the underlying stream — but this requires GATs to express in the language. Without that it's mostly a matter of trading off workarounds.
One thing I'm working on right now is to make it so `std::sync::Arc<T>` implements `Read` or `Write` if `&T: Read` or `&T: Write`. This can already be done through workarounds, but having it directly in std would make this easier to do — and would make the workaround for GATs we're relying on in async-h1 easier to use.
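For reference, a minimal sketch of the workaround that is possible today: a `Clone`-able wrapper around `Arc<TcpStream>` that forwards reads to the `impl Read for &TcpStream` std already provides (the `SharedStream` name is just illustrative):

```rust
use std::io::{self, Read};
use std::net::TcpStream;
use std::sync::Arc;

// Illustrative wrapper: every clone shares the same socket, and reads
// go through std's `impl Read for &TcpStream`, so only a shared
// reference to the inner stream is needed.
#[derive(Clone)]
struct SharedStream(Arc<TcpStream>);

impl Read for SharedStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Deref the Arc to a `&TcpStream` and read through it.
        let mut inner: &TcpStream = &self.0;
        inner.read(buf)
    }
}
```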
> The design of tide and async-h1 requires having multiple clones of the `AsyncRead + AsyncWrite` around, because both the request and response need access to it, in an indeterminate order.
Well... concurrent access to the reader and writer sides makes sense. Would the order still be indeterminate if those two sides were split? I.e. I would expect the reader side to have a determined order (read the request header, read the request body, repeat) and the writer side to have a determined order.
Each side could be leased to the request handler independently.
> Would the order still be indeterminate if those two sides were split?
I don't understand why we'd want to split these sides? Generally both types will refer to the same item: e.g. a single `TcpStream`. So by "splitting" them, all we're really doing is wrapping them in something like async-dup (or, in the future, `Arc` with the right traits on it) and passing the same item twice.
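Concretely, the "split" amounts to something like this (a sketch using std's blocking types; the names are illustrative):

```rust
use std::net::TcpStream;
use std::sync::Arc;

// "Splitting" here just means handing out two handles to the same
// socket: the reader half and the writer half are the same stream,
// passed twice.
fn split_by_sharing(stream: TcpStream) -> (Arc<TcpStream>, Arc<TcpStream>) {
    let shared = Arc::new(stream);
    (Arc::clone(&shared), shared)
}
```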
The reason why e.g. `&File: Read + Write` is because at the OS level nothing stops the system from opening a handle to the same file descriptor twice. Users of these APIs are expected to uphold the invariants here, and to ensure at a system level that no races occur. So Rust treats this as a shared resource and implements `Read` / `Write` ops accordingly.
> I don't understand why we'd want to split these sides?
There's "does it make sense" and "is it desirable" as two separate questions.
For why it makes sense: a `TcpStream` is a duplex channel. You could easily imagine one thread writing to a stream whilst another thread is reading from it, without any need for the two to coordinate. This differs from e.g. a file, where reads and writes to the same file must be carefully synchronized, as they modify shared state (the read/write position).
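For instance, this sort of thing is perfectly reasonable on a socket (a sketch using std's blocking API; error handling kept minimal):

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::thread;

// One thread reads while the other writes; the two directions of a
// TCP connection are independent, so no coordination is needed.
fn duplex(stream: TcpStream) -> std::io::Result<()> {
    let mut read_half = stream.try_clone()?;
    let mut write_half = stream;

    let reader = thread::spawn(move || -> std::io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        read_half.read_to_end(&mut buf)?;
        Ok(buf)
    });

    write_half.write_all(b"hello")?;
    // Signal we're done sending so the peer eventually sees EOF.
    write_half.shutdown(std::net::Shutdown::Write)?;

    let _received = reader.join().expect("reader thread panicked")?;
    Ok(())
}
```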
> The reason why e.g. `&File: Read + Write` is because at the OS level nothing stops the system from opening a handle to the same file descriptor twice.
Yeah, the OS primitives are effectively internally mutable. A bit like the Atomic* types, in that operations on them are generally atomic but that does not guarantee sane results.
However, it's generally considered bad practice in Rust to make everything internally mutable: the opposite is true. We usually opt into that, so e.g. references to adaptors for `AsyncRead` / `AsyncWrite` will not generally implement those traits. While an OS file may be readable via a shared reference, a `BufReader` constructed around it will require a mutable reference.

Splitting the read/write sides avoids unnecessary synchronization: splitting the OS primitives is effectively free, because you can just clone them and they internally synchronize. Splitting the adapted type is unnecessary, because you can adapt the two sides of the OS primitive separately.
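Concretely, adapting the halves separately might look like this (again a sketch with std's blocking types; async buffered adaptors would follow the same shape):

```rust
use std::io::{BufReader, BufWriter};
use std::net::TcpStream;

// Each half gets its own adaptor, so the buffers are owned exclusively
// by their side and no shared state has to be synchronized.
fn adapt(stream: TcpStream) -> std::io::Result<(BufReader<TcpStream>, BufWriter<TcpStream>)> {
    let read_half = stream.try_clone()?;
    Ok((BufReader::new(read_half), BufWriter::new(stream)))
}
```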
It also allows higher-level APIs to exist that do prevent footguns like two threads concurrently writing to the same `TcpStream`, even if the lower-level OS primitives don't prevent that.
Finally, as it is used in async-h1, it seems like there could also be security implications to sharing the reader in this way, unless you implement some kind of "poisoning" mechanism. I could smuggle one request inside another:
```
POST /bleh
...
<begin request body>
<bad data which causes request handler to fail or panic>
POST /internal/dangerous_endpoint
<smuggled request>
```
If each side of the TcpStream is uniquely owned, then this kind of situation can't happen.
It's also not clear how #151 could be solved without either removing buffering entirely, which would seriously degrade performance, or requiring the caller to pass a `BufRead` instead of a `Read`, which would not implement `Clone`.
I implemented the primitive I described in the issue: https://github.com/rust-lang/futures-rs/pull/2328/files
This could be used instead of `Arc<T>` to allow multiple tasks access to the same stream, but restricted to a specific order.
> AIUI, the clone bound is required in the case where multiple requests are accepted over the same `AsyncRead`, and to avoid having a lifetime bound on the `Request` type.

I think there are a couple of problems with this:
I think a good workaround would be to use a `oneshot::channel`. Specifically, create an `AsyncRead` type roughly like the sketch below (names here are just illustrative, using the `futures` oneshot channel):
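```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use futures::channel::oneshot;
use futures::io::AsyncRead;

// Rough sketch only; `LeasedReader` and its fields are illustrative
// names, not types from async-h1 or futures.
struct LeasedReader<R> {
    reader: R,
    return_to: oneshot::Sender<R>,
}

impl<R: AsyncRead + Unpin> LeasedReader<R> {
    fn new(reader: R) -> (Self, oneshot::Receiver<R>) {
        let (tx, rx) = oneshot::channel();
        (LeasedReader { reader, return_to: tx }, rx)
    }

    // Hand the inner reader back to whoever holds the receiver. If the
    // lease is dropped instead (error, panic), the receiver just sees
    // the channel as canceled.
    fn finish(self) -> Result<(), R> {
        self.return_to.send(self.reader)
    }
}

impl<R: AsyncRead + Unpin> AsyncRead for LeasedReader<R> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<std::io::Result<usize>> {
        // Simply forward reads to the leased inner reader.
        Pin::new(&mut self.reader).poll_read(cx, buf)
    }
}
```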
This would have a `finish()` method on it that returns the inner reader to the sender. The advantage of this is that the sender loses access to the stream whilst it is on lease, the stream does not require any internal synchronization, and if there is a problem whilst reading from the stream (such as a panic or error) it will not be returned to the sender. The sender will just see the oneshot channel be canceled, and can handle that case cleanly.

The downside to this is that it would require changing the signature of the `accept` function. I see two possibilities: either the sender side can get its own wrapper that implements `AsyncRead` and hides the fact that the inner reader may be leased out, or the `accept` function can return an additional `impl Future<Output = Reader>` (the `oneshot::Receiver`) for the caller to recover the reader.
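A sketch of the second option, building on the `LeasedReader` sketch above (`accept` and `Request` are placeholders here, not async-h1's real signature):

```rust
use futures::channel::oneshot;
use futures::io::AsyncRead;

// Placeholder standing in for async-h1's parsed request type.
struct Request;

// Besides the parsed request, hand back the oneshot receiver so the
// caller can recover the reader once the handler calls `finish()`, or
// observe cancellation if it never does.
async fn accept<R: AsyncRead + Unpin>(reader: R) -> (Request, oneshot::Receiver<R>) {
    let (leased, receiver) = LeasedReader::new(reader);
    // ... parse the request head from `leased`, then move `leased`
    // into the request body handed to the handler ...
    drop(leased); // stand-in for handing it off
    (Request, receiver)
}
```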