Suggestion to remove Clone bound from reader

Diggsey commented 3 years ago

AIUI, the clone bound is required in the case where multiple requests are accepted over the same AsyncRead and to avoid having a lifetime bound on the Request type.

I think there a couple of problems with this:

Many streams are not cloneable, which can make integrations difficult, eg. #109.
Even for a TcpStream, it doesn't really make sense to clone it, as concurrent acess will just corrupt the state.
There's nothing that inherently prevents accidental concurrent access: the crate just relies on the caller doing the right thing.

I think a good workaround would be to use a oneshot::channel. Specifically, create an AsyncRead type like:

struct LeasedReader<T> {
    return_to_sender: oneshot::Sender<T>,
    inner: Option<T>,
}

This would have a finish() method on that returns the inner reader to the sender. The advantage of this is that the sender loses access to the stream whilst is it on lease, the stream does not require any internal synchronization, and if there is a problem whilst reading from the stream (such as a panic or error) it will not be returned to the sender. The sender will just see the oneshot channel be canceled, and can handle that case cleanly.

The downside to this is that it would require changing the signature of the accept function: I see two possibilities. Either the sender side can get its own wrapper that implements AsyncRead and hides the fact that the inner reader may be leased out. Alternatively, the accept function can return an additional impl Future<Reader> (the oneshot::Receiver) for the caller to recover the reader.

jbr commented 3 years ago

I agree that this is a problem. The design of tide and async-h1 requires having multiple clones of the AsyncRead+AsyncWrite around, because both the request and response need access to it, in an indeterminate order

yoshuawuyts commented 3 years ago

We tried implementing async-h1 without be Clone bound because we indeed never concurrently access the underlying stream — but this requires GATs to express in the language. Without that it's mostly trading off workarounds.

One thing I'm working on right now is to make it so std::sync::Arc<T> implements Read or Write if &T: Read or &T: Write. This can already be done through workarounds, but having it directly from std would make this easier to do — and would make the workaround for GATs we're relying on in async-h1 easier to use.

Diggsey commented 3 years ago

The design of tide and async-h1 requires having multiple clones of the AsyncRead+AsyncWrite around, because both the request and response need access to it, in an indeterminate order

Well... concurrent access to the reader and writer sides makes sense. Would the order still be indeterminate if those two sides were split? ie. I would expect the reader side to have a determined order (read the request header, read the request body, repeat) and the writer side to have a determined order.

Each side could be leased to the request handler independently.

yoshuawuyts commented 3 years ago

Would the order still be indeterminate if those two sides were split?

I don't understand why we'd want to split these sides? Generally both types will refer to the same item: e.g. a single TcpStream. So by "splitting" them all we're really doing is wrapping them in something like async-dup (or in the future Arc with the right traits on it) and passing the same item twice.

The reason why e.g. &File: Read + Write is because at the OS-level nothing is stopping the system from opening a handle to the same file descriptor twice. Users of these APIs are expected to uphold invariants here, and ensure at a system level no races occur. So Rust treats this as a shared resource and implements Read / Write ops accordingly.

Diggsey commented 3 years ago

I don't understand why we'd want to split these sides?

There's "does it make sense" and "is it desirable" as two separate questions.

For why it makes sense: a TcpStream is a duplex channel, you could easily imagine one thread writing to a stream, whilst another thread is reading from it, without any need for the two to coordinate. This differs from eg. a file, where reads and writes to the same file must be carefully synchronized, as they modify shared state (the read/write position).

The reason why e.g. &File: Read + Write is because at the OS-level nothing is stopping the system from opening a handle to the same file descriptor twice.

Yeah, the OS primitives are effectively internally mutable. A bit like the Atomic* types, in that operations on them are generally atomic but that does not guarantee sane results.

However, it's generally considered bad practice in Rust to make everything be internally mutable: the opposite is true. We usually opt-into that, so eg. references to adaptors for AsyncRead / AsyncWrite will not generally implement those traits. While an OS file may be readable via shared reference, a BufReader constructed around that will require a mutable reference.

By splitting the read/write sides, it avoids unnecessary synchonization: splitting the OS primitives is effectively free, because you can just clone them and they internally synchronize. Splitting the adapted type is unnecessary, because you can adapt the two sides of the OS primitive separately.

It also allows higher level APIs to exist that do prevent footguns like two threads concurrently writing to the same TcpStream, even if the lower-level OS primitives don't prevent that.

Finally, as it is used in async-h1, it seems like there could also be security impliciations to sharing the reader in this way, unless you implement some kind of "poisoning" mechanism. I could smuggle one request inside another:

POST /bleh
,...
<begin request body>
<bad data which causes request handler to fail or panic>
POST /internal/dangerous_endpoint
<smuggled request>

If each side of the TcpStream is uniquely owned, then this kind of situation can't happen.

Diggsey commented 3 years ago

It's also not clear how #151 could be solved without either removing buffering entirely, which would seriously degrade performance, or by requiring the caller to pass a BufRead instead of a Read, which would not implement Clone.

Diggsey commented 3 years ago

I implemented the primitive I described in the issue: https://github.com/rust-lang/futures-rs/pull/2328/files

This could be used instead of Arc<T> to allow multiple tasks access to the same stream, but restricted to a specific order.

http-rs / async-h1

Suggestion to remove Clone bound from reader #175