Closed seanmonstar closed 1 year ago
cc @Noah-Kennedy @erickt @dhobsd
I am a fan of this, if a proper trait can be found and the tokio experience has shortcuts such that the difference in experience with this change is minimal. But since that will go in util, I don't see why hyper core itself shouldn't be based on its own IO types.
I think you want to go 1.0 with Hyper long before we add such traits to std, but I think it would be great if using the std ones was in the longer-term plan. From my PoV, I'd love to know if there is anything in the current design proposal which wouldn't work for Hyper in the long run.
Text: https://github.com/nrc/portable-interoperable/tree/master/io-traits#readme WIP code: https://github.com/nrc/async-io-traits
I also really like this idea, at least until the traits are available in std. It opens the possibility to take them in directions that make sense for hyper while not necessarily desirable for tokio, and to experiment with completion IO. It also feels more in line with the design of 1.0 (from an onlooker's perspective), since the main reason to keep using the tokio traits is easier integration, given the other integration helpers have all (?) been moved to util.
The main thing I'm personally interested in here would be a mechanism to allow zero-copy IO for APIs which require ownership of buffers. For io_uring, ideally implementations of the IO traits would be able to provide their own buffer types, which is essential for things like provided buffers (a facility for kernel-managed buffer pools).
If this was going to be implemented, how would it work in relation to the crates hyper depends on? For example, h2 also uses `tokio::io::{AsyncRead, AsyncWrite}`, so it would also need to be updated to use these custom IO traits. Given these facts, I assume it would be necessary for a `hyper-io` crate to be introduced, as h2 couldn't import the traits from hyper without causing a recursive crate dependency, which I am fairly sure is not supported.

I came across this problem while trying to implement this PR; I would be curious about the planned solution, and if so I might be able to continue my implementation.
> Given these facts, I assume it would be necessary for a `hyper-io` crate to be introduced, as h2 couldn't import the traits from hyper without causing a recursive crate dependency, which I am fairly sure is not supported.
Oh, that's an interesting catch! I suppose it's a possibility we could define another crate, though I'm not so sure it's worth the hassle yet. Another option is that h2 could define some IO traits too, based on the operations it wants to do, and then inside hyper we just have an adapter.

I think that we perhaps don't have to decide that just yet, and could use an internal adapter anyway inside hyper, wrapping the IO type in something that implements Tokio's traits when we pass it to h2. Changing to either option later on should be backwards compatible.
I'm taking this one now. One not-too-important detail I wondered about immediately, figured I'd ask for other opinions: should there be a separate `hyper::io` module just for this? Everything else related to "runtime stuff" is in `hyper::rt`. So, these could be `hyper::rt::{Read, Write}`, or in a separate `io` module.

`hyper::io` feels better (and I was never a fan of the `rt` name)
There are two considerations I have here to add.
The first one is that it's useful if hyper can avoid having to allocate memory for each in-progress TCP read that occurs. In io_uring (which we will get to in a minute) this is achieved via provided buffers. In epoll, this is achieved via buffer pools: you wait for readiness, grab a buffer, attempt a read, put the buffer back if the read fails due to lack of readiness, and return it to the pool when you are done with it.
The second is that I'd like an io_uring compatible hyper at some point.
I think both are solvable if we get a bit more high-level and think about what hyper is trying to do. Do we need something that looks like tokio's IO traits, or can we make do with something more specialized to what we need?
What I was thinking about was something resembling the below code for reads:
```rust
use std::future::Future;
use std::io;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::task::{Context, Poll};

use pin_project::pin_project;

pub trait HyperAsyncRead {
    type Buffer: AsMut<[u8]>;

    fn poll_recv(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<Self::Buffer>>;
}

pub trait HyperAsyncReadExt: HyperAsyncRead {
    fn read(&mut self) -> Read<'_, Self> {
        Read {
            io: self,
            _pin: PhantomPinned::default(),
        }
    }
}

#[pin_project]
pub struct Read<'a, I: ?Sized> {
    io: &'a mut I,
    #[pin]
    _pin: PhantomPinned,
}

impl<'a, I> Future for Read<'a, I>
where
    I: HyperAsyncRead + Unpin + ?Sized,
{
    type Output = io::Result<I::Buffer>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let projected = self.project();
        Pin::new(&mut **projected.io).poll_recv(cx)
    }
}

impl<I: HyperAsyncRead> HyperAsyncReadExt for I {}
```
I'm still trying to think about writes.
I have far less experience with low-level concepts like these than some folks here, but I'd like to point out some implications of the above `HyperAsyncRead`:

In order to use your own buffer pool, you need to implement an IO type yourself (this is probably good):

```rust
struct MyOwnBufferPoolTcpStream {
    inner: tokio::net::TcpStream,
}

impl HyperAsyncRead for MyOwnBufferPoolTcpStream {
    type Buffer = MyOwnBufferType;
    // ...
}
```
As far as alternate suggestions, I don't have any. I thought briefly about a `HyperAsyncRead<T: AsMut<[u8]>>`, but that's no different than just accepting a `&mut [u8]` as a concrete argument. I also thought about `HyperAsyncRead<T: SomeBufferTrait>`, but that seems needlessly complicated. Few people will be implementing these buffering newtypes, and those who do probably want to fully own the workings of their buffers.

All this said, owned-buffer writes are more complicated. Hyper will likely need to acquire buffers, write into them, and then pass them to the `HyperAsyncWrite` trait as an input. For this, it might be important to define a `Buffer` trait. Abstracting over buffers that can and cannot be resized will also be difficult. Treating all buffers as fixed-size would be easy, but might make some people upset.
Here's a brief sketch of an idea:

```rust
pub trait HyperBuffer: Sized {
    fn poll_acquire(cx: &mut Context<'_>) -> Poll<io::Result<Self>>;

    fn put_bytes(&mut self, bytes: &[u8]) -> std::io::Result<()>;
}

pub trait HyperAsyncWriter {
    type Buffer;

    fn poll_write(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<(), (io::Error, Option<Self::Buffer>)>>;
}

pub trait HyperAsyncWrite {
    type Buffer: HyperBuffer;
    type Writer<'a>: HyperAsyncWriter
    where
        Self: 'a;

    fn start_write<'a>(self: Pin<&'a mut Self>, buffer: Self::Buffer) -> Self::Writer<'a>;

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>>;
}
```
It's important that the `Buffer` type is once again tied to the IO type. This gives IO implementers access to the internals of the buffer type, and makes the `HyperBuffer` API describe how hyper uses the buffer.

This API forces the `HyperAsyncWrite` type to take ownership of the `Buffer` for the duration of the write. For `tokio::net::TcpStream` with a `Vec<u8>`-backed buffer, this means building the IO type like this:
```rust
use std::io;
use std::pin::Pin;
use std::task::{ready, Context, Poll};

use tokio::io::AsyncWrite;

struct VecBuffer {
    vec: Vec<u8>,
}

struct TcpStream {
    io: tokio::net::TcpStream,
}

struct TcpStreamVecWriter<'a> {
    written: usize,
    buffer: Option<Vec<u8>>,
    io: Pin<&'a mut TcpStream>,
}

impl HyperBuffer for VecBuffer {
    fn poll_acquire(_: &mut Context<'_>) -> Poll<io::Result<Self>> {
        Poll::Ready(Ok(VecBuffer {
            vec: Vec::with_capacity(4096),
        }))
    }

    fn put_bytes(&mut self, bytes: &[u8]) -> std::io::Result<()> {
        self.vec.extend_from_slice(bytes);
        Ok(())
    }
}

impl<'a> HyperAsyncWriter for TcpStreamVecWriter<'a> {
    type Buffer = VecBuffer;

    fn poll_write(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<(), (io::Error, Option<Self::Buffer>)>> {
        let this = self.get_mut();
        let to_write = this.buffer.as_ref().expect("Polled after completion");

        match ready!(Pin::new(&mut this.io.io).poll_write(cx, &to_write[this.written..])) {
            Ok(count) => {
                this.written += count;
                if this.written == to_write.len() {
                    this.buffer.take();
                    Poll::Ready(Ok(()))
                } else {
                    Pin::new(this).poll_write(cx)
                }
            }
            Err(e) => Poll::Ready(Err((e, this.buffer.take().map(|vec| VecBuffer { vec })))),
        }
    }
}

impl HyperAsyncWrite for TcpStream {
    type Buffer = VecBuffer;
    type Writer<'a> = TcpStreamVecWriter<'a>;

    fn start_write<'a>(self: Pin<&'a mut Self>, buffer: Self::Buffer) -> Self::Writer<'a> {
        TcpStreamVecWriter {
            written: 0,
            buffer: Some(buffer.vec),
            io: self,
        }
    }

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>> {
        Pin::new(&mut self.get_mut().io).poll_flush(cx)
    }
}
```
While I think this "would work", there are a few things I'm not happy with or unsure about.

Don't like:

- … (the `self` param)

Unsure about:

- `self: Pin<&mut Self>` for every method here
- … (the `poll_write` call)

Perhaps an alternative to `HyperBuffer` could be a deliberate `BufferPool` trait that produces buffers from `self`. Thanks for considering my rambling.
Edit: Here's an alternative write trait that does away with the `Writer` trait:

```rust
pub trait HyperAsyncWrite {
    type Buffer: HyperBuffer;
    type Writable;

    fn start_write(self: Pin<&mut Self>, buffer: Self::Buffer) -> Self::Writable;

    fn poll_write(
        self: Pin<&mut Self>,
        writable: Pin<&mut Self::Writable>,
        cx: &mut Context<'_>,
    ) -> Poll<Result<(), (std::io::Error, Option<Self::Buffer>)>>;

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<std::io::Result<()>>;
}
```

This enables concurrent reads and writes, since each call to `poll_write` only borrows the IO type for the call. Again, not sure what types should and should not be pinned here. I have a hunch that `Writable` doesn't need `Pin`.
I've opened #3230 which implements this. There are some remaining decisions to be made as part of the implementation, listed in the PR description.
Hey guys! I cannot understand why not just switch to the completion IO style. I believe expressing readiness is much easier based on top of it, rather than vice versa.

Something like this:

```rust
trait Read {
    async fn read<Buf: IoBuf>(&mut self, buf: Buf) -> (io::Result<usize>, Buf);
}

impl Read for tokio::net::TcpStream {
    async fn read<Buf: IoBuf>(&mut self, mut buf: Buf) -> (io::Result<usize>, Buf) {
        let result = tokio::io::AsyncReadExt::read(self, buf.borrow_it_somehow_as_mut_slice()).await;
        (result, buf)
    }
}
```
We will lose the ability to refer to stack-allocated buffers, but is it really a big deal? I believe that if you want to add support for io_uring in the future, you will need to limit the usage of these traits internally to the completion style anyway.

You mentioned that you want to support io_uring eventually, so you will need to provide the guarantees it requires anyway: the pointer to the data must outlive the operation in the kernel. So even if allocating stack buffers is performance critical, you can unsafely implement `IoBuf` for a `*mut u8` pointer to stack memory, which you can then pass to the IO traits API. You guarantee that you do not deallocate the memory until the operation is completed, and the async function's stack is pinned somewhere, so the pointer stays valid. It is unsound, because an IO trait impl may make the `IoBuf` outlive the call, but you can make the trait `unsafe` and ask implementors not to do so.
> Hey guys! I cannot understand why not just switch to the completion io style. I believe expressing readiness is much easier based on top of it, rather than vice versa.
>
> We will lose the ability to refer to stack-allocated buffers, but is it really a big deal? I believe that if you want to add support for io_uring in the future, you will need to limit the usage of these traits internally to the completion style anyway
The issue here is predominantly memory usage. This approach requires you to waste a lot of memory, enough so that it can cause issues for some folks deploying hyper at scale. I've been trying to craft a solution that does not have these issues. We need a completion-based API, but not a naive one. This applies to uring as well. We need provided buffers to be supported and used.
> This approach requires you to waste a lot of memory
How exactly? I see that it doesn't allow passing references to stack buffers in a safe way, but I don't see "predominantly memory usage". Does hyper depend on stack buffers so much?
> How exactly? I see that it doesn't allow passing references to stack buffers in a safe way, but I don't see "predominantly memory usage". Does hyper depend on stack buffers so much?
In the trait you provided, users of hyper have to allocate a buffer for every connection they poll, whereas in readiness-based mode they only need one buffer, or a number equal to the number of threads.

io_uring's solution to this is provided buffers: you give the ring a list of buffers and let it allocate and deallocate them.
It's also worth noting a similarity in how hyper can pick between `poll_write` and `poll_write_vectored`, depending on whether the IO says it can use it. With hyper owning these traits, when we determine how to expose `completion_write`, hyper can choose a different branch and handle the new strategy.
> In the trait you provided, users of hyper have to allocate a buffer for every connection they poll, whereas in readiness-based mode they only need one buffer, or a number equal to the number of threads.
Then probably user managed buffer pools can address this issue?
> Then probably user managed buffer pools can address this issue?
Sorry, I was confused about "user managed buffer pools"; IMHO, having hyper-managed buffer pools would definitely solve this.

Or you can expose an interface similar to curl, where polling, reading and writing are all done by users.
I am not familiar with the curl interface. Is it in sans-I/O style?
@seanmonstar do you think it is theoretically possible to make hyper a sans-I/O state machine, like https://crates.io/crates/quinn-proto or https://crates.io/crates/str0m? It would also solve the timer and non-`Send` executor issues.
> I am not familiar with the curl interface. Is it in sans-I/O style?
Sorry, this is actually what this issue proposes. And yeah, I think having a user managed buffer would address the mem usage issue.
FWIW, I suppose it might be a sizable change at this point, but I do think sans-I/O designs are generally a good thing, and I've used them in several crates (other than quinn-proto) to good effect. That said, even if hyper came with a sans-I/O design it would likely also want to provide some I/O mechanisms (similar to the quinn-proto/quinn interface).
@NobodyXu

> this is actually what this issue proposes.

I meant more like a state machine approach
https://gist.github.com/VictorBulba/aad7632544e1fbd9c4c8f2cc9b558b6c
> I meant more like a state machine approach

Having a state machine approach is great, though I still think it would have to use `async` and have its own IO traits at some point. Otherwise, you would basically have to implement what `async fn` does for you by hand.
Well, with this, hyper can stay the same: keep the interface as it is now, but abstract the internal implementation into a separate state machine crate, so that the folks from monoio, for example, could create a tighter integration with their runtime and benefit from it.
> you would basically have to implement what async fn does for you by hand

I was inspired by https://github.com/algesten/str0m, a sans-I/O WebRTC library, which I use in my project with my custom compute runtime and a single thread of tokio_uring. There is no need to write your own futures; you can use the async functions from your runtime.
Yeah it's certainly possible to do, though I'm not sure how difficult it is to do it with http. Perhaps @seanmonstar can answer your question since they are the maintainer of this crate and the related crates?
I am just wondering how powerful it would be if combined with arena allocators (like bumpalo) for state machine methods
Given that provided buffers were mentioned so much, I think it is relevant to say that they have been superseded by ring mapped buffers in 5.19. There is little to no documentation about it (this commit and this file are the only ones as far as I know). Now, the naming seems to be a little ambiguous; it seems that "provided buffers" is sometimes used to refer to both ring mapped buffers and the legacy provided buffers, but I thought it was relevant.
> Given that provided buffers were mentioned so much, I think it is relevant to say that they have been superseded by ring mapped buffers in 5.19. There is little to no documentation about it (this commit and this file are the only ones as far as I know). Now, the naming seems to be a little ambiguous; it seems that "provided buffers" is sometimes used to refer to both ring mapped buffers and the legacy provided buffers, but I thought it was relevant.
My apologies, I used the term provided buffers to refer to both collectively when I spoke.
Was “Prove that it is forwards-compatible with completion-based IO” resolved?
How is that forward compatible? Do you have any examples?
Just want to call out here that this was merged, but none of the examples on https://hyper.rs were updated, leading to the following error with the examples:

```
error[E0277]: the trait bound `tokio::net::TcpStream: hyper::rt::Read` is not satisfied
  --> src/server.rs:29:24
   |
29 |         .serve_connection(stream, service)
   |          ---------------- ^^^^^^ the trait `hyper::rt::Read` is not implemented for `tokio::net::TcpStream`
   |          |
   |          required by a bound introduced by this call
   |
```
Hey, congrats on 1.0! Sorry if I should open a new issue instead of posting here, but I was wondering if there was a reason `{Read,Write} for Pin<P>` implementations aren't provided? Without them, something like `Pin<Box<T: Read>>` doesn't actually impl `Read`. It seems like a pretty obvious blanket impl to have (e.g. `Future` has `impl<P> Future for Pin<P> where P: DerefMut<Target: Future>`, tokio has `impl<P> Async{Read,Write} for Pin<P> where P: DerefMut<Target: Async{Read,Write}> + Unpin`), and because `Pin` is `#[fundamental]`, it would actually be a breaking change to add one in the future.
@coolreader18 let's make it a separate issue. but in short, we could provide it, and if done quickly, I think it's a fair non-breaking-breaking-change. Like, people probably didn't get that far, and little fixes like that right after launch happen.
This was actually brought up as an unresolved question in the ROADMAP. I think I mentally dismissed it at some point as just being annoying, but in the past week as I've thought about how we could make use of io-uring in hyper, I've been thinking about this again. If nothing else, this should be a public record of the decision, whether for or against. This will help others, including Future Us.
All other integration with a runtime (Tokio) has been removed, and helpers exist in `hyper-util`. hyper 1.0 (as of rc.2) still depends on `tokio` with all features turned off, to ask that IO transports implement `tokio::io::{AsyncRead, AsyncWrite}`. By doing so, it makes it easier for users to simply supply a `tokio::net::TcpStream`. But, let me at least bring up some downsides.

Reasons for `hyper::io::{AsyncRead, AsyncWrite}`:

- Users otherwise need a dependency on `tokio` just to implement the IO traits hyper wants. People have expressed concern that they don't want to compile multiple runtimes. While we can explain it's just the traits (when the features are turned off), owning the traits could make people less confused.
- We can provide `hyper_util::io::Tokio(T)` that implements the traits for Tokio types, to reduce friction.