rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking issue for `Write::write_all_vectored` #70436

Open Thomasdezeeuw opened 4 years ago

Thomasdezeeuw commented 4 years ago

This is a tracking issue for io::Write::write_all_vectored.

Feature gate: #![feature(write_all_vectored)].

Steps:

Unresolved questions:


Original issue:

In the `io::Write` trait we've got the helpful `write_all` method, which calls `write` in a loop to write all bytes. However, there is no such function for `write_vectored`. I would suggest adding a function called `write_all_vectored` to perform such a task. A possible implementation follows. Note that `bufs` is a mutable slice; also see the discussion in https://github.com/rust-lang/futures-rs/pull/1741/files. On the playground: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=872e9d973bd8101e7724292f87a82869.

```rust
pub trait Write {
    // ...
    fn write_all_vectored(&mut self, mut bufs: &mut [IoSlice<'_>]) -> io::Result<()> {
        while !bufs.is_empty() {
            match self.write_vectored(bufs) {
                Ok(0) => {
                    return Err(Error::new(
                        ErrorKind::WriteZero,
                        "failed to write whole buffer",
                    ));
                }
                Ok(n) => bufs = IoSlice::advance(mem::replace(&mut bufs, &mut []), n),
                Err(ref e) if e.kind() == ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
        Ok(())
    }
}
```

Related: https://github.com/rust-lang/futures-rs/pull/1741

/cc @cramertj
cramertj commented 4 years ago

cc @sfackler, @withoutboats

sfackler commented 4 years ago

The main API question I have (which I guess is more about IoSlice::advance than this) is whether we're okay saying that the content of the bufs slice is unspecified after this call returns. It's probably fine, but it's one "weird" effect that you may not naively expect.

Thomasdezeeuw commented 4 years ago

@sfackler I agree, but I don't know how an alternative that takes &[IoSlice<'_>], like you would expect, would work.

sfackler commented 4 years ago

It could for example perform vectored writes when the current cursor position is at a slice boundary and switch to a normal write_all if it's in the middle of one of the slices.

Thomasdezeeuw commented 4 years ago

But then you're missing out on the efficiency of vectored writes; if that were the case, I wouldn't consider using it.

sfackler commented 4 years ago

You are only missing out on the efficiency of vectored writes some of the time.

Thomasdezeeuw commented 4 years ago

True, but with the implementation I provided (which I still admit isn't great) you don't miss out. I've made the point already in https://github.com/rust-lang/futures-rs/pull/1741#discussion_r398715869, but personally I don't think it's really a big deal that the provided slice is modified. In most cases I imagine the bufs slice being built just before this call and only used in this call.

cramertj commented 4 years ago

One possibility would be to keep a fixed-size buffer of, say, [IoSlice<'_>; 128] and copy sections of the caller's slice into that, pairing it with an index for the last-most unwritten IoSlice. It's more complicated in that you have to re-load that IoSlice at some clever point in order to both minimize shifting and maximize the number of IoSlices being offered to the underlying writer, but it prevents mutation of the original caller's slice while still allowing some fixed amount of vectored writes.

I'm not necessarily advocating for this idea, as I think it might be fine to mutate the caller's slice (they can always make a clone if they don't want to deal with that), but it's an option.
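A minimal sketch of this windowing idea, for illustration only (the function name, the window size, and the overall structure are ours, not std API; it also ignores the all-empty-slices edge case raised later in this thread):

```rust
use std::io::{self, IoSlice, Write};

// Window size; something like 128 was suggested above. Illustrative only.
const SCRATCH: usize = 8;

// Copy up to SCRATCH slices from the caller's immutable `bufs` into a local
// array, trim the first local copy to account for a previous partial write,
// and refill the window as slices complete. The caller's slice is never
// mutated.
fn write_all_vectored_windowed<W: Write>(w: &mut W, bufs: &[IoSlice<'_>]) -> io::Result<()> {
    let mut idx = 0; // first slice not yet fully written
    let mut within = 0; // bytes of bufs[idx] already written
    while idx < bufs.len() {
        let count = (bufs.len() - idx).min(SCRATCH);
        let mut window = [IoSlice::new(&[]); SCRATCH];
        window[..count].copy_from_slice(&bufs[idx..idx + count]);
        if within > 0 {
            // Re-slice the first entry past the bytes already written.
            window[0] = IoSlice::new(&bufs[idx][within..]);
        }
        match w.write_vectored(&window[..count]) {
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ));
            }
            Ok(n) => {
                // Count progress in terms of the original slices.
                let mut n = n + within;
                while idx < bufs.len() && n >= bufs[idx].len() {
                    n -= bufs[idx].len();
                    idx += 1;
                }
                within = n;
            }
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

The trade-off is visible here: each refill copies up to SCRATCH pointer/length pairs, in exchange for leaving the caller's `bufs` untouched.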

Thomasdezeeuw commented 4 years ago

@LukasKalbertodt I've changed the issue to be the tracking issue, could you apply the correct labels?

joshtriplett commented 4 years ago

I'm very happy to see this method added, thank you!

I regularly find myself wanting this method, when writing various different pieces of data into a buffer and not wanting to copy them around excessively.

joshtriplett commented 4 years ago

Regarding the API:

Normally, you can't split a writev call without creating a potential semantic difference. However, if writev returns early and hasn't written all the data, then the write already won't be atomic, so there's no requirement to keep the remaining iovecs together.

With that in mind, why don't we implement write_all_vectored to take a non-mutable &[IoSlice], and then if write_vectored does a partial write, we use write_all to write the rest of the current partial IoSlice if any, then resume calling write_vectored with the remaining subslice of the caller's &[IoSlice] values? That doesn't require any allocation or copying at all.
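A hedged sketch of what this proposal could look like (the function name is illustrative, not std API): after a partial vectored write, the rest of the partially written slice is flushed with write_all, and vectored writes resume at the next slice boundary.

```rust
use std::io::{self, IoSlice, Write};

// Take `bufs` by shared reference; never mutate the caller's slice.
fn write_all_vectored_ref<W: Write>(w: &mut W, bufs: &[IoSlice<'_>]) -> io::Result<()> {
    let mut idx = 0; // first slice not yet fully written
    while idx < bufs.len() {
        match w.write_vectored(&bufs[idx..]) {
            Ok(0) if bufs[idx..].iter().any(|b| !b.is_empty()) => {
                return Err(io::Error::new(
                    io::ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ));
            }
            Ok(mut n) => {
                // Skip slices that this call wrote in full (and empty ones).
                while idx < bufs.len() && n >= bufs[idx].len() {
                    n -= bufs[idx].len();
                    idx += 1;
                }
                // Partial slice: finish it with a plain write_all, then
                // resume vectored writes at the next slice boundary.
                if n > 0 {
                    w.write_all(&bufs[idx][n..])?;
                    idx += 1;
                }
            }
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

As the text notes, this needs no allocation or copying; the cost is one extra plain `write_all` per partial vectored write.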

sfackler commented 4 years ago

@joshtriplett - yep, that approach was discussed up above. The thought for the current implementation is that the caller isn't going to care about the contents of the buffers slice after the call completes, and mutating it minimizes the number of write calls.

joshtriplett commented 4 years ago

If the calls are returning early, there may not be as much value in minimizing the number of calls.

Also, could we restore the slice afterwards, even if we mutate it as we go? For each call to write_vectored, at most one change to the slice needs to occur, so we could easily change it back after that call. That way, we're requiring exclusive access to the slice but giving it back.

Thomasdezeeuw commented 4 years ago

If the calls are returning early, there may not be as much value in minimizing the number of calls.

I don't quite understand this statement, the whole point of using vectored I/O is minimizing the number of system calls.

Also, could we restore the slice afterwards, even if we mutate it as we go? For each call to write_vectored, at most one change to the slice needs to occur, so we could easily change it back after that call. That way, we're requiring exclusive access to the slice but giving it back.

I guess we could but are there many cases in which you need to use the buffers after having written them all? I've personally never encountered that.

We could also create an IoSlices type that wraps &mut [IoSlice<'_>] and take ownership of that to make the ownership part of the API clearer.

joshtriplett commented 4 years ago

On Fri, Apr 17, 2020 at 01:12:55AM -0700, Thomas de Zeeuw wrote:

If the calls are returning early, there may not be as much value in minimizing the number of calls.

I don't quite understand this statement, the whole point of using vectored I/O is minimizing the number of system calls.

Under normal circumstances, I'd expect the kernel to process as much of a writev call as possible. What I'm suggesting is that if writev returns early, such that you have to re-call it in order to finish writing, then making an additional write call before doing so does not seem likely to cause a problem.

If someone is trying to optimize the number of syscalls there, they would want to start by figuring out why the kernel returned early from writev in the first place.

Also, could we restore the slice afterwards, even if we mutate it as we go? For each call to write_vectored, at most one change to the slice needs to occur, so we could easily change it back after that call. That way, we're requiring exclusive access to the slice but giving it back.

I guess we could but are there many cases in which you need to use the buffers after having written them all? I've personally never encountered that.

If you want to reuse the same buffers, it may also make sense to reuse the [IoSlice].

We could also create an IoSlices type that wraps &mut [IoSlice<'_>] and take ownership of that to make the ownership part of the API clearer.

That would improve on the current API. Or we could accept a Cow, and only make a copy if we need to mutate it.

Thomasdezeeuw commented 4 years ago

Under normal circumstances, I'd expect the kernel to process as much of a writev call as possible. What I'm suggesting is that if writev returns early, such that you have to re-call it in order to finish writing, then making an additional write call before doing so does not seem likely to cause a problem.

If someone is trying to optimize the number of syscalls there, they would want to start by figuring out why the kernel returned early from writev in the first place.

That can have many causes that change from OS to OS and even from OS version to version; I don't think it's worth it to determine the cause for each. But we should still keep the goal of minimising the number of system calls; using write followed by more writev calls goes against this goal.

If you want to reuse the same buffers, it may also make sense to reuse the [IoSlice].

The underlying buffers can be reused, so all you'll need to do is IoSlice::new which is effectively a copy of the pointer and length. I don't think that is too much overhead.

That would improve on the current API. Or we could accept a Cow, and only make a copy if we need to mutate it.

Maybe, but what would the ToOwned impl for IoSlice be? For a slice it's converted into a Vec, which allocates, something we don't want for this API.

joshtriplett commented 4 years ago

On Tue, Apr 21, 2020 at 02:22:57AM -0700, Thomas de Zeeuw wrote:

If you want to reuse the same buffers, it may also make sense to reuse the [IoSlice].

The underlying buffers can be reused, so all you'll need to do is IoSlice::new which is effectively a copy of the pointer and length. I don't think that is too much overhead.

Consider the case where you have a long [IoSlice] because you're stitching together a buffer from many disparate pieces.

That would improve on the current API. Or we could accept a Cow, and only make a copy if we need to mutate it.

Maybe, but what would the ToOwned impl for IoSlice be? For a slice it's converted into a Vec, which allocates, something we don't want for this API.

Good point. Adding IoSlices to represent an owned [IoSlice] seems like a good step, then, if we decide to consume the IoSlice.

ezrosent commented 4 years ago

Hey y'all: Just noticed an issue with write_all_vectored as I was playing around with it. The documentation says:

If the buffer contains no data, this will never call write_vectored.

Looking at the implementation, that's not what happens. In particular, if bufs is nonempty, but only contains empty IoSlices, write_all_vectored not only will call write_vectored, but it will return an error even when the call succeeded.

```rust
fn write_all_vectored(&mut self, mut bufs: &mut [IoSlice<'_>]) -> Result<()> {
    while !bufs.is_empty() {
        match self.write_vectored(bufs) {
            Ok(0) => {
                return Err(Error::new(ErrorKind::WriteZero, "failed to write whole buffer"));
            }
            // ...
```
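The fix boils down to skipping leading empty slices before the first write_vectored call, so that a bufs made up entirely of empty IoSlices never reaches the writer at all. A standalone illustration of that pre-advance step (the helper name is ours, not the actual patch):

```rust
use std::io::IoSlice;

// Skip leading empty IoSlices. If `bufs` contains only empty slices, the
// result is an empty slice, so the write loop terminates without ever
// calling `write_vectored`. This mirrors the spirit of the fix; the helper
// name is illustrative.
fn skip_empty<'a, 'b>(bufs: &'a mut [IoSlice<'b>]) -> &'a mut [IoSlice<'b>] {
    let first_non_empty = bufs
        .iter()
        .position(|buf| !buf.is_empty())
        .unwrap_or(bufs.len());
    &mut bufs[first_non_empty..]
}
```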
Thomasdezeeuw commented 4 years ago

@ezrosent @tmiasko fixed it in #74088.

q2p commented 3 years ago

@Thomasdezeeuw is it possible to also implement a similar feature for reading? Like read_exact_vectored(), which would try to fill &mut [IoSliceMut] until it encounters an EOF. If there is not enough data, it would return UnexpectedEof.

Similar to this:

```rust
pub trait Read {
    // ...
    fn read_exact_vectored(&mut self, mut bufs: &mut [IoSliceMut]) -> std::io::Result<()> {
        // Guarantee that bufs is empty if it contains no data,
        // to avoid calling read_vectored if there is no data to be read.
        bufs = IoSliceMut::advance(bufs, 0);
        while !bufs.is_empty() {
            match self.read_vectored(bufs) {
                Ok(0) => {
                    return Err(std::io::Error::new(
                        std::io::ErrorKind::UnexpectedEof,
                        "failed to fill whole buffer",
                    ));
                }
                Ok(n) => bufs = IoSliceMut::advance(bufs, n),
                Err(ref e) if e.kind() == std::io::ErrorKind::Interrupted => {}
                Err(e) => return Err(e),
            }
        }
        Ok(())
    }
}
```
Thomasdezeeuw commented 3 years ago

@q2p I guess so, but let's not track that in this issue. You can create a pr for it.

joshtriplett commented 2 years ago

It seems like the biggest blocker for this is the open question about the API. The current API requires the caller to pass a &mut, so that write_all_vectored can modify it in place if it needs to make an additional writev call. This seems to have been proposed for performance reasons, to minimize the work required to make subsequent calls.

I would like to propose changing this, for a couple of reasons:

Meanwhile, requiring a writable slice of iovecs forces the caller to reconstruct their iovecs for every call, rather than reusing them for multiple calls (possibly with some modifications, but not rewriting them from scratch). This pessimizes the fast path in favor of optimizing the slower path.

I would propose changing this to an immutable reference, instead. If the initial write comes back partial, write_all_vectored can either copy the iovecs into a small stack buffer, or (in the uncommon case of a huge slice of iovecs) it can do a separate write syscall for the initial buffer.

This would both optimize the common case, and make all cases easier to invoke.

We discussed this in the @rust-lang/libs-api meeting today. Some others felt that this was a reasonable approach to simplify the API; others didn't feel strongly either way; nobody felt that we should keep the current &mut API.

adamreichold commented 2 years ago

In practice, based on many codebases I've seen, I expect the common case to have 2-3 iovecs, not hundreds or thousands. People primarily seem to use this for cases like writing from a ringbuffer and handling wraparound (using two iovecs), or writing a header and body.

I am not saying that it is a common pattern, but in an ecological simulation I am using a single call to write_all_vectored with up to 10^5 IoSlice items to write out the simulation state as a single copy into the page buffer, i.e. without assembling a separate serialized representation first. (Since the indirections I need to follow for this might have been re-allocated by the time the next state dump comes, I cannot reuse the contents of the Vec<IoSlice>.) (This is almost surely UB, writing padding to disk and such, so probably another argument for not supporting such patterns.)

I would propose changing this to an immutable reference, instead. If the initial write comes back partial, write_all_vectored can either copy the iovecs into a small stack buffer, or (in the uncommon case of a huge slice of iovecs) it can do a separate write syscall for the initial buffer.

Agreeing that the above situation is not common, I still wonder if with the platform-specific knowledge embedded in libstd, that temporary buffer could be made sufficiently large to "saturate" the system call interface, e.g. contain 1024 IoSlice elements on Linux, so that the number of system calls would not increase in the slow path even though the small overhead of copying the IoSlices is taken. But I am not sure at which point the possibly repeated copying would be worse than doing an additional system call.

Thomasdezeeuw commented 2 years ago

It seems like the biggest blocker for this is the open question about the API. The current API requires the caller to pass a &mut, so that write_all_vectored can modify it in place if it needs to make an additional writev call. This seems to have been proposed for performance reasons, to minimize the work required to make subsequent calls.

Note that I'm the one who added that question, and after working with it for a while it's not that big a deal. It's a little different from what you might expect, e.g. compared to the write_vectored call, but it's very usable.

I would like to propose changing this, for a couple of reasons:

* In practice, based on many codebases I've seen, I expect the common case to have 2-3 iovecs, not hundreds or thousands. People primarily seem to use this for cases like writing from a ringbuffer and handling wraparound (using two iovecs), or writing a header and body.

I don't think that is true. If I remember correctly the limit is 1024, and I've been near it before, so I don't think we should optimise for the 2-3 iovecs use case too much. Of course all this is just anecdotal.

* In the slow path of having to make an additional syscall due to a partial initial write, the cost of a syscall will vastly eclipse the cost of copying a few iovecs.

Meanwhile, requiring a writable slice of iovecs forces the caller to reconstruct their iovecs for every call, rather than reusing them for multiple calls (possibly with some modifications, but not rewriting them from scratch). This pessimizes the fast path in favor of optimizing the slower path.

I would propose changing this to an immutable reference, instead. If the initial write comes back partial, write_all_vectored can either copy the iovecs into a small stack buffer, or (in the uncommon case of a huge slice of iovecs) it can do a separate write syscall for the initial buffer.

I have to say I disagree with this approach. Do you want to add a stack array of 1024 (IOV_MAX) iovecs to the function for copying the buffers?

I know the current API is a little odd at first, but once you assume that the buffer (slices) you passed in are not usable after the call, it's an OK API to use. Furthermore, as it's currently implemented it's zero overhead, while your suggestion of copying is not. If you're doing vectored I/O in the first place, it's likely you care about this.

This would both optimize the common case, and make all cases easier to invoke.

We discussed this in the @rust-lang/libs-api meeting today. Some others felt that this was a reasonable approach to simplify the API; others didn't feel strongly either way; nobody felt that we should keep the current &mut API.

I didn't attend this meeting, nor am I part of the libs team, but I do feel strongly against this proposed change. May I ask how many of the people actually use the API? Because that might influence their vote. (I hope I don't come off too strong with the last sentence; I'm just curious whether the people deciding on this are actually using the API.)

joshtriplett commented 2 years ago

@Thomasdezeeuw I'm not suggesting that we optimize for the 2-3 case; I'm suggesting that we don't need to optimize heavily for an already-less-optimized subcase (partial writes) of the 1024 case, at the expense of both usability (not having to recreate buffers) and other optimization (not having to recreate buffers).

I have to say I disagree with this approach. Do you want to add a stack of 1024 (IOV_MAX) iovec to the function for copying the buffers?

No, I personally would add a local array of 8, and if larger than that, call write on any partial iovec before calling writev on the remainder.

Also, we have helpers that'd make it easy to write a mutate-in-place write_all_vectored using write_vectored.

as it's currently implemented it's zero overhead, while your suggestion of copying is not.

It's not zero-overhead if you count the cost of recreating a new set of iovecs for each call.

If you're doing vectored I/O in the first place it's likely you care about this.

In the optimal case, your writev call completes in a single syscall; in any other case, syscall overhead will by far exceed any other overhead. If you're doing vectored I/O for efficiency, you should try to ensure your operation completes in one writev call.

Thomasdezeeuw commented 2 years ago

@Thomasdezeeuw I'm not suggesting that we optimize for the 2-3 case; I'm suggesting that we don't need to optimize heavily for an already-less-optimized subcase (partial writes) of the 1024 case, at the expense of both usability (not having to recreate buffers) and other optimization (not having to recreate buffers).

I agree that the usability improvement is nice, but I've found I always have to recreate my buffers anyway (see below). If it's such a big problem maybe this needs a new type that wraps &mut [IoSlice] to show that ownership is passed to the function?

I have to say I disagree with this approach. Do you want to add a stack of 1024 (IOV_MAX) iovec to the function for copying the buffers?

No, I personally would add a local array of 8, and if larger than that, call write on any partial iovec before calling writev on the remainder.

That would make this function unusable for my case, which would be a real shame.

Also, we have helpers that'd make it easy to write a mutate-in-place write_all_vectored using write_vectored.

as it's currently implemented it's zero overhead, while your suggestion of copying is not.

It's not zero-overhead if you count the cost of recreating a new set of iovecs for each call.

But you'll almost always have to recreate the iovecs before the write call. Take your example of a ring buffer; how do you write into it while maintaining your iovecs? I don't think you can, as you need both a mutable (for writing) and an immutable (for the writev call) reference into the data, which Rust of course doesn't allow. So I don't really think this argument holds water.

Could you show an example where it is possible to keep your iovecs around while modifying the buffer?

If you're doing vectored I/O in the first place it's likely you care about this.

In the optimal case, your writev call completes in a single syscall; in any other case, syscall overhead will by far exceed any other overhead. If you're doing vectored I/O for efficiency, you should try to ensure your operation completes in one writev call.

I agree, but it's not possible for a program to ensure the operation completes in one writev call. The number of bytes the OS can write in a single writev call differs per OS and even per target, i.e. different filesystem, socket, etc.

joshtriplett commented 2 years ago

@Thomasdezeeuw

If it's such a big problem maybe this needs a new type that wraps &mut [IoSlice] to show that ownership is passed to the function?

The issue isn't just that callers will forget this (&mut is a big hint that you can't count on it being unmodified), though that's also a problem. The primary issue seems like having to deal with it being overwritten by the function in the first place.

Could you show an example where it is possible to keep your iovecs around while modifying the buffer?

```rust
let mut iovecs = [
    IoSlice::new(header),
    IoSlice::new(x),
];
w.write_all_vectored(&iovecs)?;
iovecs[1] = IoSlice::new(y);
w.write_all_vectored(&iovecs)?;
```

Also, there are several functions we could add to make this better. For instance, we could add a function from &mut [IoSliceMut] to &[IoSlice], so that people can mutate a buffer through the IoSliceMut.

FWIW, I could also work with a write_all_vectored that requires a &mut but guarantees to change the slices back when done. "This uses your IoSlices as scratch space, but will always restore them to an unmodified state afterwards." But we can't make that guarantee about user impls of Write, only about our standard implementation; we could only do that if we made this a method that you can't implement yourself (e.g. a sealed WriteExt with a blanket impl). I don't think that's worth it.

Thomasdezeeuw commented 2 years ago

@Thomasdezeeuw

If it's such a big problem maybe this needs a new type that wraps &mut [IoSlice] to show that ownership is passed to the function?

The issue isn't just that callers will forget this (&mut is a big hint that you can't count on it being unmodified), though that's also a problem. The primary issue seems like having to deal with it being overwritten by the function in the first place.

To be fair that's why the documentation says

Once this function returns, the contents of bufs are unspecified, as this depends on how many calls to write_vectored were necessary.

If you program with that in mind, essentially passing ownership of bufs to the function (this is why I think a type might help), I don't think it's a problem. But I think I'm just rehashing points I've made before.

Could you show an example where it is possible to keep your iovecs around while modifying the buffer?

```rust
let mut iovecs = [
    IoSlice::new(header),
    IoSlice::new(x),
];
w.write_all_vectored(&iovecs)?;
iovecs[1] = IoSlice::new(y);
w.write_all_vectored(&iovecs)?;
```

What I meant was: do you have an example where you actually modify the underlying buffers? If you have the headers prepared (especially if it's just a single buffer) I don't think the overhead of recreating it is too much.

Also, there are several functions we could add to make this better. For instance, we could add a function from &mut [IoSliceMut] to &[IoSlice], so that people can mutate a buffer through the IoSliceMut.

FWIW, I could also work with a write_all_vectored that requires a &mut but guarantees to change the slices back when done. "This uses your IoSlices as scratch space, but will always restore them to an unmodified state afterwards." But we can't make that guarantee about user impls of Write, only about our standard implementation; we could only do that if we made this a method that you can't implement yourself (e.g. a sealed WriteExt with a blanket impl). I don't think that's worth it.

That could work, especially if the restoring can be optimised away when the compiler can see the buffers aren't used any more. 👍

SolraBizna commented 2 years ago

People primarily seem to use this for cases like writing from a ringbuffer and handling wraparound (using two iovecs), or writing a header and body.

For what it's worth, I was planning to use it for the case of assembling a text document from potentially thousands of pieces. For my use case, making a big Vec<IoSlice<_>> that I only use for a single call and then drop is exactly what I want.

Accepting a Vec directly, so that it is dropped afterwards, would fit my needs and would eliminate the "wait heck some of my IoSlices got altered?!" problem. It wouldn't be optimal for other cases, though, including the "adding a header" case.

You know, now that I think about it, I'm totally free to just implement that function myself in terms of write_vectored...

Thomasdezeeuw commented 2 years ago

How about an owned type that wraps T: AsRef<[IoSlice<'_>]> (or Deref) to indicate that the write call "owns" the buffers and can modify them, but allows the buffers wrapper type to be reused (i.e. the Vec in Vec<IoSlice<'_>>). It makes the API a bit more convoluted, but perhaps that will resolve @joshtriplett's concerns.

joshtriplett commented 2 years ago

@Thomasdezeeuw wrote:

How about an owned type that wraps T: AsRef<[IoSlice<'_>]> (or Deref) to indicate that the write call "owns" the buffers and can modify them, but allows the buffers wrapper type to be reused (i.e. the Vec in Vec<IoSlice<'_>>). It makes the API a bit more convoluted, but perhaps that will resolve @joshtriplett's concerns.

That doesn't address this concern I wrote above:

The issue isn't just that callers will forget this (&mut is a big hint that you can't count on it being unmodified), though that's also a problem. The primary issue seems like having to deal with it being overwritten by the function in the first place.

What if we used a Cow'd slice? If you never want your slice copied even in the partial write case, pass in a Cow::Owned, and give up your own ownership; if you don't want to give up ownership of your slice and allow mutating it, pass in a Cow::Borrowed.

Also, we could avoid an extra syscall even in the borrowed case, if we're willing to allocate in that case, and Cow will take care of that for us.

Thomasdezeeuw commented 2 years ago

What if we used a Cow'd slice? If you never want your slice copied even in the partial write case, pass in a Cow::Owned, and give up your own ownership; if you don't want to give up ownership of your slice and allow mutating it, pass in a Cow::Borrowed.

Also, we could avoid an extra syscall even in the borrowed case, if we're willing to allocate in that case, and Cow will take care of that for us.

@joshtriplett I'm sorry, but I don't really have the energy for this issue any more... Your comment completely misses the point of vectored I/O. If people are looking at using vectored I/O, it's likely they're looking at it for performance reasons; suggesting to just allocate a new buffer so it can be used once in a system call is really quite far off this target.

It's been more than two years, the API works and has zero overhead as-is, but yes it has a gotcha. At this point either stabilise it or remove it, I'm no longer interested, this has just taken too long.

joshtriplett commented 2 years ago

So, I do care very much about vectored I/O as well, and I care about the use case of being able to reuse the iovecs repeatedly, which is also something people do in C for reasons of both performance and simplicity. I'd like to be able to do that somehow in Rust, and not have to throw away and recreate the array of iovecs every time. I'm trying to brainstorm some data structure that will allow for that reuse. I was trying to favor using existing constructs and types (at the expense of incurring overhead in what I saw as the less common case of partial write), but I acknowledge that that's not ideal. I agree that Cow wasn't a great idea.

As an alternative closer to what you already proposed, perhaps it makes more sense to make a new IoSlices structure that owns the iovecs as you suggested, but whose API guarantees that when the write method is done those iovecs are un-mutated? I would be happy to do that design work.

Rough sketch of a solution that shouldn't need to make any copies at all:

```rust
impl IoSlices {
    fn borrow_for_write(&self, offset: usize, f: impl FnOnce(&[IoSlice]) -> io::Result<usize>) -> io::Result<usize>;
}
```

If you have an &IoSlices (which write_all_vectored would get) you can call borrow_for_write which will mutate the slices using interior mutability, give you those slices so you can writev them or equivalent, and then modify them back (which hopefully we can optimize away if the IoSlices is going away). If you have a &mut IoSlices or an IoSlices, we can provide ways to get a &mut [IoSlice], but write_all_vectored would take an &IoSlices so it can't arbitrarily mutate the slices. (We can provide analogous types corresponding to IoSliceMut, and additionally a method that transmutes to give an &IoSlices so that you can use the same buffer for both read and write, and also modify through it when not currently reading or writing.)

That should avoid any copying, avoid any more syscalls than absolutely necessary, and overall make this exactly as efficient and capable as writev/readv can allow.

Does something like that sound reasonable?

Lucretiel commented 2 years ago

I wrote up a quick implementation of something similar to the proposed design. It's a pretty straightforward elaboration of advance_slices that takes care to preserve the original version of the current IoSlice and restores it each time the &[IoSlice] is advanced one step. Because only 1 buffer is ever in a mutated state at a time, it's easy to keep a copy of the original version and restore it when we're done with it. It also restores on drop to help keep io::Error returns clean.


```rust
use std::io;
use std::mem;

struct BufferManager<'a, 'b> {
    buffers: &'a mut [io::IoSlice<'b>],
    saved: Option<io::IoSlice<'b>>,
}

impl<'a, 'b> BufferManager<'a, 'b> {
    pub fn new(buffers: &'a mut [io::IoSlice<'b>]) -> Self {
        // Strip empty buffers from the front.
        let first_non_empty = buffers
            .iter()
            .position(|buf| !buf.is_empty())
            .unwrap_or(buffers.len());

        Self {
            buffers: &mut buffers[first_non_empty..],
            saved: None,
        }
    }

    pub fn get(&self) -> &[io::IoSlice<'b>] {
        self.buffers
    }

    pub fn count(&self) -> usize {
        self.buffers.len()
    }

    pub fn advance(&mut self, mut amount: usize) {
        while let Some(head) = self.buffers.first_mut() {
            if let Some(remaining) = amount.checked_sub(head.len()) {
                // The head is no larger than the overall write, so pop it
                // off the front of the buffers. Be sure to restore its
                // original state first, if necessary.
                amount = remaining;

                if let Some(saved) = self.saved.take() {
                    *head = saved;
                }

                self.buffers = &mut mem::take(&mut self.buffers)[1..];
            } else {
                // The head is larger than the overall write, so it needs to
                // be advanced in place. Be sure to save the current state of
                // the buffer first, if we haven't already.
                if amount > 0 {
                    self.saved = Some(self.saved.unwrap_or(*head));
                    head.advance(amount);
                }

                return;
            }
        }

        assert!(amount == 0, "advanced too far");
    }
}

impl Drop for BufferManager<'_, '_> {
    fn drop(&mut self) {
        // When the buffer manager is dropped, restore the state of the
        // current head buffer, if necessary. There can't be a saved value
        // while the buffer list is empty.
        if let Some(head) = self.buffers.first_mut() {
            if let Some(saved) = self.saved {
                *head = saved;
            }
        }
    }
}
```

Then, write_all_vectored is basically the same as it is now, but using BufferManager to handle slice advancing

fn write_all_vectored(
    &mut self,
    buffers: &mut [io::IoSlice<'_>],
) -> io::Result<()> {
    let mut buffers = BufferManager::new(buffers);

    while buffers.count() > 0 {
        match self.write_vectored(buffers.get()) {
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            Ok(n) => buffers.advance(n),
            Err(err) if err.kind() == io::ErrorKind::Interrupted => {}
            Err(err) => return Err(err),
        }
    }

    Ok(())
}

The only downside here is that it does require the user to provide an &mut [IoSlice], but we can still promise at the API level (though not the type level) that the slices will be unmodified when this function returns.

dead-claudia commented 1 year ago

Can an extra method, or even option or something, be added to allow retrying the operation with data that wasn't successfully written? I'd like to tolerate some inner write errors like EAGAIN/ErrorKind::WouldBlock without having to reimplement write_all_vectored manually.

Thomasdezeeuw commented 1 year ago

Can an extra method, or even option or something, be added to allow retrying the operation with data that wasn't successfully written? I'd like to tolerate some inner write errors like EAGAIN/ErrorKind::WouldBlock without having to reimplement write_all_vectored manually.

We don't do this for the write_all function, so I don't think write_all_vectored should do anything different.
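For callers who do want to tolerate WouldBlock themselves, here is one hedged sketch (the function name and mock writer are invented here). It tracks progress as a byte count and rebuilds the descriptor array per attempt, so it only needs the stable write_vectored and never mutates the caller's IoSlices; a real non-blocking caller would wait for writability (e.g. via poll) before retrying instead of spinning:

```rust
use std::io::{self, IoSlice, Write};

// Retry loop that tolerates `WouldBlock` and `Interrupted`, resuming from
// the first unwritten byte. Descriptors are rebuilt each attempt; the data
// bytes themselves are never copied.
fn write_all_retrying<W: Write>(dest: &mut W, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
    let total: usize = bufs.iter().map(|b| b.len()).sum();
    let mut written = 0;
    while written < total {
        // Locate the first unwritten byte as (buffer index, offset).
        let mut n = written;
        let mut idx = 0;
        while n >= bufs[idx].len() {
            n -= bufs[idx].len();
            idx += 1;
        }
        // Rebuild the remaining descriptors.
        let mut remaining = Vec::with_capacity(bufs.len() - idx);
        remaining.push(IoSlice::new(&bufs[idx][n..]));
        remaining.extend(bufs[idx + 1..].iter().copied());
        match dest.write_vectored(&remaining) {
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            Ok(n) => written += n,
            // A real caller should wait for readiness here, not busy-loop.
            Err(e) if matches!(e.kind(), io::ErrorKind::Interrupted | io::ErrorKind::WouldBlock) => {}
            Err(e) => return Err(e),
        }
    }
    Ok(written)
}

// Mock writer: blocks once, then accepts at most 3 bytes per call.
struct Flaky { out: Vec<u8>, calls: usize }
impl Write for Flaky {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        if self.calls == 1 {
            return Err(io::ErrorKind::WouldBlock.into());
        }
        let n = buf.len().min(3);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

fn main() -> io::Result<()> {
    let mut dest = Flaky { out: Vec::new(), calls: 0 };
    let bufs = [IoSlice::new(b"hello "), IoSlice::new(b"world")];
    let n = write_all_retrying(&mut dest, &bufs)?;
    assert_eq!(n, 11);
    assert_eq!(dest.out, b"hello world");
    Ok(())
}
```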

tbottomparallel commented 9 months ago

Would you consider an alternate signature that returns the total number of bytes written?

fn write_all_vectored(&mut self, bufs: &mut [IoSlice<'_>]) -> Result<usize>

I am aware that it is not consistent with the signature of write_all, but callers can implement the behavior trivially there:

dest.write_all(&buf)?;
Ok(buf.len())

Unless I am missing something crucial, callers of write_all_vectored cannot ergonomically or efficiently do the same. To implement the behavior they would need to iterate over bufs and call .len() on each IoSlice to sum up the total length. That's at least one extraneous pointer dereference per buffer.

In contrast, write_all_vectored can accumulate the total much more efficiently. The necessary information is already being returned by the underlying calls to writev and/or write. I would expect most of the time this will compile down to be one extra ADD <reg>,<reg> instruction per underlying write call.

In my experience (mostly embedded Linux) the returned count of bytes written is occasionally useful at runtime and extremely useful in unit tests. Since users cannot implement this efficiently without re-implementing write_all_vectored, and the underlying write calls are already returning the information, it seems reasonable to have write_all_vectored pass it along instead of discarding.

Example:

fn send_entercnd1(password: &str, dest: &mut impl std::io::Write) -> std::io::Result<usize> {
    Ok(dest.write_all_vectored(&mut [
        IoSlice::new(b"AT!ENTERCND=\""),
        IoSlice::new(password.as_bytes()),
        IoSlice::new(b"\"\r"),
    ])?)
}

vs

fn send_entercnd2(password: &str, dest: &mut impl std::io::Write) -> std::io::Result<usize> {
    // I know two of these have constant length and it could just return Ok(15+password.len())
    // In general this won't be the case. I didn't see the need to complicate the example.
    let mut bufs = [
        IoSlice::new(b"AT!ENTERCND=\""),
        IoSlice::new(password.as_bytes()),
        IoSlice::new(b"\"\r"),
    ];
    // get length before `write_all_vectored`
    let length = {
        let mut acc = 0;
        for buf in &bufs {
            acc += buf.len();
        }
        acc
    };
    dest.write_all_vectored(&mut bufs)?;
    Ok(length)
}

dead-claudia commented 9 months ago

Would you consider an alternate signature that returns the total number of bytes written?

This can trivially be done via bufs.iter().map(|s| s.len()).sum(). If your bufs here is the top-level array itself and there's only a few entries, the compiler will likely even be able to avoid the loop.
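Concretely, applying that suggestion to the send_entercnd example above (a sketch; the sum walks the small descriptor array, never the data bytes, and runs before the write call while bufs is still intact):

```rust
use std::io::IoSlice;

fn main() {
    let password = "secret"; // placeholder value for illustration
    let bufs = [
        IoSlice::new(b"AT!ENTERCND=\""),
        IoSlice::new(password.as_bytes()),
        IoSlice::new(b"\"\r"),
    ];
    // Compute the total before handing `bufs` to a write call that may
    // leave the slice contents unspecified.
    let total: usize = bufs.iter().map(|b| b.len()).sum();
    assert_eq!(total, 13 + password.len() + 2);
}
```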

pliardb commented 5 months ago

Can an extra method, or even option or something, be added to allow retrying the operation with data that wasn't successfully written? I'd like to tolerate some inner write errors like EAGAIN/ErrorKind::WouldBlock without having to reimplement write_all_vectored manually.

We don't do this for the write_all function, so I don't think write_all_vectored should do anything different.

We got bitten by a bug in our codebase caused by handling ErrorKind::WouldBlock by retrying the write_all_vectored() call with the same bufs input, even though, to be fair, the documentation advises against this.

Still, it seems this scenario could either be 1) more explicitly disallowed through type-checking, e.g. by making bufs immutable, or 2) supported by making bufs mutable "all the way", so that its type matches IoSlice::advance_slices()'s bufs argument (&mut &mut [IoSlice]). We went with 2) in our codebase.
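Option 2) can be sketched as follows (the function name write_all_vectored_resumable is invented here, and the mock writer is for illustration only). Taking &mut &mut [IoSlice<'_>] matches IoSlice::advance_slices (stable since Rust 1.81), so on WouldBlock the function returns early with the caller's slice pointing at exactly the unwritten remainder, making a retry with the same argument safe:

```rust
use std::io::{self, IoSlice, Write};

// Variant of write_all_vectored that is resumable after WouldBlock: `*bufs`
// always holds the not-yet-written tail when this returns an error.
fn write_all_vectored_resumable(
    dest: &mut impl Write,
    bufs: &mut &mut [IoSlice<'_>],
) -> io::Result<()> {
    // Drop any leading empty buffers (advancing by 0 removes them).
    IoSlice::advance_slices(bufs, 0);
    while !bufs.is_empty() {
        match dest.write_vectored(bufs) {
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            Ok(n) => IoSlice::advance_slices(bufs, n),
            Err(e) if e.kind() == io::ErrorKind::Interrupted => {}
            // Includes WouldBlock: `*bufs` is the unwritten remainder.
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

// Mock writer: blocks on its second call, otherwise accepts up to 4 bytes.
struct Blocky { out: Vec<u8>, calls: usize }
impl Write for Blocky {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        if self.calls == 2 {
            return Err(io::ErrorKind::WouldBlock.into());
        }
        let n = buf.len().min(4);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

fn main() {
    let mut dest = Blocky { out: Vec::new(), calls: 0 };
    let mut storage = [IoSlice::new(b"hello "), IoSlice::new(b"world")];
    let mut bufs = &mut storage[..];
    // First attempt fails with WouldBlock part-way through...
    let err = write_all_vectored_resumable(&mut dest, &mut bufs).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::WouldBlock);
    // ...but retrying with the same `bufs` resends nothing already written.
    write_all_vectored_resumable(&mut dest, &mut bufs).unwrap();
    assert_eq!(dest.out, b"hello world");
}
```

The cost is that the caller must thread a &mut &mut [IoSlice] through, but the type now states honestly that the slice list is consumed as the write progresses.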