ollie-etl closed 1 year ago
@FrankReh @carllerche @Noah-Kennedy @mzabaluev . Many opinions wanted on this
You can get the number of bytes that are actually read from the return value of the recv
operation, and slice the data of the buffer with it. See the examples for how the APIs should be used.
You should not ignore the number of bytes returned by write_at, either, because it can be less than the number of bytes in the input buffer/slice.
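As a minimal sketch of both points, here is the pattern with plain std::io traits instead of the tokio-uring API. The names `ShortWriter` and `copy_once` are hypothetical and only serve the illustration: the read count is used to slice the buffer, and the write is repeated until that slice has been fully written.

```rust
use std::io::{self, Read, Write};

/// Hypothetical writer that accepts at most `max` bytes per call,
/// to imitate a short write.
struct ShortWriter {
    out: Vec<u8>,
    max: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.max);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Read once into `buf`, then write exactly the bytes that were read.
fn copy_once<R: Read, W: Write>(src: &mut R, dst: &mut W, buf: &mut [u8]) -> io::Result<usize> {
    let n = src.read(buf)?; // may be less than buf.len()
    let mut pending = &buf[..n]; // only the bytes actually read
    while !pending.is_empty() {
        let written = dst.write(pending)?; // may be less than pending.len()
        if written == 0 {
            return Err(io::ErrorKind::WriteZero.into());
        }
        pending = &pending[written..];
    }
    Ok(n)
}

fn main() -> io::Result<()> {
    let mut src: &[u8] = b"hello";
    let mut dst = ShortWriter { out: Vec::new(), max: 2 };
    let mut buf = [0u8; 16];
    let n = copy_once(&mut src, &mut dst, &mut buf)?;
    assert_eq!(n, 5);
    assert_eq!(dst.out, b"hello");
    Ok(())
}
```

With the owned-buffer calls in tokio-uring the shape is the same, except that the buffer is handed back alongside the result and has to be re-sliced by the caller.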
I agree that this API is not intuitive, but this is what we have to account for two separate concerns:
See ReadBuf in tokio as an API that keeps track of both.
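To make "keeps track of both" concrete, here is a toy model with two separate watermarks. It is loosely inspired by the idea behind ReadBuf but is not tokio's implementation: `TwoMarkBuf` and its methods are hypothetical, and the storage is zeroed up front so the sketch needs no unsafe code.

```rust
/// Toy buffer with two watermarks. Not tokio's `ReadBuf`.
struct TwoMarkBuf {
    data: Vec<u8>, // backing storage (zeroed here so the sketch stays safe)
    init: usize,   // bytes known to be initialized; only ever grows
    filled: usize, // bytes holding valid data from the latest read
}

impl TwoMarkBuf {
    fn with_capacity(cap: usize) -> Self {
        TwoMarkBuf { data: vec![0; cap], init: 0, filled: 0 }
    }

    /// Imitate a read that copies `src` to the start of the buffer.
    fn fill_from(&mut self, src: &[u8]) {
        let n = src.len().min(self.data.len());
        self.data[..n].copy_from_slice(&src[..n]);
        self.filled = n; // may shrink between reads
        self.init = self.init.max(n); // never shrinks
    }

    /// The bytes a write-like call should use.
    fn filled(&self) -> &[u8] {
        &self.data[..self.filled]
    }

    fn init_len(&self) -> usize {
        self.init
    }
}

fn main() {
    let mut buf = TwoMarkBuf::with_capacity(8);
    buf.fill_from(b"abcdef");
    buf.fill_from(b"xy");
    // The valid data is the latest read; the init watermark keeps its maximum.
    assert_eq!(buf.filled(), b"xy");
    assert_eq!(buf.init_len(), 6);
}
```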
This is a good issue to track. I don't think a PR that only proposes changing a few files in tokio-uring does the question justice, though.
When a buffer is going to be used many times, it has to be better for throughput to zero it once and then reuse it, with no init watermark taking up data space and code space.
I think it would make sense for fixed buffers at least. However, this is not what the IoBuf/IoBufMut abstraction provides in general. And there may be applications that do not reuse buffers, but allocate them from the heap once per each I/O operation. I feel that this case should be fully supported with no degradation in performance (assuming the allocator is not a bottleneck there).
Also yes, there should be an issue to track this as an architectural change.
Tracking the length of initialized bytes in the buffer. There is no reason to ever mark bytes as uninitialized after they've been initialized, and the only purpose of this water mark is to prevent safe access into uninitialized data.
I cannot think of a single implementation where unsafety would be caused by reverting the watermark to the current read level. This is because setting init lower than the true value cannot cause access to uninitialized bytes.
And there may be applications that do not reuse buffers, but allocate them from the heap once per each I/O operation. I feel that this case should be fully supported with no degradation in performance (assuming the allocator is not a bottleneck there).
The proposed change in API would make no difference to the single-use case.
Moved to issue
I feel that this case should be fully supported with no degradation in performance (assuming the allocator is not a bottleneck there).
Fair enough. I know it's just my opinion. (I use that term a lot for this repo's issues.)
I cannot think of a single implementation where unsafety would be caused by reverting the watermark to the current read level.
Not brought up before, but another use for reading into buffers is reading into buffer slices. Just as reading into a Vec slice doesn't change the Vec's length, our reads don't need to change a property of the buffer itself. There is a return argument for the length. True, other I/O calls can just return the length and don't have to return the buffer.
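The Vec analogy can be checked with std alone. A small sketch, where `read_into_window` is a hypothetical helper name: the read fills part of a sub-slice, the Vec's length is untouched, and the only record of how much arrived is the returned count.

```rust
use std::io::{self, Read};

/// Read into the window `v[start..end]` and return the number of bytes read.
fn read_into_window<R: Read>(
    src: &mut R,
    v: &mut Vec<u8>,
    start: usize,
    end: usize,
) -> io::Result<usize> {
    src.read(&mut v[start..end])
}

fn main() -> io::Result<()> {
    let mut src: &[u8] = b"abc";
    let mut v = vec![0u8; 8];
    let n = read_into_window(&mut src, &mut v, 2, 6)?;
    assert_eq!(n, 3); // fewer bytes than the 4-byte window
    assert_eq!(v.len(), 8); // the Vec itself is unchanged in length
    assert_eq!(&v[2..2 + n], b"abc"); // the caller slices with the count
    Ok(())
}
```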
The original discussion came up here: https://github.com/tokio-rs/tokio-uring/pull/213#discussion_r1084501623 I'll rephrase my arguments here:
Having bytes_init() only ever go up is broken. Consider the following example for copying data from a socket into a file.
This example is broken, because the set_init function is a one-way ratchet, which can never be unset. It is called in recv, and all other read-like APIs which fill buffers, to set the number of bytes read. However, set_init, in the current implementation, ignores this value if it's less than the previous number of initialised bytes. Unfortunately, large parts of write-like functions in the current API (and me) assume that bytes_init is equivalent to valid bytes of interest. As it stands, IoBuf has no concept of length, only of bytes which have historically been valid (historical max len).
It is never possible to call a write-like function with a reused buffer. The IoBuf does not carry information on how many bytes to write.
This PR gets rid of the lifetime initialization counter and replaces it with the notion of currently initialized bytes. I.e., after a read, the number of initialized bytes is the number read.
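The difference between the two meanings can be illustrated with a toy model. This is a sketch only, not the tokio-uring code and not the example from the original post: `ToyBuf`, `toy_recv` and `bytes_to_write` are hypothetical names, and the storage is zeroed so no unsafe code is needed.

```rust
/// Toy buffer, NOT the tokio-uring `IoBuf` implementation.
struct ToyBuf {
    data: Vec<u8>,
    init: usize,
}

impl ToyBuf {
    fn new(cap: usize) -> Self {
        ToyBuf { data: vec![0; cap], init: 0 }
    }

    /// One-way ratchet: values below the current watermark are ignored.
    fn set_init_ratchet(&mut self, n: usize) {
        if n > self.init {
            self.init = n;
        }
    }

    /// Proposed meaning: `init` is the count from the latest read.
    fn set_init_current(&mut self, n: usize) {
        self.init = n;
    }

    /// Imitate a recv that copies `msg` to the start of the buffer.
    fn toy_recv(&mut self, msg: &[u8], ratchet: bool) -> usize {
        let n = msg.len().min(self.data.len());
        self.data[..n].copy_from_slice(&msg[..n]);
        if ratchet {
            self.set_init_ratchet(n);
        } else {
            self.set_init_current(n);
        }
        n
    }

    /// What a write-like call sends if it trusts `init` as the length.
    fn bytes_to_write(&self) -> &[u8] {
        &self.data[..self.init]
    }
}

fn main() {
    // Ratchet: the second, shorter read leaves a stale tail in the write.
    let mut a = ToyBuf::new(16);
    a.toy_recv(b"hello world", true);
    a.toy_recv(b"bye", true);
    assert_eq!(a.bytes_to_write(), b"byelo world");

    // Proposed semantics: only the bytes from the latest read are written.
    let mut b = ToyBuf::new(16);
    b.toy_recv(b"hello world", false);
    b.toy_recv(b"bye", false);
    assert_eq!(b.bytes_to_write(), b"bye");
}
```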