API: Buffer initialise level

ollie-etl commented 1 year ago

Originally in #215

This issue proposes a change in API, such that set_init tracks the current initialization level, rather than the historial watermark The motiviating example is buffer reuse

Invalid Code

let mut buf = Vec::with_capacity(2048);
loop {
    b = socket.recv(buf).await.1;
    let _ = target.write_at(b,0).await;
}

To reuse the buffer, you must instead incur the overhead of a slice, because initialization holds the historical fill level, not the current

let mut buf = Vec::with_capacity(2048);
loop {
    (r, b) = socket.recv(buf).await;
    let _ = target.write_all_at(b.slice(0..r.unwrap()),0).await;
}

I propose that calling set_init() with a value should not maintain the historical watermark, but instead set the curent level of initialized bytes. I have a number of reasons

Maintains Safety. It doesn't make the API any less safe to do this. It is still impossible to access uninitiatlised bytes if the level is lower than the historical max.
Improved safety. Should you be using the bytes as a proxy for some type T, knowing the exact size of T tracked in the buffer reduces footguns.
Speed. Buffer reuse should not incur Slice overhead

ollie-etl commented 1 year ago

The best stated argument for the status quo is from @mzabaluev , which i quote

I agree that this API is not intuitive, but this is what we have to account for two separate concerns:

Tracking the length of initialized bytes in the buffer. There is no reason to ever mark bytes as uninitialized after they've been initialized, and the only purpose of this water mark is to prevent safe access into uninitialized data.

The proposal does not infringe on the ability of the API to limit access to uninitialized data.

Tracking the length of the data filled by the last read operation.

The status quo fails to manage this without a runtime penalty, but wit the proposal, would also manage it.

FrankReh commented 1 year ago

This may point to the std group wanting both init and filled in the owned buffer.

ollie-etl commented 1 year ago

Its interesting that it goes down the concrete type with destructor rather than trait object route. Presumably on code generation efficiency grounds.

FrankReh commented 1 year ago

Its interesting that it goes down the concrete type with destructor rather than trait object route. Presumably on code generation efficiency grounds.

Yes. Would help for @Noah-Kennedy and @carllerche weigh in. Those two and Nick should be on the same page for the future otherwise I think the rest of us are spinning our wheels with very little purpose. I guess there's always a chance someone else discovers something one of them hadn't foreseen, but given how long Rust has been used for writing clients and servers, I would doubt it.

ollie-etl commented 1 year ago

We could shift entirely from the IoBuf and IoBufMut traits to owned-buf. ~It could also replace FixedBuf~. The OwnedSlice and OwnedCursor variants in that library would also replace our slicing infrastructure fully, and i think also the BoundedBuf trait. It'd actually be quite a significant simplification of our API

Can't replace Fixedbuf, because it doesn't hold enough information to identify the registry.

FrankReh commented 1 year ago

Can't replace Fixedbuf, because it doesn't hold enough information to identify the registry.

So could a FixedBuf and a sliced FixedBuf be used with non-fixed io operations like read and write? It can be now because FixedBuf implements IoBuf and IoBufMut. And all the io operations take a BoundedBuf{Mut} trait buffer.

Maybe io operations would still have to take BoundedBuf or BoundedBufMut and does that negate enough of the benefits that OwnedBuf{Mut} could bring us?

ollie-etl commented 1 year ago

I think OwnedBuffer is lacking actually. Its is not sufficiently general to recover some types of buffer. I think that could be improved with an additional member state: *mut void, to be used to hold state required to rebuild the original buffer type. that would permit an From instance from any type I believe, whilst allowing sufficient generality to recover.

(and yes, I know exactly why void pointers are generally frowned upon from a safety perspective)

The function pointer for the destructor and pointer type is starting to look a look like the dyn FixedBuffers in FixedBuffer. So maybe that is telling us something.

FrankReh commented 1 year ago

An OwnedBuf is already 40 bytes and an OwnedCursor is 56. At what point is the size of the wrapper going to get in the way of being nice to work with. They are already large enough where getting them to be cache aligned could make a difference on systems with tight caches. At least both those types would have room for one more 8 byte field.

ollie-etl commented 1 year ago

I think the semantics of OwnedBuf can be modified to be more general. This will introduce a type parameter, which may be undesirable?

trait IntoOwnedBuf {
    type State = ();
    type Restore = Self;
    fn into_owned_buf(self) -> OwnedBuf<Self>;
}

struct OwnedBuf<T: IntoOwnedBuf> {
   // Describe the buffer data 
   ptr: *mut u8,   
   len: usize,
   capacity: usize, 
   // Provision for returning ownership to the original buffer.
   fn: unsafe fn(&mut OwnedBuf<T>) -> <T as IntoOwnedBuf>::Restore,
   state: <T as IntoOwnedBuf>::State
}

impl OwnedBuf<T> {
   pub fn restore(self) -> <T as IntoOwnedBuf>::Restore {
        let fn = self.fn;
        fn(self)
    }
}

impl Drop for OwnedBuf<T> {
   pub fn drop(&mut self) {
        let fn = self.fn;
        let _ = fn(self);
    }
}

Some thoughts

provision is made to restore any type back to the original buffer, but it is not compulsory. By setting restore to (), it will just drop.
The type parameter is necessary, for recovering the original buffer type.
state could be as simple as a pointer to the original buffer, which we mem::forget whilst OwnedBuf exists.
whilst in theory you generate a lot of code for every generic variant, the actual representation of OwnedBuf is likely to often be two pointers for fn and state. I'd hope the compiler could rationalise that.

FrankReh commented 1 year ago

And I think we could still use input from @Noah-Kennedy and @nrc to know whether adding a filled parameter, one way or another to our traits or eventually concrete types is the correct direction. If it is the correct direction, we might as well bite the bullet soon, to clean up some APIs.

rrichardson commented 1 year ago

@FrankReh - It seems like OwnedBuf represents additional advancements from the discussions on Zulip. He also says that he hasn't "updated the traits yet" so I'm assuming that they will reflect the results of your conversations with him.

I think it's safe to assume that at least the basic filled functionality will be present. I'm not sure about the more advanced slicing bits though. I've gone ahead and explicitly asked the question to be sure.

rrichardson commented 1 year ago

@ollie-etl

This will introduce a type parameter, which may be undesirable?

It is undesirable, but that doesn't mean it's not worth it. Now is the time to have the conversation :)

At first I thought that it might be too excessive, but the IntoOwnedBuf has additional utility.

My biggest concern is threading. The destructor would need to have some mechanism for invoking the recycle operation in the originating thread, or we need to send the buffer back to the originating thread. Or any Recycling pool needs to be Send + Sync + 'static

oliverbunting commented 1 year ago

@rrichardson I'll admit I haven't given threading any thought. I had quite a blinkered view of application of a single threaded ring when I sketched this

FrankReh commented 1 year ago

@ollie-etl @rrichardson I've mentioned this to Noah too. I don't want the single threaded case to suffer much if the crate is trying to support multi threaded some day, but if it does, someone is going to want a feature to get the single-threaded performance back.

rrichardson commented 1 year ago

@ollie-etl @FrankReh - If we want to not disrupt the single-thread performance, and also avoid adding Send+Sync (+ 'static?) constraints to the IntoOwnedBuf trait, the dtor: fn(*const ()) approach might have to be it.

A person could do whatever gymnastics they needed with that *const (). It's not pretty, but I can't think of anything else that would work.

Outside of OwnedBuf, we are considering a Guard type for FixedBuf that would make it Send, and it'd do the right thing on drop

oliverbunting commented 1 year ago

You still need state. There are valid buffer types which require drop logic. (Mmap being one). Without state, I can't make an mmap an owned buffer, because I can't call unmap on drop

oliverbunting commented 1 year ago

You could lose the type parameter though, and just make restore completely unsafe, and restore to any type. Or have unsafeownedbuffer, which does that, and a generically typed ownedbuffer wrapping it, for those that want type safety

rrichardson commented 1 year ago

You still need state.

Absolutely. This should be stored and added as a param to the dtor.

You could lose the type parameter though, and just make restore completely unsafe.

I'm working on a PR for Bytes and am doing basically that. I don't think it's avoidable. The best you could do would be store the TypeId on construction, then validate it when converting back to the original. It's not worth it, though, IMO.

Or have unsafeownedbuffer, which does that, and a generically typed ownedbuffer wrapping it.

That's an interesting idea. It'd be a nice idea for the paranoid, but in most cases, the variety of buffers will be small, so accidentally converting to the wrong buffer seems low.

If you look at all of the frameworks in the Tokio ecosystem that take only Bytes I couldn't imagine them changing all of their APIs to be parameterized over the type of 3rd party buffer that Bytes is encapsulating.

ollie-etl commented 1 year ago

Closing as I am not inclined to pursue, and appear to be in a minority.

FrankReh commented 1 year ago

I think this a good discussion, worth keeping open, even if we don't know the next step yet.

tokio-rs / tokio-uring

API: Buffer initialise level #216

Invalid Code