google / zerocopy

https://discord.gg/MAvWH2R6zk
Apache License 2.0
1.56k stars 99 forks source link

Features to support musli-zerocopy #523

Open joshlf opened 1 year ago

joshlf commented 1 year ago

See https://udoprog.github.io/rust/2023-10-19/musli-zerocopy.html and https://docs.rs/musli-zerocopy/latest/musli_zerocopy/.

cc @udoprog

joshlf commented 1 year ago

@udoprog re: bytes to type conversion: You can usually accomplish this by constructing a Ref and then using into_ref, into_mut, into_slice, or into_mut_slice. The ergonomics are definitely bad, and we need to improve them.

udoprog commented 1 year ago

Hi!

I'll go through some of the major areas I've yet to find a solution for:

  1. Types that can be coerced through validation. There is no trait to coerce NonZero* or char directly and FromBytes extends FromZeros effectively preventing it and all structures which depend on it. Ref does, so it's also excluded.

  2. General APIs for working with padding in any buffer. If I had my pick I think you should adopt something like my API wholesale. These are the relevant parts:

    
    pub struct Padder<'a, T: ?Sized> {
    ptr: *mut u8,
    offset: usize,
    _marker: PhantomData<&'a mut T>,
    }

impl<'a, T: ?Sized> Padder<'a, T> { pub(crate) fn new(ptr: *mut u8) -> Self { Self { ptr, offset: 0, _marker: PhantomData, } }

/* methods that derived structures uses to indicate which fields and discriminators are present */

}

trait UnsizedPaddable { unsafe fn pad(&self, pad: &mut Padder<'_, Self>); }

trait Paddable { unsafe fn pad(&self, pad: &mut Padder<'_, Self>); }


That way, anything that can provide a mutable pointer can have padding bytes written to it, like a `Vec<u8>`. It also follows that you ought to expose somehow whether a type `T` requires padding (I use `T::PADDED` and it optimizes really well). You might opt to make it generic through some trait, all though I personally wouldn't need that and can't think of a case where it is needed. There is quite a bit of subtlety (which is echoed in the actual `Padded` api). E.g. to determine how the instance of an enum is padded you have to load the discriminant, and since the enum might be part of a packed type the [load has to be unaligned](https://github.com/udoprog/musli/blob/main/crates/musli-zerocopy/src/buf/padder.rs#L105C16-L105C16).

To use such an API, the caller is resonsible for ensuring that a buffer with capacity `size_of::<T>()` is available somewhere where the bytes of `T` has been copied to so that the padding can be filled in by the abstraction. This also makes it `unsafe`.

3. APIs for [working with unsized types](https://docs.rs/musli-zerocopy/latest/musli_zerocopy/traits/trait.UnsizedZeroCopy.html). Unfortunately everything needed to support it well is unstable (`ptr_metadata`, `align_of_val_raw`), so this is super messy and has to be specialized over the types that have metadata constructors (`str` and `[T]`). I'm not sure whether that's something a generic library wants to incorporate right now, but you can get a feel for how I did it with my own [`Pointee` trait mirroring that in `core`](https://doc.rust-lang.org/std/ptr/trait.Pointee.html).

Again, please advise if there's something about `zerocopy` I've missed. And if there's question or push back feel free to push. I'll be happy to detail more things I can think of in the future.
joshlf commented 1 year ago

Hi!

I'll go through some of the major areas I've yet to find a solution for:

  1. Types that can be coerced through validation. There is no trait to coerce NonZero* or char directly and FromBytes extends FromZeros effectively preventing it and all structures which depend on it. Ref does, so it's also excluded.

Does https://github.com/google/zerocopy/issues/5 address this use case?

  1. General APIs for working with padding in any buffer. If I had my pick I think you should adopt something like my API wholesale. These are the relevant parts:
pub struct Padder<'a, T: ?Sized> {
    ptr: *mut u8,
    offset: usize,
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T: ?Sized> Padder<'a, T> {
    pub(crate) fn new(ptr: *mut u8) -> Self {
        Self {
            ptr,
            offset: 0,
            _marker: PhantomData,
        }
    }

    /* methods that derived structures uses to indicate which fields and discriminators are present */
 }

trait UnsizedPaddable {
    unsafe fn pad(&self, pad: &mut Padder<'_, Self>);
}

trait Paddable {
    unsafe fn pad(&self, pad: &mut Padder<'_, Self>);
}

That way, anything that can provide a mutable pointer can have padding bytes written to it, like a Vec<u8>. It also follows that you ought to expose somehow whether a type T requires padding (I use T::PADDED and it optimizes really well). You might opt to make it generic through some trait, all though I personally wouldn't need that and can't think of a case where it is needed. There is quite a bit of subtlety (which is echoed in the actual Padded api). E.g. to determine how the instance of an enum is padded you have to load the discriminant, and since the enum might be part of a packed type the load has to be unaligned.

To use such an API, the caller is resonsible for ensuring that a buffer with capacity size_of::<T>() is available somewhere where the bytes of T has been copied to so that the padding can be filled in by the abstraction. This also makes it unsafe.

I think I roughly understand what's going on here, but can you say more about how this API is used? We have https://github.com/google/zerocopy/issues/494 which it sounds like describes something similar, but I'm not sure if it'd fully cover your use case.

  1. APIs for working with unsized types. Unfortunately everything needed to support it well is unstable (ptr_metadata, align_of_val_raw), so this is super messy and has to be specialized over the types that have metadata constructors (str and [T]). I'm not sure whether that's something a generic library wants to incorporate right now, but you can get a feel for how I did it with my own Pointee trait mirroring that in core.

Yeah, we're feeling this pain too. I've submitted https://github.com/rust-lang/reference/pull/1417, which should provide enough of a guarantee that you can build unsized type support on top of it (basically, you use slice_from_raw_parts to construct a *const [()] of the appropriate length and then cast it to *const T where T is a slice DST). We're planning on supporting slice DST conversions once that PR lands.

Again, please advise if there's something about zerocopy I've missed. And if there's question or push back feel free to push. I'll be happy to detail more things I can think of in the future.

udoprog commented 1 year ago

I think I roughly understand what's going on here, but can you say more about how this API is used? We have #494 which it sounds like describes something similar, but I'm not sure if it'd fully cover your use case.

That is the same as my ZeroCopy::to_bytes. You're on the right track, but it's a bit too high level to incorporate well into buffers.

You can actually see how the proposed method in #494 could be implemented using Padder / pad / T::PADDED here: https://docs.rs/musli-zerocopy/latest/src/musli_zerocopy/traits.rs.html#447. And I believe #494 is also why I got the impression that padding APIs was on your roadmap.

udoprog commented 1 year ago

"Well" as-in, the ability to pad something solely based on a &T. The proposed API in #494 would require you to make a local mutable aligned copy out of it somewhere. Byte buffers are not always aligned, so copying it into a buffer and then coercing into &mut T to pipe it through that API is probably not an option.

joshlf commented 1 year ago

Okay gotcha, thanks for clarifying! IIUC, the point is that you need to be able to take a &T where the T might contain padding and copy its bytes into a target buffer while initializing any padding bytes along the way, but of course you can't do that in the &T itself because it's not mutable. Is that right?

I've updated #494 to mention this use case.

joshlf commented 12 months ago

@udoprog Just wanted to let you know that we added convenience constructors for bytes-to-type conversions like ref_from in https://github.com/google/zerocopy/pull/526; these are available in the latest published version.