google / zerocopy

https://discord.gg/MAvWH2R6zk
Apache License 2.0
1.58k stars 104 forks source link

Add `FromBytes`/`IntoBytes` methods which read from/write to an `io::Read`/`io::Write` #158

Open joshlf opened 1 year ago

joshlf commented 1 year ago

Progress

Original text

Crosvm has a utility function called zerocopy_from_reader:

pub fn zerocopy_from_reader<R: io::Read, T: FromBytes>(mut read: R) -> io::Result<T> {
    // Allocate on the stack via `MaybeUninit` to ensure proper alignment.
    let mut out = MaybeUninit::zeroed();

    // Safe because the pointer is valid and points to `size_of::<T>()` bytes of zeroes,
    // which is a properly initialized value for `u8`.
    let buf = unsafe { from_raw_parts_mut(out.as_mut_ptr() as *mut u8, size_of::<T>()) };
    read.read_exact(buf)?;

    // Safe because any bit pattern is considered a valid value for `T`.
    Ok(unsafe { out.assume_init() })
}

Maybe we should add something similar to FromBytes (and potentially a write analogue to AsBytes)?

joshlf commented 1 month ago

@nwplayer123 has also requested this feature.

jswrenn commented 1 month ago

The obvious name, IMO, for this is read_from, but that name is currently squatted by a #[doc(hidden)] and #[deprecated].

Since it's marked #[doc(hidden)], it doesn't have any SemVer obligations, but in the interest of providing a smooth transition for users, perhaps we wait a release or two before revising it?

If this is the course we take, we should update its deprecation message to something like this:

#[deprecated(since = "0.8.0", note = "renamed to `FromBytes::read_from_bytes`. Migrate immediately. This item will soon be re-used for a different routine.")]
jswrenn commented 1 month ago

Not sure what to call the _from_prefix variant, since read_from_prefix is already used, too (except for a visible, not deprecated function).

joshlf commented 1 month ago

A few considerations:

jswrenn commented 1 month ago
  • I don't think we need prefix/suffix variants since a reader vends single bytes at a time, and so it can always give us exactly size_of::<Self>() bytes

There's an oddity about Read here in that it provides read_to_end so it's quite possible to distinguish between the read-all and read-prefix cases.

joshlf commented 1 month ago

IMO it's fine to provide the minimal API and let users manually construct fancier use cases. All of this can be built safely using our existing primitives anyway.

NWPlayer123 commented 1 month ago
  • I don't mind read_from_io_read or similar. It's weird, but also consistent with our naming convention of read_from_<place>

Throwing out the option of read_from_stream, since the Read trait embodies a stream of bytes that you can just pull from instead of a discrete byte slice.

russellbanks commented 3 weeks ago

It would be helpful to consider TryFromBytes here too.

For example, I have this enum that represents compression:

#[repr(u8)]
pub enum Compression {
    Stored,
    Zlib,
    BZip2,
    LZMA1,
    LZMA2,
}

Currently, I use FromRepr from strum to get an enum from the byte:

let compression = reader.read_u8()?;
header.compression = Compression::from_repr(compression)
    .ok_or_else(|| eyre!("Unexpected compression value: {compression}"))?;

Now that we have TryFromBytes, I could use that instead. However, as I can't just zero the type and write data to it like you could do when using FromBytes, it requires me to read the data into a buffer, and then use TryFromBytes:

let mut buf = [0; size_of::<Compression>()];
reader.read_exact(&mut buf)?;
Compression::try_read_from_bytes(&buf)

This is now several lines every time I want to read an enum. It's difficult to make this generic as the below function fails to compile with constant expression depends on a generic parameter (@joshlf pointed out #248 here). Using vec![] works but then it's not allocated on the stack.

fn enum_value<T: KnownLayout + TryFromBytes + Sized>(reader: &mut impl Read) -> Result<T> {
    let mut buf = [0; size_of::<T>()];
    reader.read_exact(&mut buf)?;
    T::try_read_from_bytes(&buf)
}

I've also considered a macro, which does work, but it's not syntactically as nice as it could be.

macro_rules! enum_value {
    ($reader:expr, $ty:ty) => {{
        let mut buf = [0; size_of::<$ty>()];
        $reader.read_exact(&mut buf)?;
        <$ty>::try_read_from_bytes(&buf)
    }};
}

I've considered using try_transmute!() but it still requires me to read the correct size of the data and ultimately, it would be the easiest for me if there was a way to interpret bytes straight from the reader.

joshlf commented 3 days ago

More prior art: The ReadBytesExt and WriteBytesExt traits from the byteorder crate as mentioned in https://github.com/google/zerocopy/issues/438#issuecomment-2451702061.