google / zerocopy

https://docs.rs/zerocopy
Apache License 2.0

Support casting to a `KnownLayout` type with a user-provided length computed dynamically #1289

Open joshlf opened 1 month ago

joshlf commented 1 month ago

See also: #1290, #1328, https://github.com/google/zerocopy/issues/5#issuecomment-2120952779

Some formats have an explicit length field, and some use cases with these formats require parsing a subset of the available bytes based on that length field. We'd like to write something like:

```rust
#[derive(KnownLayout, FromBytes)]
#[repr(C)]
struct UdpHeader {
    src_port: [u8; 2],
    dst_port: [u8; 2],
    length: [u8; 2],
    checksum: [u8; 2],
}

#[derive(KnownLayout, FromBytes)]
#[repr(C)]
struct UdpPacket {
    header: UdpHeader,
    body: [u8],
}
```

Unfortunately, all of the conversions we permit today require the number of bytes to be parsed to be known ahead of time: either it's simply the entire source byte slice, or it's computed from a fixed number of trailing slice elements that the caller must already know.
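To make the gap concrete with the UDP example: the `length` field is big-endian and counts the 8-byte header plus the body, while the buffer handed to the parser (say, a received frame) may carry bytes past the end of the packet. The number of bytes to parse is therefore a function of the bytes themselves. Below is a minimal std-only sketch of that computation; `udp_packet_len` is a hypothetical helper and does not use zerocopy at all.

```rust
/// Size of the fixed UDP header in bytes: four 2-byte fields.
const UDP_HEADER_LEN: usize = 8;

/// Hypothetical helper (not a zerocopy API): returns how many bytes at the
/// start of `buf` belong to the UDP packet, according to its own `length`
/// field. Returns `None` if `buf` is too short or the field is inconsistent.
fn udp_packet_len(buf: &[u8]) -> Option<usize> {
    if buf.len() < UDP_HEADER_LEN {
        return None;
    }
    // `length` occupies bytes 4..6 and is big-endian (network byte order).
    let total = u16::from_be_bytes([buf[4], buf[5]]) as usize;
    // It counts header + body, so it can't be smaller than the header,
    // and it can't claim more bytes than the buffer actually holds.
    if total < UDP_HEADER_LEN || total > buf.len() {
        return None;
    }
    Some(total)
}

fn main() {
    // Header declaring a 12-byte packet (4 body bytes), followed by two bytes
    // of trailing padding that are not part of the packet.
    let buf = [0u8, 53, 0x30, 0x39, 0, 12, 0, 0, b'a', b'b', b'c', b'd', 0xff, 0xff];
    let total = udp_packet_len(&buf).expect("inconsistent length field");
    println!(
        "packet: {} bytes, body: {} bytes, trailing: {} bytes",
        total,
        total - UDP_HEADER_LEN,
        buf.len() - total
    );
}
```

Here the caller has to compute `total` by hand before any size-aware conversion can be used, which is exactly the step the issue would like the API to absorb.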

Ideally, we could support an API that permits the caller to specify how to extract the length and then use that to determine the number of bytes to parse.
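One possible shape for such an API, sketched at the byte level under the assumption that the header has a fixed size: a first pass exposes only the header to a caller-supplied callback that returns the total length, and a second pass splits the input at that length. `split_with_len_fn` is hypothetical and works on `&[u8]`; a zerocopy version would presumably hand the callback a typed `&UdpHeader` and return a `&UdpPacket` instead of raw slices.

```rust
/// Hypothetical byte-level analogue of the proposed API (not a zerocopy API).
/// Pass 1: expose only the fixed-size header (`header_len` bytes) to
/// `extract_len`, which returns the total length in bytes.
/// Pass 2: split `bytes` at that length into (parsed value, remaining bytes).
fn split_with_len_fn<F>(bytes: &[u8], header_len: usize, extract_len: F) -> Option<(&[u8], &[u8])>
where
    F: FnOnce(&[u8]) -> usize,
{
    // Pass 1: the header must be present before its length field can be read.
    let header = bytes.get(..header_len)?;
    let total = extract_len(header);
    // The extracted length must cover the header and must fit in the input.
    if total < header_len || total > bytes.len() {
        return None;
    }
    // Pass 2: "re-parse" with the now-known length.
    Some(bytes.split_at(total))
}

fn main() {
    let buf = [0u8, 53, 0x30, 0x39, 0, 12, 0, 0, b'a', b'b', b'c', b'd', 0xff, 0xff];
    // The callback reads UDP's big-endian `length` field (header bytes 4..6).
    let udp_len = |hdr: &[u8]| u16::from_be_bytes([hdr[4], hdr[5]]) as usize;
    match split_with_len_fn(&buf, 8, udp_len) {
        Some((packet, rest)) => {
            let (header, body) = packet.split_at(8);
            println!("header {:?}, body {:?}, rest {:?}", header, body, rest);
        }
        None => println!("rejected"),
    }
}
```

Keeping the length extraction in a closure means the library never needs to know how a given format encodes its length (endianness, whether it includes the header, units); it only enforces that the result is in bounds.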

One idea: Do one parsing pass, then allow the user to provide a callback which extracts the length field. Finally, re-parse using the extracted length. There may be multiple axes we need to consider:

kupiakos commented 1 month ago

(also mentioned in https://github.com/google/zerocopy/issues/5#issuecomment-2120952779)

This feature could be achieved rather naturally by extending the validation routine to also extract length information. What if the result of `is_bit_valid` isn't a simple `bool`, but rather:

```rust
enum BitValidity {
    /// This `&T` contains invalid bits for the type.
    Invalid,

    /// This `&T` contains valid bits for the type.
    Valid,

    /// This `&T` would contain valid bits if the tail slice were truncated to this many elements.
    ///
    /// If the tail slice already contains exactly this many elements, this is semantically identical to returning `Valid`.
    ValidIfTruncatedTo(usize),
}
```
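As an illustration of how a validator might use this enum, here is a hypothetical validator for the `UdpPacket` above, modeled on raw bytes so it runs without zerocopy: the first 8 bytes are the header and the rest is the `body: [u8]` tail, so tail elements are bytes. It returns `Invalid` when the declared body is longer than the available tail, on the assumption that truncation can only shrink the tail. The enum is repeated so the block compiles on its own.

```rust
/// Mirrors the `BitValidity` enum proposed above (not an existing zerocopy type).
#[derive(Debug, PartialEq, Eq)]
enum BitValidity {
    Invalid,
    Valid,
    ValidIfTruncatedTo(usize),
}

const HEADER_LEN: usize = 8;

/// Hypothetical validator: `candidate` stands in for a `&UdpPacket` whose
/// tail is everything after the 8-byte header. The returned count is in
/// tail elements, which are bytes for `body: [u8]`.
fn validate_udp_packet(candidate: &[u8]) -> BitValidity {
    if candidate.len() < HEADER_LEN {
        return BitValidity::Invalid;
    }
    // Big-endian `length` field: header + body, in bytes.
    let total = u16::from_be_bytes([candidate[4], candidate[5]]) as usize;
    let available_body = candidate.len() - HEADER_LEN;
    match total.checked_sub(HEADER_LEN) {
        // Declared length is shorter than the header itself.
        None => BitValidity::Invalid,
        // Declared body is longer than the tail we have: truncation can't help.
        Some(body) if body > available_body => BitValidity::Invalid,
        Some(body) if body == available_body => BitValidity::Valid,
        Some(body) => BitValidity::ValidIfTruncatedTo(body),
    }
}

fn main() {
    let exact = [0u8, 53, 0x30, 0x39, 0, 12, 0, 0, b'a', b'b', b'c', b'd'];
    let mut padded = exact.to_vec();
    padded.extend_from_slice(&[0xff, 0xff]);
    println!("{:?}", validate_udp_packet(&exact)); // Valid
    println!("{:?}", validate_udp_packet(&padded)); // ValidIfTruncatedTo(4)
    println!("{:?}", validate_udp_packet(&exact[..10])); // Invalid
}
```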

That way `ref_from_prefix` and friends could consume the correct amount of data based on, e.g., a length in the header of a `&T`. Exact-matching conversions would reject the input if the validator returns `ValidIfTruncatedTo` with a length different from the length derived from the bytes provided, and prefix-matching conversions would truncate the input to the given length. I can extend this design into something more detailed if desired.
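A minimal sketch of how an exact-matching conversion and a prefix-matching one could interpret the three variants, using a toy format (one header byte holding the tail length, followed by a `[u8]` tail) so that the element count equals the byte count. `ref_from_bytes_sketch` and `ref_from_prefix_sketch` are hypothetical stand-ins, not zerocopy functions, and rejecting a requested length larger than the available tail is an assumption rather than something stated in the comment above.

```rust
/// Same shape as the proposed `BitValidity` (repeated so this block stands alone).
#[derive(Debug, PartialEq, Eq)]
enum BitValidity {
    Invalid,
    Valid,
    ValidIfTruncatedTo(usize),
}

/// Hypothetical toy format: 1 header byte holding the tail length, then the tail.
const HEADER_LEN: usize = 1;

fn validate(candidate: &[u8]) -> BitValidity {
    match candidate.split_first() {
        None => BitValidity::Invalid,
        Some((&n, tail)) if n as usize == tail.len() => BitValidity::Valid,
        Some((&n, _)) => BitValidity::ValidIfTruncatedTo(n as usize),
    }
}

/// Turns the validator's answer into the number of tail elements to keep.
fn resolve(v: BitValidity, tail_len: usize) -> Option<usize> {
    match v {
        BitValidity::Invalid => None,
        BitValidity::Valid => Some(tail_len),
        // Assumption: truncation can only shrink the tail, never grow it.
        BitValidity::ValidIfTruncatedTo(n) if n <= tail_len => Some(n),
        BitValidity::ValidIfTruncatedTo(_) => None,
    }
}

/// Exact match: the whole input must be the value.
fn ref_from_bytes_sketch(bytes: &[u8]) -> Option<&[u8]> {
    let tail_len = bytes.len().checked_sub(HEADER_LEN)?;
    let keep = resolve(validate(bytes), tail_len)?;
    // A length other than the one implied by the input size is a rejection.
    if keep == tail_len { Some(bytes) } else { None }
}

/// Prefix match: truncate to the validated length and return the remainder.
fn ref_from_prefix_sketch(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    let tail_len = bytes.len().checked_sub(HEADER_LEN)?;
    let keep = resolve(validate(bytes), tail_len)?;
    Some(bytes.split_at(HEADER_LEN + keep))
}

fn main() {
    let input = [2u8, 7, 8, 9];
    println!("{:?}", ref_from_bytes_sketch(&input)); // None
    println!("{:?}", ref_from_prefix_sketch(&input)); // Some(([2, 7, 8], [9]))
}
```

With tail elements wider than one byte, the split point would instead be roughly the header size plus `n` times the element size, adjusted for any trailing padding the type's layout requires.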