google / zerocopy

https://discord.gg/MAvWH2R6zk

Apache License 2.0

1.57k stars 103 forks source link

`FromBytes::(mut_)?slice_from_[prefix|suffix]` also returns the prefix/suffix instead of just the read slice #884

Closed kupiakos closed 6 months ago

kupiakos commented 8 months ago

slice_from_prefix returns Option<(&[Self], &[u8])>
slice_from_suffix returns Option<(&[u8], &[Self])>
mut_slice_from_prefix returns Option<(&mut [Self], &mut [u8])>
mut_slice_from_suffix returns Option<(&mut [u8], &mut [Self])>

However, all of the other methods drop the prefix/suffix:

FromBytes::read_from_prefix returns Option<Self>
FromBytes::read_from_suffix returns Option<Self>
FromBytes::ref_from_prefix returns Option<&Self>
FromBytes::ref_from_suffix returns Option<&Self>
FromBytes::mut_from_prefix returns Option<&mut Self>
FromBytes::mut_from_suffix returns Option<&mut Self>

This is desirable: the majority of the time I drop the prefix anyways and these are convenience methods for methods in Ref. [ref|mut]_from_[prefix|suffix] in their docs both point to using Ref to preserve the prefix/suffix.

joshlf commented 8 months ago

This will be a breaking change, so we'll need to make sure we either address it as part of 0.8 or punt it to a future release. I've added it to #671 to make sure we make a decision one way or another before we release 0.8.

joshlf commented 7 months ago

Heads up about https://github.com/google/zerocopy/issues/1051, which relates to this, and about https://github.com/google/zerocopy/pull/1059, which resolves #1051 in the opposite direction from this request.

Unless we want even more method duplication (which we'd really like to avoid), it seems that we'll need to just pick one of the following two approaches:

The FromBytes methods drop the suffix/prefix, and users who want them need to use Ref
The FromBytes methods return the suffix/prefix, and users who don't need them need to explicitly discard them

1059 takes the latter approach. From a survey of uses in the ecosystem, it seems that needing the suffix/prefix is more common than not needing it. We also feel like discarding the suffix/prefix is more ergonomic than using the `Ref` API equivalents.

@kupiakos want to make sure you're aware that this is something we're considering and give you a chance to chime in.

kupiakos commented 7 months ago

From a survey of uses in the ecosystem

How was this survey performed? Via usage of the current unstable FromBytes APIs in open-source code? Did you also include usage of the exact-size APIs that are preceded by a .get(..SIZE)?

it seems that needing the suffix/prefix is more common than not needing it

I don't doubt that keeping the suffix is a common operation, especially in networking code. Firmware, unfortunately, tends to not be public. We need something ergonomic to translate something like this C code, which is extremely prevalent, into fully safe Rust:

int foo(char *data, size_t size) {
    if size < sizeof(struct Foo) {
        return ERROR_TOO_SMALL;
    }
    // Foo is a packed struct.
    struct Foo foo* = (Foo*)data;
    ...
}

Essentially, we want the safe version of a bounds-check followed by a pointer cast (also checking alignment if necessary). Requiring an exact size for the buffer rather than the buffer simply be large enough is rare and unnecessarily restrictive. When working with embedded code bases, it's pretty common to have a single buffer scratch space to do operations that may be larger than is necessary, or is reused as part of the response, such as with TPM.

Doing extra work and then throwing it away is antithetical to good embedded software design. Not only are methods that return more than two pointers a risk for code size bloat (even with inlining), but splitting currently introduces a panic path. The optimizer has an extremely hard job, and it is ideal to limit how much it needs to consider in order to produce optimal results, especially compared to the original C.

The mut_from_prefix method as-is currently uses Ref::new_from_prefix but that puts extra burden on the optimizer. I plan on rewriting it to instead start with a bytes.get(..size_of::<Self>()). Splitting the bytes is an extra operation, and we should only do that if we actually need to do it.

joshlf commented 6 months ago

I don't think we can promise anything regarding what code we write internal to zerocopy. We currently make heavy use of the optimizer so that we can have reasonable internal abstractions, and it would be a big blow to the quality and reliability of our codebase to have to start reimplementing things by hand, especially when that involves adding rather than removing instances of the unsafe keyword.

That said, I think there's a third way. Would the following work for your use case? It uses unrelated APIs in zerocopy and provides much clearer visibility into exactly which operations are being performed. It would presumably be straightforward to wrap in a utility macro/method if desired, which would allow you to write roughly the code you currently write, but with the guarantees you're looking for.

// runtime check: do we have sufficient bytes?
let Ok(bytes) = <&[u8; size_of::<Foo>()]>::try_from(buffer) else {
    panic!("wrong number of bytes");
};

// no runtime checks
let des: &Unalign<Foo> = transmute_ref!(bytes);

// runtime check: is the deserialization well-aligned?
let Some(des) = des.try_deref() else {
    panic!("wrong alignment");
};

In particular:

TryFrom<&[u8]> for &[u8; N] is in the standard library
transmute_ref! desugars to a single transmute
Unalign::try_deref performs a single alignment check

joshlf commented 6 months ago

Closing in favor of https://github.com/google/zerocopy/issues/1051, which was implemented in https://github.com/google/zerocopy/pull/1059

google / zerocopy

`FromBytes::(mut_)?slice_from_[prefix|suffix]` also returns the prefix/suffix instead of just the read slice #884

1059 takes the latter approach. From a survey of uses in the ecosystem, it seems that needing the suffix/prefix is more common than not needing it. We also feel like discarding the suffix/prefix is more ergonomic than using the Ref API equivalents.

1059 takes the latter approach. From a survey of uses in the ecosystem, it seems that needing the suffix/prefix is more common than not needing it. We also feel like discarding the suffix/prefix is more ergonomic than using the `Ref` API equivalents.