google / zerocopy


Support custom validators for `TryFromBytes` #1330

Open kupiakos opened 5 months ago

kupiakos commented 5 months ago

These are the kinds of validity that users may need checked before transmuting from `&[u8]` to `&T`:

  1. The language-level validity of the bits of each field in `T`; e.g., a `bool` must be either 0 or 1. The derive implements this.
  2. The library-level validity of the bits in `T`; e.g., an invariant that the first field is less than the second. The derive can reference this check, but it inherently must be user-controlled.
  3. The library-level validity of each individual field in `T`, i.e. the library-level check from item 2 applied to each field's type. The derive also implements this.
  4. The library-level validity of the struct's length given its header contents; e.g., the length field must equal the length of the trailing slice. This only applies to dynamically sized structs ending in a slice.
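To make the four kinds of validity concrete, here is a minimal standalone sketch that checks all of them by hand on a hypothetical wire format: a `bool` byte, two bytes with the invariant `lo < hi`, a `Percent` byte capped at 100, a length byte, and a trailing slice. Nothing here is zerocopy API; `Packet`, `Percent`, `Invalid`, and `parse` are illustrative names, and the sketch copies fields out rather than transmuting.

```rust
/// Hypothetical error type: one variant per kind of validity above.
#[derive(Debug, PartialEq)]
pub enum Invalid {
    /// 1. Language-level: a `bool` byte must be 0 or 1.
    BadBool(u8),
    /// 2. Library-level invariant on the struct as a whole: `lo < hi`.
    LoNotBelowHi { lo: u8, hi: u8 },
    /// 3. Library-level invariant on one field's type: `Percent <= 100`.
    BadPercent(u8),
    /// 4. Length invariant: the length byte must equal the tail's length.
    LenMismatch { len: u8, tail: usize },
    /// Input shorter than the fixed-size header.
    TooShort,
}

/// Hypothetical field type with its own library-level invariant.
#[derive(Debug, PartialEq)]
pub struct Percent(u8);

#[derive(Debug, PartialEq)]
pub struct Packet<'a> {
    pub flag: bool,
    pub lo: u8,
    pub hi: u8,
    pub percent: Percent,
    pub tail: &'a [u8],
}

/// Layout: [flag, lo, hi, percent, len, tail @ ..]
pub fn parse(bytes: &[u8]) -> Result<Packet<'_>, Invalid> {
    let [flag, lo, hi, percent, len, tail @ ..] = bytes else {
        return Err(Invalid::TooShort);
    };
    // 1. Language-level validity of a field's bits.
    let flag = match *flag {
        0 => false,
        1 => true,
        b => return Err(Invalid::BadBool(b)),
    };
    // 3. Library-level validity of an individual field.
    if *percent > 100 {
        return Err(Invalid::BadPercent(*percent));
    }
    // 2. Library-level validity of the struct as a whole.
    if lo >= hi {
        return Err(Invalid::LoNotBelowHi { lo: *lo, hi: *hi });
    }
    // 4. The length field must match the trailing slice.
    if usize::from(*len) != tail.len() {
        return Err(Invalid::LenMismatch { len: *len, tail: tail.len() });
    }
    Ok(Packet { flag, lo: *lo, hi: *hi, percent: Percent(*percent), tail })
}
```

In the issue's terms, check 1 (and check 3, via each field type's own impl) is what the derive already provides, check 2 is the part only the user can write, and check 4 is the length case that the custom-validator design is also meant to cover.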

The plan discussed in #5 and #372 is to support custom validators: a function or closure provided to `derive(TryFromBytes)` that is always called before a `TryFromBytes` transmutation is allowed to succeed.
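The essential property of the plan is that the validator runs on every path to success. A minimal sketch, assuming two hypothetical traits: `BitValid` stands in for the derive's language-level check and `CustomValidator` for the user-supplied hook. Neither is zerocopy's actual API.

```rust
/// Hypothetical stand-in for the language-level check the derive generates.
pub trait BitValid: Sized {
    /// Number of bytes this type occupies.
    const SIZE: usize;
    /// Decodes exactly `SIZE` bytes, or returns `None` on invalid bits.
    fn from_valid_bits(bytes: &[u8]) -> Option<Self>;
}

/// Hypothetical hook that `#[zerocopy(validator = ...)]` would supply.
pub trait CustomValidator: BitValid {
    fn validate(&self) -> bool;
}

/// Every path to `Some` passes the bit check *and then* the custom validator.
pub fn try_read<T: CustomValidator>(bytes: &[u8]) -> Option<T> {
    if bytes.len() != T::SIZE {
        return None;
    }
    let value = T::from_valid_bits(bytes)?;
    value.validate().then_some(value)
}

/// Example type: `start..end` with the library-level invariant `start <= end`.
#[derive(Debug, PartialEq)]
pub struct Range {
    pub start: u8,
    pub end: u8,
}

impl BitValid for Range {
    const SIZE: usize = 2;
    fn from_valid_bits(bytes: &[u8]) -> Option<Self> {
        // Any pair of `u8`s is valid at the language level.
        Some(Range { start: bytes[0], end: bytes[1] })
    }
}

impl CustomValidator for Range {
    fn validate(&self) -> bool {
        self.start <= self.end
    }
}
```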

Open Questions

joshlf commented 5 months ago

See also: #590

joshlf commented 1 month ago

@jswrenn and I met and discussed potential designs. He has his own design proposal that he'll share at some point. Here's mine.

This design is based on the observation that some users may want not only to validate safety conditions but also to actively transform their type; for example, to construct a new witness wrapper type. It permits a distinction between the type being constructed and its "raw" equivalent.
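A minimal sketch of "transform, don't just validate", using a hypothetical `Ascii` witness type rather than a zerocopy-derived one: the validator takes raw bytes and returns a *different* type whose private field guarantees the invariant, so code holding an `Ascii` never needs to re-check it.

```rust
pub mod witness {
    /// Hypothetical witness: holding an `Ascii` proves its bytes are ASCII.
    /// The field is private, so outside this module the only constructor
    /// is `try_from_raw`.
    #[derive(Debug, PartialEq)]
    pub struct Ascii<'a>(&'a [u8]);

    #[derive(Debug, PartialEq)]
    pub struct NotAscii {
        /// Index of the first non-ASCII byte.
        pub at: usize,
    }

    impl<'a> Ascii<'a> {
        /// Validates the raw bytes and, on success, wraps them in the witness
        /// type instead of handing back the raw input unchanged.
        pub fn try_from_raw(raw: &'a [u8]) -> Result<Self, NotAscii> {
            match raw.iter().position(|b| !b.is_ascii()) {
                Some(at) => Err(NotAscii { at }),
                None => Ok(Ascii(raw)),
            }
        }

        /// Relies on the invariant established by `try_from_raw`:
        /// ASCII is always valid UTF-8, so this conversion cannot fail.
        pub fn as_str(&self) -> &'a str {
            core::str::from_utf8(self.0).expect("ASCII is valid UTF-8")
        }
    }
}
```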

This design is also explicitly meant to support custom length fields (#1289). To that end, validation can return a rich error, which may be needed when a length field cannot be parsed.

We didn't have time during our discussion to think deeply about how this composes with a #[length] attribute. For example, is the length extracted before or after calling the user's custom validator? Can the #[length] attribute itself provide a "custom extractor" that also has error cases? These will need to be thought through.
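To illustrate what one answer to these questions could look like, here is a sketch under an explicit assumption the discussion leaves open: the custom length extractor runs first and has its own error cases, and the user's validator only runs once the length checks out. `MsgError`, `extract_len`, and `parse_msg` are hypothetical, and the decimal `"<len>:"` header is merely a stand-in for a length field that can fail to parse.

```rust
/// Hypothetical rich error type covering both length and validator failures.
#[derive(Debug, PartialEq)]
pub enum MsgError {
    /// No `:`-terminated header was found.
    NoHeader,
    /// The length field exists but is not a valid decimal number.
    BadLength,
    /// The declared length disagrees with the bytes actually present.
    LengthMismatch { declared: usize, available: usize },
    /// The user's validator rejected the body.
    Rejected,
}

/// Hypothetical custom length extractor: parses `"<len>:"` and returns
/// the declared length together with the remaining body.
fn extract_len(bytes: &[u8]) -> Result<(usize, &[u8]), MsgError> {
    let colon = bytes
        .iter()
        .position(|&b| b == b':')
        .ok_or(MsgError::NoHeader)?;
    let digits = core::str::from_utf8(&bytes[..colon]).map_err(|_| MsgError::BadLength)?;
    let len = digits.parse::<usize>().map_err(|_| MsgError::BadLength)?;
    Ok((len, &bytes[colon + 1..]))
}

/// Assumed ordering: extract and check the length, *then* run the validator.
pub fn parse_msg<'a>(
    bytes: &'a [u8],
    validator: impl FnOnce(&[u8]) -> bool,
) -> Result<&'a [u8], MsgError> {
    let (declared, body) = extract_len(bytes)?;
    if declared != body.len() {
        return Err(MsgError::LengthMismatch { declared, available: body.len() });
    }
    if validator(body) {
        Ok(body)
    } else {
        Err(MsgError::Rejected)
    }
}
```

Running the extractor first means the validator never sees a body of the wrong length, at the cost of the validator being unable to influence how the length is interpreted; the reverse ordering would trade those properties.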

```rust
// Used in the following examples
type MaybeValid<T, A> = Ptr<T, (A, Any, AsInitialized)>;
type MaybeAligned<T, A> = Ptr<T, (A, Any, Valid)>;

unsafe trait TryFromBytes {
    // Set by `#[zerocopy(raw = ...)]`, defaults to `Self`.
    #[doc(hidden)]
    type Raw: TryFromBytes;

    // Set by `#[zerocopy(error = ...)]`, defaults to
    // `()` or similar.
    //
    // Not doc(hidden)!
    type ValidationError;

    // Replaces `is_bit_valid`.
    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<Self::Raw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError>;
}
```

Here's how this would be used by a hypothetical user:

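```rust
#[derive(TryFromBytes)]
#[zerocopy(raw = FooRaw, error = FooError, validator = Foo::try_from_raw)]
struct Foo(...);

#[derive(TryFromBytes)]
struct FooRaw(...);

struct FooError(...);

impl Foo {
    // In an inherent impl, `Self::Raw` and `Self::ValidationError` are
    // ambiguous associated types, so the concrete types are spelled out:
    // `Raw = FooRaw` and `ValidationError = FooError`.
    fn try_from_raw<A: Aliasing>(
        r: Result<MaybeAligned<FooRaw, A>, <FooRaw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<Self, A>, FooError> {
        ...
    }
}
```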

Our derive would emit the following impl:

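```rust
unsafe impl TryFromBytes for Foo {
    type Raw = FooRaw;
    type ValidationError = FooError;

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        // First run `FooRaw`'s own validation (its `Raw` is `FooRaw` itself)...
        let raw_result = FooRaw::try_from_maybe_valid_raw(maybe_raw);
        // ...then pass that result, success or failure, to the user's validator.
        Foo::try_from_raw(raw_result)
    }
}
```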
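The shape of this generated impl can be exercised without zerocopy's `Ptr` machinery. This sketch assumes a simplified, hypothetical `TryFromRaw` trait that decodes owned values from byte slices instead of operating on `MaybeValid`/`MaybeAligned` pointers; `FooRaw`, `Foo`, and the invariant "if `enabled` then `value != 0`" are illustrative only.

```rust
/// Hypothetical simplified model of the proposed trait: byte slices and
/// owned values stand in for `MaybeValid` / `MaybeAligned` pointers.
pub trait TryFromRaw: Sized {
    type Raw: TryFromRaw;
    type ValidationError;
    fn try_from_maybe_valid_raw(bytes: &[u8]) -> Result<Self, Self::ValidationError>;
}

#[derive(Debug, PartialEq)]
pub enum RawError {
    TooShort,
    BadBool(u8),
}

/// Plays the role of `FooRaw`: only language-level checks.
#[derive(Debug, PartialEq)]
pub struct FooRaw {
    pub enabled: bool,
    pub value: u8,
}

impl TryFromRaw for FooRaw {
    type Raw = Self; // the default: no separate raw type
    type ValidationError = RawError;

    fn try_from_maybe_valid_raw(bytes: &[u8]) -> Result<Self, RawError> {
        let [enabled, value, ..] = bytes else {
            return Err(RawError::TooShort);
        };
        let enabled = match *enabled {
            0 => false,
            1 => true,
            b => return Err(RawError::BadBool(b)),
        };
        Ok(FooRaw { enabled, value: *value })
    }
}

/// Plays the role of `Foo`: a witness that `enabled` implies `value != 0`.
#[derive(Debug, PartialEq)]
pub struct Foo(FooRaw);

#[derive(Debug, PartialEq)]
pub enum FooError {
    Raw(RawError),
    EnabledButZero,
}

impl From<RawError> for FooError {
    fn from(e: RawError) -> Self {
        FooError::Raw(e)
    }
}

impl Foo {
    /// The user-written validator named by `validator = Foo::try_from_raw`.
    fn try_from_raw(r: Result<FooRaw, RawError>) -> Result<Foo, FooError> {
        let raw = r?; // raw-level failures become `FooError::Raw` via `From`
        if raw.enabled && raw.value == 0 {
            Err(FooError::EnabledButZero)
        } else {
            Ok(Foo(raw))
        }
    }
}

/// Roughly what the derive would emit for `Foo` in this model.
impl TryFromRaw for Foo {
    type Raw = FooRaw;
    type ValidationError = FooError;

    fn try_from_maybe_valid_raw(bytes: &[u8]) -> Result<Self, FooError> {
        let raw_result = <Self::Raw as TryFromRaw>::try_from_maybe_valid_raw(bytes);
        Foo::try_from_raw(raw_result)
    }
}
```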

As written, this gives the user full power: they are responsible for converting Self::Raw::ValidationError into Self::ValidationError and MaybeAligned<Self::Raw> into MaybeAligned<Self>. However, the user may not want to deal with all of these details. Thus, we can abstract somewhat and support the following simplifications:

First, we introduce the following trait and implement it for different function types. Each impl (except for the most general one, which puts all of the onus on the user) carries restrictions:

```rust
// `Disambiguator` is only a type-level tag: each blanket impl below uses a
// distinct one, so the impls for `F` don't overlap.
trait Validator<T: TryFromBytes, A, Disambiguator> {
    fn try_from_raw(
        self,
        r: Result<
            MaybeAligned<<T as TryFromBytes>::Raw, A>,
            <<T as TryFromBytes>::Raw as TryFromBytes>::ValidationError,
        >,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError>;
}

// Most general: the user handles both the raw value and the raw error.
impl<T, F, A> Validator<T, A, ()> for F
where
    T: TryFromBytes,
    F: FnOnce(
        Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError>,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        self(r)
    }
}

// The raw error is converted via `From`; the user only sees a valid raw value.
impl<T, F, A> Validator<T, A, ((),)> for F
where
    T: TryFromBytes,
    T::ValidationError: From<<T::Raw as TryFromBytes>::ValidationError>,
    F: FnOnce(MaybeAligned<T::Raw, A>) -> Result<MaybeAligned<T, A>, T::ValidationError>,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        match r {
            Ok(r) => self(r),
            Err(err) => Err(err.into()),
        }
    }
}

// A plain predicate: requires `Raw = Self`, and maps `false` to
// `T::ValidationError::default()`.
impl<T, F, A> Validator<T, A, (((),),)> for F
where
    T: TryFromBytes<Raw = T>,
    T::ValidationError: Default + From<<T::Raw as TryFromBytes>::ValidationError>,
    // Takes the pointer by reference so it can still be returned on success.
    F: FnOnce(&MaybeAligned<T::Raw, A>) -> bool,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        match r {
            Ok(r) => {
                if self(&r) {
                    Ok(r)
                } else {
                    Err(Default::default())
                }
            }
            Err(err) => Err(err.into()),
        }
    }
}
```
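The disambiguator trick can be tried out in isolation. This is a simplified, hypothetical model, not zerocopy code: owned values replace `MaybeAligned`, the raw and output types are plain type parameters rather than associated types, and named marker structs (`Full`, `Fallible`, `Predicate`) replace the `()`, `((),)`, `(((),),)` tags. `run` stands in for the derive-generated call site.

```rust
/// Hypothetical model of `Validator`: turns the raw result into the final one.
pub trait Validator<R, RE, T, E, D> {
    fn try_from_raw(self, r: Result<R, RE>) -> Result<T, E>;
}

/// Marker types that keep the three blanket impls from overlapping.
pub struct Full;
pub struct Fallible;
pub struct Predicate;

// Most general: the validator sees the raw result, success or failure.
impl<R, RE, T, E, F> Validator<R, RE, T, E, Full> for F
where
    F: FnOnce(Result<R, RE>) -> Result<T, E>,
{
    fn try_from_raw(self, r: Result<R, RE>) -> Result<T, E> {
        self(r)
    }
}

// Raw errors are converted with `From`; the validator only sees success.
impl<R, RE, T, E, F> Validator<R, RE, T, E, Fallible> for F
where
    E: From<RE>,
    F: FnOnce(R) -> Result<T, E>,
{
    fn try_from_raw(self, r: Result<R, RE>) -> Result<T, E> {
        self(r?)
    }
}

// Predicate: the raw value *is* the output; `false` becomes `E::default()`.
impl<T, RE, E, F> Validator<T, RE, T, E, Predicate> for F
where
    E: Default + From<RE>,
    F: FnOnce(&T) -> bool,
{
    fn try_from_raw(self, r: Result<T, RE>) -> Result<T, E> {
        let value = r?;
        if self(&value) {
            Ok(value)
        } else {
            Err(E::default())
        }
    }
}

/// Stand-in for the generated call site: `D` is inferred from whichever
/// impl's `FnOnce` bound the validator actually satisfies.
pub fn run<R, RE, T, E, D, V>(validator: V, r: Result<R, RE>) -> Result<T, E>
where
    V: Validator<R, RE, T, E, D>,
{
    validator.try_from_raw(r)
}

#[derive(Debug, PartialEq)]
pub struct RawError;

#[derive(Debug, PartialEq, Default)]
pub enum MyError {
    #[default]
    Rejected,
    Raw,
}

impl From<RawError> for MyError {
    fn from(_: RawError) -> Self {
        MyError::Raw
    }
}
```

In this model, `D` never has to be written. One caveat worth checking against the real design: because the call site's bound is `Validator<...>` rather than an `Fn` bound, the compiler gets no expected closure signature from it, so closures generally need annotated parameter types (e.g. `|x: &u8| *x < 10`) for impl selection to succeed.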

Finally, we can change the derive-generated code like so:

```rust
    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::try_from_maybe_valid_raw(maybe_raw);
        Validator::try_from_raw(Foo::try_from_raw, raw_result)
    }
```

Note that this also supports closures, not just named validators. For example, if the user specifies `#[zerocopy(validator = |foo| foo.0.is_valid())]`, this desugars as expected:

```rust
    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::try_from_maybe_valid_raw(maybe_raw);
        Validator::try_from_raw(|foo| foo.0.is_valid(), raw_result)
    }
```

@djkoloski, if we added validation context to this design, would it support your rkyv use case? In particular, would the ability to perform mutation in the validator be enough for your fix-up operation?

At a minimum, we'd need a slight tweak: a way to signal that a `TryFromBytes` impl only works on mutable input, so that the fix-up can be performed. But let's assume we've done that for the sake of this question.
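The issue doesn't describe the rkyv fix-up itself, so this sketch uses a placeholder fix-up: canonicalizing every nonzero byte to `1` so the buffer becomes valid as `bool`s. It assumes a hypothetical `TryFromBytesMut` trait as the "only works on mutable input" signal and a hypothetical `Context` struct as the validation context; neither exists in zerocopy.

```rust
/// Hypothetical validation context threaded through the validator.
/// Here it only records how many bytes the fix-up rewrote.
#[derive(Debug, Default)]
pub struct Context {
    pub fixups: usize,
}

/// Hypothetical marker: types implementing this can only be validated from
/// `&mut` input, because validation may rewrite the bytes in place.
pub trait TryFromBytesMut {
    fn validate_and_fix(bytes: &mut [u8], ctx: &mut Context) -> Result<(), &'static str>;
}

/// Placeholder type: a non-empty run of flags where any nonzero byte means `true`.
pub struct Flags;

impl TryFromBytesMut for Flags {
    fn validate_and_fix(bytes: &mut [u8], ctx: &mut Context) -> Result<(), &'static str> {
        if bytes.is_empty() {
            return Err("empty");
        }
        for b in bytes.iter_mut() {
            // Fix-up: rewrite nonzero bytes other than 1 to 1, so every
            // byte is afterwards a valid `bool` bit pattern.
            if *b > 1 {
                *b = 1;
                ctx.fixups += 1;
            }
        }
        Ok(())
    }
}

/// Hypothetical entry point that exists only for `&mut` input. After a
/// successful fix-up every byte is 0 or 1; this sketch converts to `bool`
/// safely instead of reinterpreting the memory.
pub fn validate_mut<T: TryFromBytesMut>(
    bytes: &mut [u8],
    ctx: &mut Context,
) -> Result<Vec<bool>, &'static str> {
    T::validate_and_fix(bytes, ctx)?;
    Ok(bytes.iter().map(|&b| b == 1).collect())
}
```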