Support custom validators for `TryFromBytes`

These are kinds of validity that users may need to have checked before transmutation from &[u8] to &T:

The language-level validity of the bits for the type of each field in T, e.g. a bool must be either 0 or 1. This is implemented by the derive.
The library-level validity of the bits in T, e.g. an invariant that the first field is less than the second. This can be referenced by the derive but inherently must be user-controlled.
The library-level validity of the individual fields in T, based on the above library-level check applied to each field. This is also implemented by the derive.
The library-level validity of the length of the struct given the header contents, e.g. the length field is equal to the size of the tail slice. This only applies to dynamically sized structs ending in a slice.

The plan discussed in #5 and #372 is to support the concept of a custom validator, a function or closure provided to derive(TryFromBytes) that will always be called before allowing a TryFromBytes transmute to succeed.

Open Questions

Should users be able to provide their own error type, to communicate the way in which the validation failed? How would this be exposed to the higher-level TryFromBytes APIs?
How will data validation interact with custom DSTs? Is the validator also responsible for validating the correct length of the tail slice, or even deriving the correct length (as proposed in https://github.com/google/zerocopy/issues/1289#issuecomment-2121244709)?
- What happens if the user provides both a #[length] and custom validator function and they disagree?
- Given struct Outer { x: u8, y: Inner }/struct Inner { a: [u8; 4], b: [u8] }, is the validator for Outer allowed to communicate a maximum length for the tail slice located inside of Inner? What if Inner has a custom validator that returns "valid if the tail slice is truncated to N", but Outer has a different return from its custom validator?
What are the required signature(s) for the custom validators? In theory the derive can support many return types simultaneously, including bool, Result<(), CustomError>, or Result<Option<usize>, CustomError> (to communicate a required length).
When zerocopy's expectations surrounding validator behavior are violated, should it panic (terrible for embedded), only panic in debug mode (middle ground a la + overflow checking), or always reject the input (could miss bugs in validators).

@jswrenn and I met and discussed potential designs. He has his own design proposal that he'll share at some point. Here's mine.

This design is based on the observation that some users may want to not only validate safety conditions, but actively transform their type - for example to construct a new witness wrapper type. It permits a distinction between the type being constructed and its "raw" equivalent.

This design is also explicitly meant to support custom length fields (#1289). In order to do that, validation can return a rich error message which might be required when a length field cannot be parsed.

We didn't have time during our discussion to think deeply about how this composes with a #[length] attribute. For example, is the length extracted before or after calling the user's custom validator? Can the #[length] attribute itself provide a "custom extractor" that also has error cases? These will need to be thought through.

// Used in the following examples
type MaybeValid<T, A> = Ptr<T, (A, Any, AsInitialized)>;
type MaybeAligned<T, A> = Ptr<T, (A, Any, Valid)>;

unsafe trait TryFromBytes {
    // Set by `#[zerocopy(raw = ...)]`, defaults to `Self`.
    #[doc(hidden)]
    type Raw: TryFromBytes;

    // Set by `#[zerocopy(error = ...)]`, defaults to
    // `()` or similar.
    //
    // Not doc(hidden)!
    type ValidationError;

    // Replaces `is_bit_valid`.
    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<Self::Raw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError>;
}

Here's how this would be used by a hypothetical user:

#[derive(TryFromBytes)]
#[zerocopy(raw = FooRaw, error = FooError, validator = Foo::try_from_raw)]
struct Foo(...);

#[derive(TryFromBytes)]
struct FooRaw(...);

struct FooError(...);

impl Foo {
    fn try_from_raw<A: Aliasing>(
        r: Result<MaybeAligned<Self::Raw, A>, Self::Raw::ValidationError>
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        ...
    }
}

Our derive would emit the following impl:

unsafe impl TryFromBytes {
    type Raw = FooRaw;
    type ValidationError = FooError;

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::is_bit_valid(maybe_raw);
        Foo::try_from_raw(raw_result)
    }
}

As written, this gives the user full power: they are responsible for converting Self::Raw::ValidationError into Self::ValidationError and MaybeAligned<Self::Raw> into MaybeAligned<Self>. However, the user may not want to deal with all of these details. Thus, we can abstract somewhat and support the following simplifications:

The validator can take a value rather than a Result; the framework will handle propagating errors
The validator can return a bool rather than a Result; the framework will handle generating Ok or Err values as appropriate

First, we introduce the following trait and implement it for different function types. Each impl (except for the most general one, which puts all of the onus on the user) carries restrictions:

When the validator takes a value rather than a Result, we add a T::ValidationError: From<<T::Raw as TryFromBytes>::ValidationError> bound to ensure that, if an error is encountered, the framework can convert the error
When the validator returns a bool, we additionally require T::ValidationError: Default to ensure that our framework can synthesize new errors when the validator returns false

trait Validator<T: TryFromBytes, A, Disambiguator> {
    fn try_from_raw(
        self,
        r: Result<
            MaybeAligned<<T as TryFromBytes>::Raw, A>,
            <<T as TryFromBytes>::Raw as TryFromBytes>::ValidationError,
        >,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError>;
}

impl<T, F, A> Validator<T, A, ()> for F
where
    T: TryFromBytes,
    F: FnOnce(
        Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError>,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        self(r)
    }
}

impl<T, F, A> Validator<T, A, ((),)> for F
where
    T: TryFromBytes,
    T::ValidationError: From<<T::Raw as TryFromBytes>::ValidationError>,
    F: FnOnce(MaybeAligned<T::Raw, A>) -> Result<Maybe<T, A>, T::ValidationError>,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T::Raw, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        match r {
            Ok(r) => self(r),
            Err(err) => Err(err.into()),
        }
    }
}

impl<T, F, A> Validator<T, A, (((),),)> for F
where
    T: TryFromBytes<Raw = T>,
    T::ValidationError: Default + From<<T::Raw as TryFromBytes>::ValidationError>,
    F: FnOnce(MaybeAligned<T::Raw, A>) -> bool,
{
    fn try_from_raw(
        self,
        r: Result<MaybeAligned<T, A>, <T::Raw as TryFromBytes>::ValidationError>,
    ) -> Result<MaybeAligned<T, A>, T::ValidationError> {
        match r {
            Ok(r) => if self(r) {
                Ok(r)  
            } else {
                Err(Default::default())
            },
            Err(err) => Err(err.into()),
        }
    }
}

Finally, we can change the derive-generated code like so:

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::is_bit_valid(maybe_raw);
        Validator::try_from_raw(Foo::try_from_raw, raw_result)
    }

Note that this also supports closures rather than named validators. For example, if the user specifies #[zerocopy(validator = |foo| foo.0.is_valid())], this desugars as expected:

    fn try_from_maybe_valid_raw<A>(
        maybe_raw: MaybeValid<FooRaw, A>,
    ) -> Result<MaybeAligned<Self, A>, Self::ValidationError> {
        let raw_result = FooRaw::is_bit_valid(maybe_raw);
        Validator::try_from_raw(|foo| foo.0.is_valid(), raw_result)
    }

@djkoloski if we added validation context to this design, would it support your rkyv use case? In particular, is the ability to perform mutation in the validator enough to do your fix-up operation?

We'd at a minimum need a slight tweak: we'd have to have a way of signaling that a TryFromBytes impl only works on mutable input in order to do the fix-up. But let's assume we've done that for the sake of this question.

google / zerocopy

Support custom validators for `TryFromBytes` #1330

Open Questions