hecatia-elegua / bilge

Use bitsized types as if they were a feature of rust.
Apache License 2.0
171 stars 17 forks source link

Expose fields of structs in return type of impls generated by `#[derive(TryFromBits)]` #40

Closed astral4 closed 1 year ago

astral4 commented 1 year ago

Given this Rust code...

#[bitsize(32)]
#[derive(TryFromBits)]
struct Example {
    foo: bool,
    bar: u24,
    baz: Kind,
}

#[bitsize(7)]
#[derive(TryFromBits)]
enum Kind {
    First,
    Second,
    Third,
}
let data: u32;

...the return type of Example::try_from(data) and data.try_into() is Result<Example, u32>. I think having a return type of Result<Example, (bool, UInt<u32, 24>, UInt<u8, 7>)>, exposing the fields of Example, is better. Essentially, structs become tuples of their fields, and enums become their arbitrary_int::UInt representations.

Motivation

impl ParseError {
    fn new(flag: u8) -> Self { ... };
}

I'm using this crate to parse flags in file headers. $n$ bits in one section are converted to an enum. However, not all possible values of $n$ bits correspond to enum variants, so this conversion is fallible. In case of failure, I'd like to return a ParseError from my own crate, mentioning the bits that failed to convert. However, I couldn't find an easy way to get these bits.

When the conversion fails, the original value is returned. I necessarily have access to this value—otherwise, the conversion could not be attempted in the first place—so why is it returned? The docs mention that "... TryFrom ... semantics are very clear," so if there's a good reason for this, I would like to know. I came up with two workarounds for now.

First workaround: bit manipulation

let flags: Example = data
    .try_into()
    .map_err(|n: u32| {
        let flag = ((data >> 25) & 127u32) as u8;
        ParseError::new(flag)
    })?;

This is relatively concise, but doing manual bit manipulations defeats the purpose of the crate.

Second workaround: convert twice

#[bitsize(32)]
#[derive(FromBits)]
struct RawExample {
    foo: bool,
    bar: u24,
    baz: u7,
}

struct Example {
    foo: bool,
    bar: u24,
    baz: Kind,
}

enum Kind {
    First,
    Second,
    Third,
}

impl TryFrom<RawExample> for Example {
    type Error = ParseError;

    fn try_from(value: RawExample) -> Result<Self, Self::Error> {
        let kind = match value.baz().value() {
            0 => Ok(Kind::First),
            1 => Ok(Kind::Second),
            2 => Ok(Kind::Third),
            flag => Err(ParseError::new(flag)),
        }?;

        Ok(Self {
            foo: value.foo(),
            bar: value.bar(),
            baz: kind,
        })
    }
}
let flags: Example = data
    .into()
    .try_into()?;

This preserves the safety provided by the crate, but the boilerplate seems unnecessary.

hecatia-elegua commented 1 year ago

Hello there! From and TryFrom are both used for parsing raw values and From is also used for getting back the underlying raw value again. I should mention these semantics somewhere instead of saying they're obvious, though.

The problem with returning the fields in try_from would be that we'd need to keep parsing the raw value after we've found an error, which might not be what everyone wants. But I understand what you mean, giving back the raw value doesn't really add anything. (This might be another case for having a "raw structure" too, which we could return here, which would also be better than returning insanely long tuples.) Let's consider an alternative:

I think we solved your usecase half-way a few weeks ago, by adding this: https://github.com/hecatia-elegua/bilge/blob/main/README.md?plain=1#L121-L150 The plan would be to extend this to also allow:

#[fallback]
Reserved(uN),

where N is the bitsize of the enum. This way you would match Kind::Reserved(flag) => ParseError::new(flag)?.

pickx commented 1 year ago

Another case against returning the value in theErr case: normally this is done to return ownership of the value that failed to convert instead of consuming it. Which doesn't really make sense when converting numbers, since the converted value is always going to be Copy and so won't be consumed.

I agree that the solution using #[fallback] would work for your usecase, but that's because your struct has only one fallible field. In other words, you know ahead of time that if try_from() failed, it's because of an invalid value for baz.

But what if you had two? Returning a big tuple or a "raw" struct doesn't actually help here, since you'll have to manually check each fallible field (though you do get to skip the task of extracting the value). As @hecatia-elegua said we also want to stop parsing after the first failure.

We can return the range of bits that had an unexpected value (or start bit + length), possibly along with the name of the field. That would make it pretty easy for the user to extract the invalid bits, but as @astral4 mentioned, breaks the abstraction of no manual bit manipulation.

Here's an idea (which may be infeasible, just guessing here): given a struct

#[bitsize(32)]
#[derive(TryFromBits)]
struct Example {
    foo: u5,
    this_one_can_fail: SomeEnum,
    this_one_can_also_fail: SomeOtherEnum,
    bar: bool,
}

#[bitsize(24)]
#[derive(TryFromBits)]
enum SomeEnum { /* ... */ }

#[bitsize(2)]
#[derive(TryFromBits)]
enum SomeOtherEnum { /* ... */ }

we also generate something like

enum ExampleConversionFailure {
    ThisOneCanFail(u24),
    ThisOneCanAlsoFail(u2),
}

and then the return type of Example::try_from would be Result<Example, ExampleConversionFailure>.

hecatia-elegua commented 1 year ago

@pickx yup, so removing the raw value from Err in some upcoming version would be good. I mean, if we can fill it in with something more useful.

For your idea, I think we would need annotations on the fallible fields to do so (or find some kind of type system magic instead of using FILLED to differenciate between From/TryFrom fields). I still like it, just noting this.

Also, maybe I didn't understand, but #[fallback] Reserved(uN) would work for multiple fields, no? It is bulkier, there's a trade-off between "I want to only access one-two fields in one place and there can be reserved/filler values" and "I want to access the whole thing or catch the exact error directly after parsing".

//u8
struct Example {
    foo: Enum1, //u2
    bar: Enum2, //u4
    baz: Enum3, //u2
}
fn main() {
    let e = Example::from(0b11_1111_11);
   // sth like this, depends on what you're doing
    match e.foo {
        Enum1::Reserved(v) => ParseError::new(v.value()), //or handle the value
        //...
    }
    match e.bar{
        Enum2::Reserved(v) => ParseError::new(v.value()), //or handle the value
        //...
    }
   //...
}

vs.

//u8
struct Example {
    foo: Enum1, //u2
    bar: Enum2, //u4
    baz: Enum3, //u2
}
fn main() {
    let e = Example::try_from(0b11_1111_11);
   // sth like this, depends on what you're doing
    match e {
        Err(ExampleErr::Foo(v)) => ParseError::new(v.value()), //or handle the value
        Err(ExampleErr::Bar(v)) => ParseError::new(v.value()), //or handle the value
        //...
    }
}

@astral4 do you have more code with a similar need? And which one would you prefer in your case?

astral4 commented 1 year ago

I have more structs with just one fallible enum each. I prefer the first way with the Reserved variants. With the first way, the user is dealing with type and field names (foo, bar, Enum1, Enum2, Reserved) that they specified themselves. With the second way, the user depends on generated enum and variant names (ExampleErr, Foo, Bar). This introduces the potential for name clashes and makes what the code is doing somewhat less obvious, since ExampleErr isn't visibly defined anywhere. The difference between the two ways is a little hard to articulate, so those are my thoughts for now.

Going back to my original problem, I think it's largely a matter of expressing parts that may fail and how to handle them. I've opted for the second workaround because the bitfield abstraction is preserved and the TryFrom impl can be easily extended. For example, for some fields, the value should be non-zero, and a conversion to something like core::num::NonZeroU32 makes that explicit.

impl TryFrom<RawExample> for Example {
    type Error = ParseError;

    fn try_from(value: RawExample) -> Result<Self, Self::Error> {
+       let size = value
+           .bar()
+           .value()
+           .try_into() // convert into NonZero type here
+           .map_err(|_| ParseError::Something)?;

        let kind = match value.baz().value() {
            0 => Ok(Kind::First),
            1 => Ok(Kind::Second),
            2 => Ok(Kind::Third),
            flag => Err(ParseError::new(flag)),
        }?;

        Ok(Self {
            foo: value.foo(),
            bar: size,
            baz: kind,
        })
    }
}

All in all, I still think a change to the type contained within Result:Err would be helpful, especially for simpler cases (e.g. struct with single fallible enum field). And the current error handling approach (stop parsing after encountering a failure) is a good default.

hecatia-elegua commented 1 year ago

So the first one it is! Not sure when I'll be adding that.

About name clashing in the other approach: The user could name-clash on ExampleErr, but tbh then I would just add a way to define the error name if really necessary. Or we could have a generic error type, sth like:

pub struct TryFromBitsErr<T>(pub T);

fn main() {
    let e = Example::try_from(0b11_1111_11);
    // sth like this, depends on what you're doing
    match e {
        Ok(example) => //...
        Err(TryFromBitsErr(ExampleErr::ThisOneCanAlsoFail(e))) => e.value() as u32,
        Err(TryFromBitsErr(ExampleErr::ThisOneCanFail(e))) => e.value(), //...
        //...
    }
}

of course this looks a bit ugly, also nesting TryFrom is not solved by this

hecatia-elegua commented 1 year ago

@astral4 thanks to @pickx we've now got support for

#[fallback]
Reserved(uN),

in the github main branch (will not release to crates.io immediately), feel free to try it.

hecatia-elegua commented 1 year ago

Closing for now, error type might be relevant again later.