winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow
Other
523 stars 40 forks source link

Allow referring to elided fields values in later fields' parsers with `seq!` #406

Open DJDuque opened 9 months ago

DJDuque commented 9 months ago

Please complete the following tasks

winnow version

0.5.31

Describe your use case

Having _temporary variables that don't make it into the struct (but also don't get completely ignored like _) could be beneficial for cases like this. The above seq! call could look something like:

 pub(crate) fn event_view<'a>(
    endian: Endianness,
) -> impl Parser<&'a [u8], EventView<'a>, ContextError> {
    seq! {EventView {
        event_id: u16(endian),
        trigger_mask: u16(endian),
        serial_number: u32(endian),
        timestamp: u32(endian),
        _event_size: u32(endian).verify(|&n| n >= 8),
        _banks_size: u32(endian).verify(&n| n == _event_size - 8),
        bank_views: dispatch! {u32(endian);
                1 => length_and_then(success(_banks_size), repeat_till0(padded_bank(bank_16_view(endian)), eof)),
                17 => length_and_then(success(_banks_size), repeat_till0(padded_bank(bank_32_view(endian)), eof)),
                49 => length_and_then(success(_banks_size), repeat_till0(padded_bank(bank_32a_view(endian)), eof)),
                _ => fail,
            }.map(|(bank_views, _)| bank_views),
    }}
}

which is easier to follow and understand.

This will also allow us to use this macro to do stuff similar to nom-derive's anonimous fields mentioned here, e.g.

seq! {MyStruct {
    _b0: be_u8,
    li: success(_b0).map(|n| n >> 6),
    version: success(_b0).map(|n| (n >> 3) & 0b111),
    // ... etc
}}
epage commented 9 months ago

I would love to support this but I'm unsure of a way to do this with macro_rules which I'd really like to stick to with seq!.

himikof commented 9 months ago

I would love to support this but I'm unsure of a way to do this with macro_rules which I'd really like to stick to with seq!.

Maybe not exactly as proposed, but can the following syntax idea work as an implementable alternative?

seq! {MyStruct {
    #[ignore] b0: be_u8,
    li: success(b0).map(|n| n >> 6),
    version: success(b0).map(|n| (n >> 3) & 0b111),
    // ... etc
}}
DJDuque commented 7 months ago

@epage What do you think about @himikof's suggestion?

I love the seq macro, and I keep using it for basically everything. Having a #[ignore] or #[temporary] would be a huge improvement imo.

I am a bit busy with work right now, but if you like this suggestion, I can try to implement it.

epage commented 7 months ago

Sorry, lost track of this.

If we do this, I'd at least like to see mirrored support for tuples as well.

My main concern is with what to name it to make the intent clear. We aren't ignoring the parser, only the container initialization. Maybe #[temp] works for that? Any other thoughts?

DJDuque commented 5 months ago

I think I got this working for structs in #511, but I don't really know how to implement this for tuples.

Currently we can do something like (from the docs):

fn point(input: &mut &[u8]) -> PResult<(u32, u32)> {
    let num = dec_uint::<_, u32, ContextError>;
    seq!(num, _: (space0, b',', space0), num).parse_next(input)
}

Do we expect something like: seq!(var: num, empty.value(var)) to be possible? And then #[temp] e.g. seq!(#[temp] half_var: num, empty.value(half_var * 2), empty.value(half_var * 2))?

epage commented 5 months ago

@DJDuque thank for that work! I keep going back and forth on how we should handle this and if it should be handled. Especially seeing it in action, #[temp] feels weird and I worry that it makes the intent of the code less clear. This makes me wonder if we're trying to force too much through seq! and it would be better to instead encourage people to write their parser more explicitly in these cases. Macros are already a bit magical and without care they can be hard understand how to write them from the docs since the "signature" is opaque, how to read them in code if too much is done, how to get them to compile due to the signature problems and little traps like handling of commas.

This isn't too say "no" but I want to exercise some caution here and would appreciate input on managing the balance with seq!.

DJDuque commented 5 months ago

I agree that managing things can get out of hand if we try to force too much through macros, but I also think that referring to previous fields is common enough that it is reasonable for seq! to handle this.

The current seq! macro is so convenient that I think not having #[temp] (or an equivalent) encourages people to work around it with e.g. chained flat_maps (or other alternatives less readable than #[temp]) instead of falling back to write the full parser in a more explicit way.

It would be good to have the opinion of other seq! users. In my opinion, this is a huge quality-of-life improvement parsing binary data, and it warrants the extra macro complexity.

bbb651 commented 3 months ago

Not sure this is the right place for this but I'm just starting out with winnow and I've spent an embarrassingly long amount of time getting unreadable type error from the compiler because I didn't know about the behavior of referring to other struct fields and it doesn't seem to be documented anywhere. My code was something like:

seq! { Foo {
    identifier: identifier,
    // ...
}}

and I was trying to use identifier thinking it was the parser and not the field. Not sure if it's possible with macro_rules! or worth it but having it prefixed with let or replacing : with = to explicitly have these references could really help, but at the very least document this properly.

epage commented 3 months ago

seq! is trying to mirror the struct initialization syntax within Rust today which is why : is used. I hadn't thought of shadowing issues like this that arise from the discrepancy of behavior. Could you open a discussion for further discussion as this is distinct from referring to _ fields and it would help to keep this issue on topic.