winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow

Derive API for binary parsing #234

Open DJDuque opened 1 year ago

DJDuque commented 1 year ago

winnow version

0.4.1

Describe your use case

In some cases, parsing binary data could be a lot simpler than it currently is. From my experience, most of the boilerplate logic required to parse a structure from a slice of bytes can be (and usually is) encoded in the structure definition.

Describe the solution you'd like

I believe that it would be very helpful to have a "Derive API" (akin to what clap does), which could derive a parse implementation for a structure based on its definition.

Based on this Reddit discussion, this will probably not be very useful for parsing text. Nonetheless, (from my limited experience and the examples below) this significantly simplifies binary parsing.

Additionally, as a beginner myself, I believe that this could lower the barrier for new people to use the library. Sometimes users just need to parse simple structures, and learning in depth how to use a parser combinator library could be too much.
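To make the request concrete, here is a minimal std-only sketch of the kind of code such a derive might generate. Everything in it is an assumption for illustration: the `Header` struct, the `ParseError` type, the `take` helper, the big-endian default, and the `#[derive(Parse)]` attribute mentioned in the comment. None of it is existing winnow API.

```rust
/// Hypothetical binary header. With a derive API, the user might write only
/// this definition plus something like `#[derive(Parse)]` (which does not
/// exist in winnow); the `impl` below is what the derive would expand to.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub version: u8,
    pub length: u16,
    pub id: u32,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The input ended before the field could be read.
    Incomplete,
}

/// Split exactly `N` bytes off the front of the input.
fn take<const N: usize>(input: &[u8]) -> Result<(&[u8], [u8; N]), ParseError> {
    if input.len() < N {
        return Err(ParseError::Incomplete);
    }
    let (head, rest) = input.split_at(N);
    let mut buf = [0u8; N];
    buf.copy_from_slice(head);
    Ok((rest, buf))
}

impl Header {
    /// Reads the fields in declaration order, assuming big-endian integers.
    /// Returns the remaining input together with the parsed value.
    pub fn parse(input: &[u8]) -> Result<(&[u8], Header), ParseError> {
        let (input, [version]) = take::<1>(input)?;
        let (input, length) = take::<2>(input)?;
        let (input, id) = take::<4>(input)?;
        Ok((
            input,
            Header {
                version,
                length: u16::from_be_bytes(length),
                id: u32::from_be_bytes(id),
            },
        ))
    }
}
```

The point of the sketch is that the body of `parse` repeats information already present in the struct definition (field order and field widths), which is the boilerplate a derive would remove.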

Alternatives, if applicable

No response

Additional Context

Something like this already exists for nom: nom-derive. Some examples to illustrate how useful this is for parsing binary formats: NTP, TLS, NetBIOS.

inflation commented 1 year ago

There's deku, FWIW. It's somewhat beyond the scope of a combinator lib.

epage commented 1 year ago

Some quick thoughts

epage commented 1 year ago

#82 proposes a macro to accompany tuple sequencing that populates a struct directly. This allows us to use type inference for Input and Error, avoiding that issue. The main downside is that you can't infer parsers from a type (wish we could do Struct::field to get a type).
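A rough std-only sketch of what such a macro could look like follows. The `parse_struct!` name, its syntax, the `Incomplete` error, and the `be_u16` parser are all hypothetical and are not what #82 or winnow actually define; the sketch only illustrates the trade-off described above.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct Incomplete;

/// Hand-written field parser: one big-endian u16 from the front of the input.
fn be_u16(input: &[u8]) -> Result<(&[u8], u16), Incomplete> {
    match input {
        [a, b, rest @ ..] => Ok((rest, u16::from_be_bytes([*a, *b]))),
        _ => Err(Incomplete),
    }
}

/// Hypothetical macro: run one parser per field, in order, then build the
/// struct from the results. The input and error types are never named here;
/// they are inferred from the parsers and the enclosing function.
macro_rules! parse_struct {
    ($input:expr => $name:ident { $($field:ident : $parser:expr),+ $(,)? }) => {{
        let input = $input;
        $( let (input, $field) = ($parser)(input)?; )+
        Ok((input, $name { $($field),+ }))
    }};
}

#[derive(Debug, PartialEq)]
pub struct Point {
    pub x: u16,
    pub y: u16,
}

/// Each field still has to name its parser explicitly: the macro cannot look
/// at `Point` and work out that `x` is a `u16`.
pub fn point(input: &[u8]) -> Result<(&[u8], Point), Incomplete> {
    parse_struct!(input => Point { x: be_u16, y: be_u16 })
}
```

Because the macro expands in place, it needs no trait with fixed input and error types, but it also has no access to the field types, so the parser for every field must be spelled out.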

epage commented 1 year ago

Huh, I had thought #82 was only about nom-fields, but it is written more broadly to also include potentially adding a derive. I suspect #82 will focus more on a nom-fields-like solution though, so I think it could be nice to keep both open.

DJDuque commented 1 year ago

> Not a fan of PreExec but we'd need a way to split a byte into two fields

I think that all use cases of PreExec can be replaced by having a previous field in your structure and letting users reference it to assign a value. Referencing a previous field is so common that it will have to exist regardless of what is decided about PreExec. Sure, this is not ideal, and it would require the user to create an otherwise unused extra field; but I think that living without PreExec, either forever or until we can think of a non-hacky way of implementing it, is not a deal breaker.
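As an illustration of the "extra field" approach, here is a small std-only sketch that splits one byte into two fields by first storing the raw byte and then deriving later fields from it. The `Packet` layout, the field names, and `parse_packet` are made up for this example and are not taken from nom-derive or winnow.

```rust
/// Hypothetical packet: one byte holding a 4-bit version and a 4-bit length,
/// followed by `length` payload bytes.
#[derive(Debug, PartialEq)]
pub struct Packet<'a> {
    /// Raw first byte, kept only so that later fields can reference it.
    pub first_byte: u8,
    /// High nibble of `first_byte`.
    pub version: u8,
    /// Low nibble of `first_byte`.
    pub length: u8,
    /// Exactly `length` bytes, so this field also depends on a previous one.
    pub payload: &'a [u8],
}

/// Hand-written equivalent of what a derive with "reference a previous
/// field" support might generate. Returns the remaining input and the value,
/// or `None` if the input is too short.
pub fn parse_packet(input: &[u8]) -> Option<(&[u8], Packet<'_>)> {
    let (&first_byte, input) = input.split_first()?;
    let version = first_byte >> 4;
    let length = first_byte & 0x0F;
    if input.len() < usize::from(length) {
        return None;
    }
    let (payload, input) = input.split_at(usize::from(length));
    Some((input, Packet { first_byte, version, length, payload }))
}
```

The cost mentioned above is visible in the struct: `first_byte` carries no information beyond `version` and `length`, and exists only as a stepping stone.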

> I think I'd prefer there not to be parse_be and parse_le functions. Instead, we choose a specific endianness and it has to be overwritten

I also felt a bit weird about these methods in nom-derive's Parse trait, but I don't think I completely understand your suggestion. What would this mean for the signature of the only public parse method generated for a given structure? Is it something like this?

pub fn parse(i: &'a [u8]) -> Result<(&'a [u8], S), Error>

This makes sense for cases where the endianness can be specified while defining the structure e.g. it is determined by the format, or the endianness is inferred from the first field. What about structures that need both a le and be parsing implementations?

epage commented 1 year ago

> This makes sense for cases where the endianness can be specified while defining the structure e.g. it is determined by the format, or the endianness is inferred from the first field. What about structures that need both a le and be parsing implementations?

iirc nom-derive supports your parser accepting parameters and could accept endianness that way
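One way to picture the "endianness as a parameter" idea is the std-only sketch below: a single `parse` function that takes the byte order as an argument instead of separate `parse_be` and `parse_le` functions. The `Endianness` enum, the `Record` struct, and the helper names are assumptions made for this example; this is not nom-derive's actual API.

```rust
/// Hypothetical byte-order selector passed to the parser at call time.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Endianness {
    Big,
    Little,
}

/// Hypothetical record that can appear in either byte order.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub kind: u16,
    pub value: u32,
}

/// Read a u16 from the front of the input in the requested byte order.
fn read_u16(input: &[u8], endianness: Endianness) -> Option<(&[u8], u16)> {
    match input {
        [a, b, rest @ ..] => {
            let raw = [*a, *b];
            let value = match endianness {
                Endianness::Big => u16::from_be_bytes(raw),
                Endianness::Little => u16::from_le_bytes(raw),
            };
            Some((rest, value))
        }
        _ => None,
    }
}

/// Read a u32 from the front of the input in the requested byte order.
fn read_u32(input: &[u8], endianness: Endianness) -> Option<(&[u8], u32)> {
    match input {
        [a, b, c, d, rest @ ..] => {
            let raw = [*a, *b, *c, *d];
            let value = match endianness {
                Endianness::Big => u32::from_be_bytes(raw),
                Endianness::Little => u32::from_le_bytes(raw),
            };
            Some((rest, value))
        }
        _ => None,
    }
}

impl Record {
    /// A single public entry point; the caller chooses the byte order.
    pub fn parse(input: &[u8], endianness: Endianness) -> Option<(&[u8], Record)> {
        let (input, kind) = read_u16(input, endianness)?;
        let (input, value) = read_u32(input, endianness)?;
        Some((input, Record { kind, value }))
    }
}
```

With this shape, a format that fixes its byte order can pass a constant, while a format that detects it at runtime (for example from a leading marker) can pass the detected value to the same function.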

epage commented 10 months ago

btw we now have the seq! macro_rules macro, which is a lighter-weight version of this. I think the main thing missing is a way to reference previous fields.