winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow
Other
511 stars 39 forks source link

Memoize / packrat support for reusing parse results when multiple `alt` branches use the same parser #480

Open epage opened 6 months ago

epage commented 6 months ago

The following is a left-recursive grammar, where the parser tries to parse an and_expression (details not important) and some other expressions that could come after it (link goes to exact line 28):

https://gitlab.com/seloxidate/selector/-/blob/33febe8d2de29199b09a271fabac2097d1c1f161/src/parsers/conditional_expression.rs#L28

BUT, if it fails to parse "some other expressions that come after it", it must parse the and-expression again in the other alt branch (line 39):

https://gitlab.com/seloxidate/selector

One can see that this could easily lead to exponential parsing time, so here, using memorized on and_expression can be very helpful and would prevent exponential runtime (with additional memory needed - a lot more memory).

Hope this clarifies it a bit and where one would use memorized.

Also, for inspiration, you might want to look into nom-packrat. It's an extension to nom that allows packrat parsing

https://github.com/dalance/nom-packrat

From https://hachyderm.io/@janriemer@floss.social/111960492116311141

epage commented 6 months ago

See also https://docs.rs/chumsky/1.0.0-alpha.5/chumsky/trait.Parser.html#method.memoized

epage commented 6 months ago

Need to look at how chumsky and nom-packrat deal with storage.

The question I have is whether the storage lives directly in the Memoized (requires multiple accesses of the same instance) instance or externally, like an Arc<Mutex<_>> (requires tracing back to the same instantiation) or TLS (always used but requires explicit initialization).