mrkkrp / megaparsec

Industrial-strength monadic parser combinator library
Other
895 stars 83 forks source link

Rewrite rules for parser primitives #556

Closed lrworth closed 1 week ago

lrworth commented 3 months ago

I've just replaced

M.many (M.anySingleBut '\n')

with

M.takeWhileP Nothing (/= '\n')

and my program used an order of magnitude less memory and time. It seems the latter should always be preferred.

Would it be good for a REWRITE rule to make this substitution (and other similar ones)? I'd be happy to submit a PR.

lrworth commented 3 months ago

The substitution I made is not always possible because of the types. The rule could apply chunkToTokens to retain the list result, but that may undo the performance gain.

I'll leave this open in case someone has a bright idea.

mrkkrp commented 1 week ago

In general, there is not much to be gained from rewrite rules in Megaprasec. It is true that some constructs can be rewritten, but I'd argue that this should be done explicitly by the programmer with full understanding of the context. Not every construction with many can be replaced with takeWhileP. It depends on the complexity of the argument parser of many. The types are also not the same, as you note. Another problem with rewrite rules is that they are implicit and may or may not fire depending on many factors. It is much better to be aware of the distinction between these two approaches and choose explicitly rather than hope that the compiler will do the right thing for you. Thus, I chose the approach of stating clearly in the docs and in the tutorial that it is beneficial to use takeWhileP and other similar (relatively new) primitives rather than many and some. This way performance can be controlled in a much more reliable way and it is also a solution with lower complexity.