stil4m / elm-syntax

Elm syntax in Elm
MIT License
94 stars 28 forks source link

Reimplement the parser using pratt parsers #199

Closed jfmengels closed 2 months ago

jfmengels commented 1 year ago

In the Expression type, there is the Operator String variant which users have to handle but will in practice never appear. This is used temporarily before we post-process the parser and re-balance the AST with operators being replaced with the appropriate OperationApplication AST Node.

Before re-balancing, we will end up with Application [ left, Operator "+", right ] which will get turned into OperatorApplication "+" infixDirection left right.

Since users of the package will never encounter the Operator variant, it's better to remove it. To do so, we either need to come up with a different trick to keep the operation somewhere before we transform it, or we parse operations the right way directly, which is the option I'd like to suggest.

There is a special kind of parser (or rather, a parser technique) that makes sure that operations are balanced the correct way directly. @janiczek explains how it works in this article, and there is already a package for it: dmy/elm-pratt-parser.

I would like to try this out, if we are successful, we can then get rid of the post-processing of the AST, which I can only imagine will make the parser much faster. The other usage of post-processing is for attaching {-| ... -} comments to the module or to the different declarations, which I believe we should also be able to do without post-processing.


I have already started a branch for this called pratt-parser-v8, which I made on top of the #188 branch when I realized this was needed to get rid of the Operator variant.

I am now realizing it is probably better to do this work before v8, in a v7 release.

The main reason for that is I am pretty new to writing parsers, and I find it really hard to adapt the parser for this package (which was made before elm/parser and then adapted using a wrapper), so... I'm thinking of re-writing the parser almost from scratch using pratt parsers and inspiration of the existing code. But I don't have confidence I can do this correctly the first time around, so I will probably want to compare the results for the new parser with the existing parser. And that is easy to do in the v7 branch when the ASTs are the same, but much harder to do when the.

I think this will be a good situation to add more tests too, especially for cases where the code should not parse, which is useful for elm-review for instance. We could also use the occasion to write nice errors for parsing problems. Finishing #185 first will likely be a good addition, to make sure that we don't mess up the ranges.

I will try to start my pratt-parser-v8 from scratch on top of v7. From my current attempt, I know I could really use some help though :sweat_smile: