haskell / happy

The Happy parser generator for Haskell

Implement "Principled parsing for indentation-sensitive languages" #235

Open TeofilC opened 2 years ago

TeofilC commented 2 years ago

"Principled parsing for indentation-sensitive languages" (PDF) lays out a way to extend parser generators like Happy so that they can deal directly with indentation-sensitive languages like Haskell. The paper mentions that a patch was made to extend Happy with this functionality, but afaict this was never merged into Happy.

Why hasn't this been added to Happy yet? Did nobody have the time to finish off and optimise this work, or is there another reason?

Ericson2314 commented 2 years ago

Thanks for opening this. I was pointed to that paper, and was also saddened that we are not already using it.

Ericson2314 commented 2 years ago

https://michaeldadams.org/papers/layout_parsing/LayoutParsing.pdf here is the PDF from the author's website, without paywall.

Ericson2314 commented 2 years ago

https://michaeldadams.org/projects/happy-indent/happy.indent.tar.gz (from https://michaeldadams.org/projects/) is the source code behind the things in the paper.

Do note that Happy was still in darcs then, so we will need to do some careful surgery to extract just the fork's changes as a branch off the right commit in this git repo.

Ericson2314 commented 2 years ago

happy.indent.tar.gz is a copy of the download just in case something were to happen to the author's website.

andreasabel commented 2 years ago

Is the work of Michael Adams production-ready? Looking at his paper, it seems that he has given a new grammar formalism, IS-CFG (indentation-sensitive context-free grammars), and explained how to generate a parser from it.
However, there isn't any effort to design a nice surface syntax. To get layout into your grammar, you have to annotate each and every symbol in every production. Looking at his examples, many of these annotations seem schematic. I'd find it tedious to work out all of the annotations for a realistic grammar.
Experiments such as the layout mechanism of BNFC suggest that there are higher-level approaches to indentation sensitivity. BNFC's method is insufficient so far (see its open issues), but it is being actively worked on (results expected in 2023). It might be that Adams' approach could serve as a layer on top of which sugar is defined (in the same way that fixity declarations are sugar and can be reduced to pure CFGs). However, there are other approaches to indentation sensitivity that should be explored before committing to Adams' solution (e.g. Erdweg, Rendel, Kästner, Ostermann, "Layout-sensitive Generalized Parsing").
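
To make the annotation burden described in this comment concrete, here is a minimal Haskell sketch of a grammar in which every right-hand-side symbol carries an indentation relation. It is a simplified model only: the four relations are written as recalled from the paper, and all names (`IndentRel`, `Sym`, `Production`, `checkProduction`, `doBlock`) are hypothetical. None of this is the syntax or code of the happy-indent patch.

```haskell
-- Simplified model of per-symbol indentation annotations.
-- All names here are hypothetical; this is not the happy-indent patch.

-- | How a child symbol's indentation must relate to its parent's.
data IndentRel
  = REq   -- child indentation must equal the parent's
  | RGt   -- child indentation must be strictly greater
  | RGe   -- child indentation must be greater or equal
  | RAny  -- no constraint on the child
  deriving (Show, Eq)

type Indent = Int

-- | Does the relation hold for (child indentation, parent indentation)?
holds :: IndentRel -> Indent -> Indent -> Bool
holds REq  child parent = child == parent
holds RGt  child parent = child >  parent
holds RGe  child parent = child >= parent
holds RAny _     _      = True

-- | A right-hand-side symbol together with its mandatory annotation.
data Sym = Sym { symName :: String, symRel :: IndentRel }
  deriving Show

-- | A production; note that every RHS symbol needs an annotation.
data Production = Production { lhs :: String, rhs :: [Sym] }
  deriving Show

-- | Check the parent's indentation against the indentations observed
-- for each child, in order. Fails if the child count does not match.
checkProduction :: Production -> Indent -> [Indent] -> Bool
checkProduction p parent children =
  length children == length (rhs p)
    && and (zipWith (\s c -> holds (symRel s) c parent) (rhs p) children)

-- | Hypothetical production loosely modelled on a `do` block: the
-- keyword sits at the block's indentation, the statements further in.
doBlock :: Production
doBlock = Production "DoBlock" [Sym "do" REq, Sym "Stmts" RGt]
```

Under these assumptions, `checkProduction doBlock 0 [0, 2]` evaluates to `True`, while `checkProduction doBlock 0 [0, 0]` evaluates to `False` because the statements are not indented past the block. Even this two-symbol production needs two annotations, which is the kind of schematic repetition that a higher-level surface syntax could generate automatically.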

TeofilC commented 1 month ago

See also: https://gitlab.haskell.org/ghc/ghc/-/issues/25322, which is another approach to achieving phase separation between the lexer and the parser.