jzimmerman / langcc

langcc: A Next-Generation Compiler Compiler
https://langcc.io

Improving compilation times. #50

Closed · modulovalue closed this 1 year ago

modulovalue commented 1 year ago

The paper says that compiling the Python/Go grammars can take ~40 seconds.

I've had great success improving the compile times of an LALR(1) parser generator. Replacing both a Bermudez and Logothetis SLR-style approach and the traditional approach of building the full LR(1) automaton to derive the LALR lookaheads with the DeRemer and Pennello relations-based method brought the computation of the LALR lookahead sets for a huge grammar down from 3 seconds to 300 ms.
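For concreteness, here is a minimal sketch (my own, not code from langcc or any generator I mention, with made-up node/token types) of the digraph pass at the heart of the DeRemer and Pennello method: it propagates seed sets F'(x) along a relation R so that F(x) = F'(x) ∪ ⋃ { F(y) : x R y }, collapsing cycles via on-the-fly SCC detection.

```cpp
// Sketch only: a generic DeRemer & Pennello "digraph" pass, not langcc code.
// Given seed sets F'(x) and a relation R, it computes
//   F(x) = F'(x)  union  { F(y) : x R y }
// in near-linear time, merging strongly connected components on the fly.
#include <algorithm>
#include <climits>
#include <cstdio>
#include <set>
#include <vector>

using NodeId = int;
using TokenSet = std::set<int>;  // stand-in for a lookahead/terminal set

struct Digraph {
    std::vector<std::vector<NodeId>> rel;  // rel[x] = { y : x R y }
    std::vector<TokenSet> f;               // seeds F'(x) on entry, F(x) on exit
    std::vector<int> n;                    // 0 = unvisited, INT_MAX = finished
    std::vector<NodeId> stack;

    void traverse(NodeId x) {
        stack.push_back(x);
        int d = static_cast<int>(stack.size());
        n[x] = d;
        for (NodeId y : rel[x]) {
            if (n[y] == 0) traverse(y);
            n[x] = std::min(n[x], n[y]);
            f[x].insert(f[y].begin(), f[y].end());  // F(x) absorbs F(y)
        }
        if (n[x] == d) {  // x is the root of an SCC: pop it and share F(x)
            for (;;) {
                NodeId top = stack.back();
                stack.pop_back();
                n[top] = INT_MAX;
                if (top == x) break;
                f[top] = f[x];
            }
        }
    }

    void run() {
        n.assign(rel.size(), 0);
        for (NodeId x = 0; x < static_cast<NodeId>(rel.size()); ++x)
            if (n[x] == 0) traverse(x);
    }
};

int main() {
    // Toy instance: 0 R 1, 1 R 2, 2 R 1 (a cycle between nodes 1 and 2).
    Digraph g;
    g.rel = {{1}, {2}, {1}};
    g.f = {{10}, {20}, {30}};
    g.run();
    for (NodeId x = 0; x < 3; ++x) {
        std::printf("F(%d) =", x);
        for (int t : g.f[x]) std::printf(" %d", t);
        std::printf("\n");
    }
    // Prints: F(0) = 10 20 30, F(1) = 20 30, F(2) = 20 30
}
```

In the actual LALR(1) computation the nodes are the nonterminal transitions of the LR(0) automaton, R is the reads/includes relation, and the seeds are the direct-read sets; the same pass is run once per relation.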

If we consider that LR parsing is LALR parsing where some states are split to provide more context (cf. Parsing with Pictures, K. Pingali), and that LALR lookaheads are always more refined than SLR follow sets, then taking an LALR-style approach from the very beginning, e.g. together with https://github.com/jzimmerman/langcc/issues/41 and https://github.com/jzimmerman/langcc/issues/43, might be very useful for improving compilation times and possibly parser performance.

There are fast LALR(k) generators such as LPG (based on "A Practical Method for Constructing Efficient LALR(k) Parsers with Automatic Error Recovery" by Philippe Charles), and there is at least one other approach for constructing fast LALR(k) parsers: "Practical Arbitrary Lookahead LR Parsing" by Bermudez and Schimpf. With one of those approaches, combined with the techniques that langcc introduces for finding states to split, significantly faster compilation times might be possible.

(Of course, it would first be interesting to know where the bottleneck actually is; I suspect it's the LR construction scheme, which is what motivated this issue.)

modulovalue commented 1 year ago

Because langcc appears to be abandoned, I'm going to close this issue.