jzimmerman / langcc

langcc: A Next-Generation Compiler Compiler
https://langcc.io
Apache License 2.0
1.73k stars 58 forks source link

Generating directly executable parsers #36

Closed nsajko closed 2 years ago

nsajko commented 2 years ago

The old LR parsing and finite automata literature used to distinguish between two basic ways of implementing DFA: table-based and "directly executable".

Directly executable (or hardcoded) is when the tables are actually encoded in the generated C++ code, usually there would be a label and a block of generated code for each state, and goto would be used for jumping from one state to the other.

If I understood correctly, your approach was to start from table-based and then augment it with some "recursive-descent" stuff (I only skimmed your paper as of yet).

Is the directly executable/hardcoded apprach compatible with your work? Have you considered letting langcc generate such parsers?

jzimmerman commented 2 years ago

While one could theoretically expand the generated LR parsing tables into executable code, such code would not be significantly more intuitive; it would still look like hard-coded tables, as the table indices (and resulting jump indices) do not provide significant intuition. For this reason, it does not appear worthwhile to enable this in langcc.

nsajko commented 2 years ago

Usually the motivation is performance. Hardcoding the tables into executable code should enable the C++ compiler to do a lot more optimization than it otherwise could, and I guess that less data cache pressure is also a good thing.

jzimmerman commented 2 years ago

There might be a slight performance advantage (or maybe not -- considering also the instruction cache), but in either case I don't think it would be nearly significant enough to justify that amount of complicated codegen.