Closed · nchen63 closed this 4 months ago
The parser-generator algorithm used by this tool definitely has limits, and you'll want to keep your (expanded) grammar size within reasonable bounds. A rule with 600 or-ed tokens sounds like a bad idea. Do these tokens actually serve a different purpose in the grammar, or could you collapse them into a single, more generic token type? Optionals and or-expressions are compiled by expanding them into multiple rules (i.e. `(a | b) c?` becomes `a`, `b`, `a c`, and `b c`), and if you combine a bunch of them you can get a serious blowup in the number of rules.
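To make the multiplication concrete, here is a minimal sketch of a grammar where a handful of combined optionals already produce a sizeable expansion; all rule and token names here are invented for illustration:

```
// Hypothetical grammar fragment. The (kwA | kwB) choice contributes 2
// alternatives, and each of the three optionals doubles that, so this
// single rule expands to 2 * 2^3 = 16 internal rules.
@top Program { statement+ }

statement { (kwA | kwB) c? d? e? }

@tokens {
  kwA { "alpha" }
  kwB { "beta" }
  c { "c" }
  d { "d" }
  e { "e" }
  space { @whitespace+ }
}

@skip { space }
```

Ten such optionals in one rule would already expand to over two thousand alternatives, which is how a large grammar can blow up without any single rule looking obviously wrong.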
Recent versions of lezer-generator should emit a warning for rules that expand to an enormous number of alternatives, which should help guide users toward fixing that. Beyond that, the LR parser-generator algorithm just works this way, needing to generate the entire state space for the grammar, and I think there's always going to be a point of complexity where it breaks down (both in memory use and in processing time).
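As for collapsing the 600 or-ed tokens: if they are keyword-like, one pattern the Lezer docs describe is to lex them all as a single generic token and use `@specialize` only for the few the grammar must actually distinguish. A minimal sketch under that assumption (all names invented):

```
// All word-like input lexes as one Identifier token, so the tokenizer
// and the rules stay small; @specialize splits out only the keywords
// that the grammar really needs to tell apart.
@top Program { Command+ }

kw<term> { @specialize[@name={term}]<Identifier, term> }

Command { kw<"select"> Identifier+ }

@tokens {
  Identifier { @asciiLetter+ }
  space { @whitespace+ }
}

@skip { space }
```

If the parser never needs to distinguish those 600 tokens from one another at all, a single generic token with no specialization is even cheaper.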
I have a large grammar that causes lezer-generator to run out of memory, even when running node with `--max-old-space-size` set to 24 gigabytes. I have tried to reduce the number of states by breaking rules that contain a large number of optional sub-rules out into separate rules, but it hasn't solved the problem. One rule does have ~600 or-ed tokens, and removing that did let the grammar build finish, but I don't know how that rule can be optimized; I tried breaking it up, but that didn't seem to help.
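For readers attempting the same thing, the factoring described above might look like the following sketch (invented names throughout; as noted, it is not guaranteed to shrink the state space):

```
// Before: four independent optionals in one rule expand to 2^4 = 16
// alternatives, and that factor multiplies with everything around it.
//
//   Stmt { head a? b? c? d? }
//
// After: grouping the optionals behind named rules keeps each individual
// expansion small (4 + 4 alternatives rather than 16).
@top Program { Stmt+ }

Stmt { head abPart cdPart }
abPart { a? b? }
cdPart { c? d? }

@tokens {
  head { "head" }
  a { "a" }
  b { "b" }
  c { "c" }
  d { "d" }
  space { @whitespace+ }
}

@skip { space }
```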
Can you offer guidance as to how the memory footprint could be reduced (either by modifying the grammar or with a PR)?