Closed Varriount closed 3 years ago
Yeah, that's a tough one to diagnose because it is Nim that generates the error, not NPeg itself.
I recommend taking care of proper ordering of your rules, as stated in the manual: any rule that is declared before its use will be inlined, possible recursively. This might grow the generated code pretty quck, or even exponentially.
To minimize size, declare your grammar top down.
I'm not sure it's the order that is the problem - I suspect it might be the fact that most rules are declared like this:
some_statement <-
variation * one * of * some * statement |
variation * two * of * another * statement |
...
NPeg appears to split such top-level choices into separate states.
Could you share your (failing) grammar?
Sure. The code is here: https://github.com/Varriount/squint
The sample.nim
file is the main entrypoint. I've only been able to compile it by removing the computedGoto pragma, and using the following command for compilation:
nim c --run --maxLoopIterationsVM:100000000 -d:npegInlineMaxLen:5 .\sample.nim
Also, for some reason adding -d:npegTrace
causes the resulting executable to immediately crash, with no output.
Dude. really. You made a complete SQL parser in NPeg? The agony!
Well, I'm actually impressed with NPeg to see it can handle the grammar and compilation does not take three hours. The bad news is that your grammar is indeed just too large and Nim chokes on the generated code - even when disabling inlining with -d:npegInlineMaxLen=0
your grammar still adds up to 14260 NPeg IR instructions, which is just too much for the computed goto.
The good news is that when I disable the explicit .computedGoto.
pragma from the npeg codegen, it compiles just fine (although it takes some time, the generated C file is over 4KLOC).
I've prepared a branch that emits a normal case
(without the .computedGoto.
) for large grammers, let me know if this works or you.
Branch name: no-computed-goto-for-large-grammars
The grammar is mostly translated from PostgreSQL's own grammar files - it did take some time to automate the translation and coax NPeg into performing the correct comparisons, but not as much as writing it all from scratch would have taken.
Regarding the rules, was my initial guess correct? Does NPeg split top-level choices into multiple states?
Right - you generated it, nice. I was already starting to question your sanity :)
About your ordered choice question: NPeg splits everything into separate states. Basically, the grammar gets translated to IR code (adding up to about 17k instructions for your grammar), where each IR instruction gets represented by a distinct state. So this is not specific to the ordered choice, the same happens for all other operators.
I'd be happy to merge the code for not emitting a computed goto for large grammars; the computed goto is only an /optimization/, it does not make the code more correct.
0.25.0 has been released, omitting computedGoto
for grammars with more then 10k instructions.
For very large grammars (such as SQL) NPeg's code generation mechanism will fail:
(another problem here is that the error message doesn't indicate what part of the grammar is the actual cause)