Open ehildenb opened 3 years ago
I'm going to pick up this task. It's probably going to have to involve a complete rewrite of the parser kernel used for inner parsing. I won't merge anything that's not at least as efficient as the current approach however.
To avoid the complete rewrite of the parser kernel, we could just avoid the problem in the grammar generation phase.
Instead of sending syntax Exps ::= Exps "," Exp
rewrite it to syntax Exps ::= Exp "," Exps
and have a post processing step that rearranges the elements in the list.
I postponed this task though because the type inferencer influence on parsing time was far greater than the improvement to parsing here. Try to see if it has an actual benefit in doing this change.
It's greater in most cases, but the worst case time is still quite significant for some rules. The goal here isn't to make the overall time better in most cases, it's to guard against pathological rules taking a really long time without making the implementation any slower. It also will allow us to not have to put parentheses around really long associative lists, which is a trap that has ensnared people writing proofs in K many times.
I am not convinced that the solution you mention will really work with arbitrary context free grammars. I would prefer a solution that is not a hack, which means properly handling priority and associativity inside the kernel itself. Besides, the current parser is a truly awful implementation of Earley and it's not necessarily a bad thing to rewrite it. I doubt it would even take terribly long.
Some updates after the parser kernel rewrite:
$ kompile --version
K version: v5.4.52-0-gfa56d11bcc
syntax Exps ::= Exps Int // fast
| Int
$ seq 0 1 10 | kast - -v --output none // Total = 0.465s
$ seq 0 1 100 | kast - -v --output none // Total = 0.511s
$ seq 0 1 1000 | kast - -v --output none // Total = 0.654s
$ seq 0 1 10000 | kast - -v --output none // Total = 3.645s
$ seq 0 1 100000 | kast - -v --output none // OutOfMemoryError
syntax Exps ::= Int Exps // slow
| Int
$ seq 0 1 10 | kast - -v --output none // Total = 0.462s
$ seq 0 1 100 | kast - -v --output none // Total = 0.503s
$ seq 0 1 1000 | kast - -v --output none // Total = 1.193s
$ seq 0 1 10000 | kast - -v --output none // OutOfMemoryError
$ seq 0 1 2000 | kast - -v --output none // Total = 2.321s
$ seq 0 1 3000 | kast - -v --output none // Total = 3.917s
$ seq 0 1 4000 | kast - -v --output none // OutOfMemoryError
Let's try to understand why we need to put so many parens around associative lists. This is annoying, and not doing it is a huge slowdown.