Closed SquidDev closed 1 year ago
Nice! I experimented with various ways to compact the DFA a long time ago, I tried DFA minimization but in the end I only kept the compact representation of transitions (explained here: https://def.lakaban.net/posts/2022-04-14-even-more-compact-lexer-table/ , not sure if this is common knowledge? I didn't find any article describing this).
Then I moved my focus to the more pressing issue of the DFA construction :) (reduction simulation was full of bug at that time... I hope is a bit more robust know :P).
Performance should not be a problem: I implemented Valmari minimization algorithm at that time (https://www.cs.cmu.edu/~cdm/papers/Valmari12.pdf), a variant of Hopcroft that is faster for large alphabets (afaik, when fixing other factors, Hopcroft complexity grows linearly wrt the size of the alphabet, Valmari grows logarithmically). Early tests indicate a reasonable speed-up, though I am not confident in my implementation :-)
I realise this project is on the back-burner right now, so happy if you want to close this!
Well, since mid-January I am focusing on this project :). I spent three weeks trying to understand what I call the "error space", the languages of stack that fails. They have a regular structure, so we can enumerate them. But the automaton is way too complex (for OCaml language) for a human to grok, so I am looking at other ways to make sense of it. The goal is to improve the tooling that checks for coverage and redundancy.
I will share my roadmap soon to keep you posted on what will come next!
(And actually 400ms is fine, the benefits of a smaller automaton outweigh this cost).
I experimented with various ways to compact the DFA a long time ago, I tried DFA minimization but in the end I only kept the compact representation of transitions (explained here: https://def.lakaban.net/posts/2022-04-14-even-more-compact-lexer-table/ , not sure if this is common knowledge? I didn't find any article describing this).
I must confess, most of my knowledge here comes from what I remember from university, so is definitely a little spartan! That (and your other article on lexer tables) was very helpful, thank you!
I implemented Valmari minimization algorithm at that time (https://www.cs.cmu.edu/~cdm/papers/Valmari12.pdf), a variant of Hopcroft that is faster for large alphabets
Oh, this sounds ideal. I did try to keep transitions using sets of input symbols, but that's still going to behave badly in the worst case.
Definitely echo your concerns about the correctness of implementation though. Just encountered a issue with this patch, where states with non-identical halting sets are considered equivalent. I think this is easily fixable (take the union of halting sets for this equivalence class as the new halting set), but now much less confident with the rest of my code!
I will share my roadmap soon to keep you posted on what will come next!
Thank you! I realise everything is still in the research stage, but excited to see this develop! It's been such a pleasure to use :).
I am closing the PR, the automaton is now minimized in master. Thanks for your help.
This runs Hopcroft's DFA minimisation algorithm on Lrgrep_support's DFA before generating generating the state machine. In the case of the OCaml example, this reduces the number of states from 2854 to 440 (with a corresponding reduction in file size).
While I've been using this on my local fork without issue, it's probably still a long way from production ready. It's quite slow (~400ms to run on the OCaml example), though not quite sure how to improve that.
I realise this project is on the back-burner right now, so happy if you want to close this!