Closed windelbouwman closed 5 years ago
I'm well aware of regex derivatives and have read that paper several times.
If you want to use regex derivatives, then I suggest you take on the research task of figuring out how to productionize it. You claim it is "better" than the most commonly used ones, but what practical experience do you have backing that up?
Regex derivatives are a nice theoretical result, and they might even be useful in certain limited circumstances where you absolutely need the full DFA. But regex engines aren't just about reporting whether a match exists or not.
but seems to be forgotten about
Trust me, among folks writing regex engines, it is not forgotten about.
I apologize if my tone appears harsh, but your issue request is basically, "can someone spend months of their time researching and productionizing this alternative regex construction algorithm even though its benefits over the existing approach aren't clear?"
I'm not sure what algorithm you used in this regex crate for DFA construction
It uses a lazy DFA, which protects against exponential state blow up. Something that regex derivatives cannot do (or at least, nobody has shown how).
I apologize if my tone appears harsh, but your issue request is basically, "can someone spend months of their time researching and productionizing this alternative regex construction algorithm even though its benefits over the existing approach aren't clear?"
No problem! My request was not for someone to spend months of their time on this, I just wanted to check if this algorithm was considered / known. It is good to know that it was considered, thank you for your answer!
Hi all,
I'm not sure what algorithm you used in this regex crate for DFA construction, but you might be interested in this paper: https://www.ccs.neu.edu/home/turon/re-deriv.pdf
The basic idea is to use regular expression derivatives to construct a DFA with a minimum number of states. The algorithm better than the most commonly used ones, but seems to be forgotten about. There is an implementation of this written in python: https://github.com/michaelpaddon/epsilon
Would it be possible to implement several different algorithms next to each other and make it configurable which one to use?
Regards, Windel