telekons / one-more-re-nightmare

A fast regular expression compiler in Common Lisp
https://applied-langua.ge/projects/one-more-re-nightmare/
BSD 2-Clause "Simplified" License
138 stars 9 forks source link

Tiered compilation #6

Open no-defun-allowed opened 3 years ago

no-defun-allowed commented 3 years ago

The compiler is a bit slow. It's faster than compilers for many programming languages, but it's weird for a funny-looking regular expression to take 150ms or so to compile. I've done my part in optimizing DFA generation, so most compile-time is taken by the Common Lisp compiler. On the other hand, the compiler also generates really good code - I wouldn't be surprised if most projects could go without full compilation. While we could sink DFA compilation into compile time when the regular expression is known at compile time, we do provide a function which conceptually accepts a string at runtime, and so it would be nice to do something useful if we need to compile at runtime.

So here is an idea: we do something more like JIT compilation and have a tiered compiler. The current compiler is used for "hot" regexes which are used to scan lots of vectors, and we use a chain of closures for cold code. Some heuristic would be used to make the switch to hot code, such as if we've matched more than some length with a cached chain of closures, and the optimizing compiler could run concurrently in another thread, too.

The chain of closures would take some of the benefits of DFA generation: we know how many registers are needed (though, without a compiler with copy propagation we overestimate substantially; oh well) and we still have linear complexity while scanning. We also could still use type splitting for chains of closures. All in all I wouldn't be too surprised if the cold compiler was still faster than cl-ppcre and other bytecode/closure-based implementations.