Open no-defun-allowed opened 3 years ago
bbfa2cf provides an object for prefix code generation, but it is "bare" and the heuristics it uses are implemented using inheritance and call-next-method
instead of a compiler object making decisions.
This has sort of been done now, but there is no public interface yet. I have to think about how to expose everything in the right way.
one-more-re-nightmare splits a regular expression into a prefix string and the rest of the RE. The prefix is scanned for more efficiently using Boyer-Moore-Horspool, before entering the DFA body generated for the rest of the RE. (This technique is also used in cl-ppcre and GNU grep to my knowledge.)
However, it is possible that BMH is not ideal for processing prefixes. Modern processors have vector arithmetic units which can perform many element comparisons at once, and it is not too hard to design a substring search which uses vectorised code. When combined with heuristics based on properties known about the vector to search, vectorised searching can be faster than clever serial searching algorithms.
It should be possible to allow the client to generate their own prefix scanning code. The interface to the BMH code generator provides most of the required information to generate code, with the lambda list
(start length vector prefix aref-generator fail)
. The first three arguments are symbols naming internal variables o-m-r-n uses, then the prefix sequence, then the:aref-generator
argument tocompile-regular-expression
, then some code to evaluate when there are no more matches.