Closed masklinn closed 6 months ago
Right so did a quick POC, helped a lot by the fact that the python bindings for re2
expose _Regexp
and _Match
objects which are exact matches for the stdlib, so converting the regex matching / data extraction from re
to re2
was just a matter of copying some code from the matchers and doing some replacements.
However somewhat shockingly re2's regexps are nearly 100% slower than re's. re2's main claim to fame is to work in linear time and the doc specifically notes that optimistic backtracking engines will often be faster, but it's still surprising to have backtracking win out, by that much, even though re2 had already precompiled all the regexes and stuff. I guess it's possible the regex of uap-core are pretty much designed for backtracking engines, or at least have never really taken non-backtracking engines in consideration.
Here's the bench summary without and with, the -re
columns are the original using re
for data extraction, while the ones without use re2 for data extraction
re2-re | re2-lru-re | re2-s3fifo-re | re2-sieve-re | re2 | re2-lru | re2-s3fifo | re2-sieve | |
---|---|---|---|---|---|---|---|---|
10 | 33 | 33 | 32 | 33 | 59 | 62 | 62 | 60 |
20 | 33 | 32 | 31 | 31 | 59 | 62 | 61 | 60 |
50 | 33 | 32 | 30 | 29 | 59 | 60 | 57 | 57 |
100 | 33 | 31 | 27 | 27 | 59 | 60 | 52 | 53 |
200 | 33 | 29 | 25 | 25 | 59 | 56 | 49 | 48 |
500 | 33 | 25 | 22 | 21 | 59 | 49 | 41 | 41 |
1000 | 33 | 22 | 19 | 19 | 59 | 42 | 37 | 36 |
2000 | 33 | 18 | 16 | 16 | 59 | 35 | 29 | 30 |
5000 | 33 | 15 | 14 | 13 | 59 | 27 | 25 | 25 |
As it turns out
re2.Filter
compiles the patterns individually internally, and provides access to those. This means it ought be possible to extract helpers or adapt its match object in order to extract / substitute from the re2 match objects and never have to compile/matchre
regexes.It's not like the
re2
resolver needs to further stomp the basic resolver, but it does not hurt.