performance comparison vs std/re, std/nre

timotheecour commented 4 years ago

It would be really nice to have a benchmark comparing performance wrt std/re, std/nre; I haven't seen any. This would also be useful for performance regression testing. This could start small and become more comprehensive as needed.

links

https://forum.nim-lang.org/t/6008#37206

nitely commented 4 years ago

IIRC it's 40 times slower. Someone reported it to be 5 times slower. It really depends on the regex, PCRE has exponential time complexity, and while those cases are kind of rare, it's easy to build a regex that runs in quadratic time (and/or does a lot of backtracking), so nim-regex is likely faster in such cases.

I'm working on making it faster (as nregex when there are capture groups, i.e: 4 times slower than PCRE), but it'll take me a little while to get it done.

timotheecour commented 4 years ago

oh yikes; maybe 1st step to improve this would be adding some benchmark code somewhere and keep track of performance... if you have already one maybe link it here?

I suggest using this: https://github.com/mariomka/regex-benchmark

any ideas where the bottlnecks, and whether this is an implementation issue or if the design itself is the limiting factor compared to PCRE?

D's std.regex

I read a few times that D has/had the fastest regex engine (https://dlang.org/library/std/regex.html), but I can't find a recent benchmark on this: https://forum.dlang.org/post/mailman.775.1572279791.8294.digitalmars-d@puremagic.com

It's kinda sad because std.regex used to be touted as the fastest regex engine even compared to other regex libraries. But it seems others have caught up since while we've stagnated. :-/

it seems a bit in contradiction with numbers from https://github.com/mariomka/regex-benchmark ; /cc @DmitryOlshansky do you have any insights / links?

If it is one of the fastest, may be good to look into its design given some similarities that D and nim have (in particular, both being able to do CTFE, which is relevant for this discussion, since std.regex uses that for bytecode (relevant even if we're in the end talking about running regex at RT))

here's his (dated) blog post: https://dlang.org/blog/2016/11/07/big-performance-improvement-for-std-regex/ which has interesting insights

alternative approach: better PCRE wrapper

I'm wondering whether we can as an alternative wrap PCRE but using a better API than nim's std/re, std/nre, which have a number of issues. as for running regex at CT, I tried re and nre with -d:nimHasLibFFI; it "almost works" but needs small (maybe) language change to relax when casting in VM is allowed; so we could have high performance using both RT and CT

nitely commented 4 years ago

any ideas where the bottlnecks, and whether this is an implementation issue or if the design itself is the limiting factor compared to PCRE?

It's a design issue, AFAIK. nim-regex is based on Thompson's construction and multiple-state simulation, as described by Russ Cox [3]. Same as RE2, golang's regex, and rust-regex, albeit they have some optimizations to avoid falling back to the slow NFA engine.

Can a non-backtracking NFA based engine consistently (as in for every regex) reach PCRE performance? ¯\_(ツ)_/¯. I've never seen it happen. Thompson's algorithms don't come close, I can tell you that much.

I plan to improve the design and add some RE2 optimizations (the on-the-fly bounded-DFA construction at least) to make nim-regex faster than it's now.

I read a few times that D has/had the fastest regex engine (https://dlang.org/library/std/regex.html), but I can't find a recent benchmark on this: https://forum.dlang.org/post/mailman.775.1572279791.8294.digitalmars-d@puremagic.com

Nim has the fastest regex engine that I know of [4], at least for expressions without capture groups :p, which surprisingly covers a lot of cases: regex validation, search/find boundaries, search&replace, etc. But DFAs have their own drawbacks (i.e: exponential construction/compilation blow-up in rare cases).

If it is one of the fastest, may be good to look into its design given some similarities that D and nim have (in particular, both being able to do CTFE, which is relevant for this discussion, since std.regex uses that for bytecode (relevant even if we're in the end talking about running regex at RT))

here's his (dated) blog post: https://dlang.org/blog/2016/11/07/big-performance-improvement-for-std-regex/ which has interesting insights

That's a good read, but Bit-Parallel NFA Simulation (Glushkov's NFA simulation) is not something new. While it's faster than Thompson's simulation, it's not faster than a backtracking JITted simulation. Also, It's not a general solution (i.e: it only supports NFAs smaller than WORD-size), I'd guess D uses it to find prefixes (i.e: it only uses the first WORD-size parallel states of the NFA).

The state of NFA based engines boils down to a bunch of speed-ups on top of a slow algorithm. NFA is a dead end, performance wise. D's regex, same as RE2 and rust-regex, are based on an NFA engine. They put some other engines on top that work in some cases depending on the regular expression, but they fallback to the dog slow NFA engine otherwise.

alternative approach: better PCRE wrapper

There are too many horror stories of using backtracking regex engines (such as PCRE) in production [0] [1] [2], though, and they are not necessarily fast, just fast in some cases. For example, the regexes in mariomka/regex-benchmark will have limited backtracking, and depending on the input, no backtracking at all, so PCRE looks good in those benchmarks.

timotheecour commented 4 years ago

if the main problem for PCRE is good average performance but exponential worst case performance, can't a hybrid algorithm be used (at the expense of code complexity, granted)? That strategy worked out well in a different context here https://github.com/nim-lang/Nim/pull/13440, but it's also a common strategy, eg introsort uses same trick https://en.wikipedia.org/wiki/Introsort

begins with quicksort, it switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted and it switches to insertion sort when the number of elements is below some threshold. This combines the good parts of the three algorithms [...]

nitely commented 4 years ago

Sure, but PCRE is fast because it has a JIT compiler, and a lot of work went into optimizing both the compiler and the backtracking algo. idk whether it's feasible to replicate the work.

It'd be interesting to know why PCRE doesn't use some heuristic to switch to a linear time algorithm in the exponential time cases.

nitely commented 4 years ago

I think the NFA/DFA hybrid approach (what RE2 and rust-regex do) would yield much better results, and make nim-regex fast enough. It's also easier to implement, so we'll see.

timotheecour commented 4 years ago

idk whether it's feasible to replicate the work

i meant wrap, not replicate here:

have a fast regex engine at runtime via reusing PCRE; with a strategy to switch to another algorithm (eg the one in nim-regex for eg) as fallback depth search is too high
expose it via a good API (maybe simply nim-regex's current API)
for running all this at CT (which would use importc pcre), there are 2 options:
- -d:nimHasLibFFI
- using vmops
- or giving up on PCRE-level performance and using current nim-regex, which is fine for CT work
for js target I guess this wouldn't work, but fallback to current nim-regex is good enough

nitely / nim-regex