Closed timbray closed 1 month ago
Attention: Patch coverage is 95.23810%
with 9 lines
in your changes missing coverage. Please review.
Project coverage is 96.10%. Comparing base (
5eee82d
) to head (37b33ab
).
Files | Patch % | Lines |
---|---|---|
prettyprinter.go | 70.58% | 4 Missing and 1 partial :warning: |
nfa.go | 94.00% | 2 Missing and 1 partial :warning: |
core_matcher.go | 92.30% | 1 Missing :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Oh drat, I somehow missed go test -v -race -count=1
looks like a real race condition. Please ignore this PR for now.
You may have noticed a bit of thrashing in this PR
There's an important trade-off to consider.
To summarize, #314 cleanly kills issue #188 which means that you can freely use "*" globs, as many as you want, and you won't get an O(2**N) explosion in your memory size and addPattern()
latency. I'll also add the ability to have multiple wildcards in a single pattern and that will be fine too. Now we have a clear path to doing full regular expressions in Quamina.
However, there is a cost. #314 adds "epsilon transitions" and multi-state traversal to our automata, which solves a lot of problems but has a performance cost. The latest versions, compared to the current main
branch on GitHub, is 20% slower on exact-match, 15% slower on prefix, and 10% slower on anything-but. (The most recent push is much worse but I've fixed that in my version, back to around 20%). I care about performance a lot, this is bad.
On the other hand:
Question for the audience: Should we go ahead and accept the performance penalty in exchange for opening the gates to lots of nice features? Is there anyone now using Quamina who's going to be troubled by a ~20% performance hit?
That performance hit won't cause me any heartburn.
Does this work help at all with a future numeric
implementation?
Does this work help at all with a future
numeric
implementation?
I'm not sure. Because of the way smallTables work, I don't think there any need for nondeterministic automata to do that. Which in fact we managed to do with Ruler back in the day.
Hearing no objections… Once again, I acknowledge that potential reviewers are probably not eager about digging through finite-automata theory, so unless someone screams STOP I'll push this Monday.
prevent state explosions with epsilon transitions
The introduction of proper NFA epsilon transitions - see https://swtch.com/~rsc/regexp/regexp1.html - totally fixed this problem. The most dramatic illustration is line 96 of shell_style_test.go. Previously, you could only use 20 or so shell_style patterns before an O(2**N) explosion leading to millions of states. Now you can add all 12K in the test file. The
addPattern()
runs almost instantly, although matching slows down, a fair trade-off.Once this is landed, we can start implementing all the cool stuff that was-ruler has without worry about exploding NFAs.