timbray / quamina

Home of Quamina, a fast pattern-matching library in Go
Apache License 2.0
373 stars 18 forks source link

pat: Correct number matching (again) #330

Closed timbray closed 2 weeks ago

timbray commented 2 weeks ago

Thanks to @jsmorph for concurrency_test.go, whose surprising result revealed that the work for this PR was not nearly finished. More work was required on (a) detecting whether a number in an incoming event could be made comparable, (b) communicating that fact to the matching code, (c) marking NFAs which contained comparable numbers, leading to the following statement in value_matcher.go:

        if vmFields.hasQNumbers && eventField.IsQNumber {
            qNumber, err := qNumFromBytes(val)

Also, (d) the unit tests needed work to cover the cases that were previously issued.

Now, there is still a performance cost, particularly in benchmarks that do a lot of matching of numbers; correctness isn't free. But at the moment, I think the trade-off is acceptable.

Note that having previously called these things comparableNumbers and canonicalNumbers and disliking both and getting long ugly variable names, I renamed them to Q numbers, where Q is for Quamina.

codecov-commenter commented 2 weeks ago

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 99.31034% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.73%. Comparing base (b1c32f5) to head (c1b627e).

Files Patch % Lines
numbers.go 97.87% 1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #330 +/- ## ========================================== + Coverage 96.48% 96.73% +0.24% ========================================== Files 18 18 Lines 1764 1837 +73 ========================================== + Hits 1702 1777 +75 + Misses 35 34 -1 + Partials 27 26 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

timbray commented 2 weeks ago

tl:dr: I'm OK with the slowdown reported here.

As remarked, the numeric processing cost is real. This benchmark that shows this is using the pattern:

{ "context": { "user_id": [9034], "friends_count": [158] } }

Which is purely numeric. So, instead of matching 3- and 4-byte numbers, we're matching 14-byte canonical representations. Interestingly, when I measure matches/second metrics, the slowdown is dramatically smaller than that reported by the GitHub CI benchmark runner. (Not sure why.)

Finally, for the particular type of NFA produced for number atching, there are several obvious optimizations in the way the NFA-matcher runs, which will further reduce the impact of numeric accuracy. I think accuracy is important, so that an event like the following would match the pattern above.

{ "context": { "user_id": 9034, "friends_count": 1.58e2 }} 

(or 158.0 or any other variant)