Closed timbray closed 2 weeks ago
:warning: Please install the to ensure uploads and comments are reliably processed by Codecov.
Attention: Patch coverage is 99.31034%
with 1 line
in your changes missing coverage. Please review.
Project coverage is 96.73%. Comparing base (
b1c32f5
) to head (c1b627e
).
Files | Patch % | Lines |
---|---|---|
numbers.go | 97.87% | 1 Missing :warning: |
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
tl:dr: I'm OK with the slowdown reported here.
As remarked, the numeric processing cost is real. This benchmark that shows this is using the pattern:
{ "context": { "user_id": [9034], "friends_count": [158] } }
Which is purely numeric. So, instead of matching 3- and 4-byte numbers, we're matching 14-byte canonical representations. Interestingly, when I measure matches/second metrics, the slowdown is dramatically smaller than that reported by the GitHub CI benchmark runner. (Not sure why.)
Finally, for the particular type of NFA produced for number atching, there are several obvious optimizations in the way the NFA-matcher runs, which will further reduce the impact of numeric accuracy. I think accuracy is important, so that an event like the following would match the pattern above.
{ "context": { "user_id": 9034, "friends_count": 1.58e2 }}
(or 158.0
or any other variant)
Thanks to @jsmorph for concurrency_test.go, whose surprising result revealed that the work for this PR was not nearly finished. More work was required on (a) detecting whether a number in an incoming event could be made comparable, (b) communicating that fact to the matching code, (c) marking NFAs which contained comparable numbers, leading to the following statement in value_matcher.go:
Also, (d) the unit tests needed work to cover the cases that were previously issued.
Now, there is still a performance cost, particularly in benchmarks that do a lot of matching of numbers; correctness isn't free. But at the moment, I think the trade-off is acceptable.
Note that having previously called these things
comparableNumbers
andcanonicalNumbers
and disliking both and getting long ugly variable names, I renamed them to Q numbers, where Q is for Quamina.