timbray / quamina

Home of Quamina, a fast pattern-matching library in Go
Apache License 2.0
371 stars 17 forks source link

pat: Fix numeric matching #27

Open timbray opened 2 years ago

timbray commented 2 years ago

Quamina doesn't know that "30", "3.0e1", and "30.000" are the same number. We already have code to canonicalize numbers with up to 18 digits of precision in the range +/- 10**9 so they can be matched by a DFA, but canonicalizing is runtime-expensive and if you know that all the numbers in your events are integers, purely wasteful.

I think the best way to fix this is to introduce a "numeric" predicate into patterns:

{ "max": [ {"numeric": [ "=", 30] } ] }

But there might be a case for asking at the Match* API level to request canonicalizing numbers in incoming events.

timbray commented 2 years ago

In fact, the right thing to do here is to add the full suite of numeric pattern features from EventBridge.

embano1 commented 2 years ago

In fact, the right thing to do here is to add the full suite of numeric pattern features from EventBridge.

Agree. I'd argue any deviation from EB might cause trouble down the road bc I cannot simply reuse EB patterns in Quamina but would have to write custom parsers.

timbray commented 2 weeks ago

In fact, the right thing to do here is to add the full suite of numeric pattern features from EventBridge.

Agree. I'd argue any deviation from EB might cause trouble down the road bc I cannot simply reuse EB patterns in Quamina but would have to write custom parsers.

Now that I'm starting to think about this, I note that whenever aws-ruler sees a rule like

{ "x": [ 1, 1.5 ] }

It puts in matchers for both the exact strings 1 and 1.5, but also matchers for the canonicalized forms. So I think that Q has to do the same thing, for compatibility. Then at runtime, if there are any fields in the instance with the canonicalized numbers, whenever it sees a number in the incoming event, it canonicalizes that at match time. Which is quite expensive. And probably purely wasteful for 1 but not 1.5. So I think the right thing to do is do like Ruler but only for non-integer numeric values? Also, should look closely at the canonicalization code and see if it's efficient enough.

timbray commented 1 week ago

Note that implementing this (semantics are supposed to exactly match Ruler's) led to filing a Ruler issue: https://github.com/aws/event-ruler/issues/163