nubank / matcher-combinators

Library for creating matcher combinator to compare nested data structures
Other
467 stars 23 forks source link

This defines simple equality fastpaths for order agnostic matchers #166

Closed aredington closed 2 years ago

aredington commented 2 years ago

This does not match prior semantics like-for-like and we should definitely discuss this.

e.g.: (matcher-combinators.standalone/match (matcher-combinators.matchers/set-equals [:a :a :b :c]) #{:a :b :c}) returns {:match/result :mismatch, :mismatch/detail #{{:expected :a} :c :b :a}} when invoked prior to these fastpaths. Now it returns {:match/result :match} I believe this is a bug in set-equals

The cardinality of the sets in the pathological case tests are sufficient to provoke bad performance running on my dev laptop; reproducing bad performance can be simply examined by commenting the body of the fast path functions (which will then return nil, which will then not provide a fast path success, which will result in the O(n!) behavior we observe today)

Encouragingly, set-equals was not interminably slow, just ~1.5-2 minutes without the fast path :)

philomates commented 2 years ago

Cool idea!

To address the weirdness of (matcher-combinators.standalone/match (matcher-combinators.matchers/set-equals [:a :a :b :c]) #{:a :b :c}) one could consider having it check if the expected list is longer than the actual value provided and having the mismatch info focus on that. It deviates from the current norm but is probably more useful and makes more sense.

note also that the behavior of predicates will change with this fast path

(standalone/match (m/in-any-order [odd? even?])
                  [even? odd?])

fails currently with

Syntax error (IllegalArgumentException) compiling at (matcher_combinators/standalone_test.clj:16
:1).
Argument must be an integer: clojure.core$even_QMARK_@65d89847

but will pass with this patch

philomates commented 2 years ago

you could skip the fast path in the presence of functions but one would also have to find a way to make this play nice with overriding default matching behavior with match-with. Like what happens if you apply this fast-path when you use match-with to treat numbers as close-but-not-exactly-the-same (not that anyone would do that, but still demonstrates how behavior could deviate)

aredington commented 2 years ago

@philomates Could you help me understand the use case where set-equals is provided a submatcher multiple times to evaluate against the actual set?

philomates commented 2 years ago

@philomates Could you help me understand the use case where set-equals is provided a submatcher multiple times to evaluate against the actual set?

I don't know if I completely follow, like are you wondering why set-equals was introduced? I think I added it because I once wanted to do something like (match? #{odd? odd} #{1 3}) and realized it evaluated to (match? #{odd?} #{1 3}). That said it is probably rarely needed or used

philomates commented 2 years ago

Another idea I've pondered to speed up the O(n!) issue in matching is to introduce new matchers (like quick-in-any-order) that don't try to find the smallest mismatch, but are fine with just determining that a match doesn't exist and returning any mismatch. If I remember correctly the change I did to find the smallest possible mismatch really added another factor of runtime complexity here. This is getting into API bloat / power-user territory so it probably isn't a good idea, but is the best I could think of for this

aredington commented 2 years ago

Yeah I don't think we should be worrying about optimizing the performance of reporting failure. If it takes us O(n!) to generate a usefully read error message, I think that may be fine. But if we can determine "This test passes" in O(n log n) at the expense of not generating that traceability, most of our tests are passing most of the time, and that's a big win.

aredington commented 2 years ago

Failures are due to CLJS / JS incoherence:

parser.cljc gives java.lang.Object -matcher-for impls, but parser.cljc gives no types implementations of -matcher-for.

If we have a reliable means of determining:

Then we can proceed in this direction (-matcher-for is presently the bad impl of bullet 2).

aredington commented 2 years ago

Having given this a few days thought the approach in the last few commits is likely a dead end.

A map defaults to matcher-combinators.matchers/embeds; so any maps will prevent this fastpath from running.

Don't have a strong sense "how" yet, but we need to make it so that e.g.

(matchers/match-with [map? matchers/equals] 
  (matchers/in-any-order ... ))

does the right thing.