nubank / matcher-combinators

Library for creating matcher combinator to compare nested data structures

performance issues (?) simple test takes 13 seconds to execute #221

Open jasonjckn opened 8 months ago

jasonjckn commented 8 months ago

Hopefully you can reproduce this: I have a very simple test that takes 13 seconds to execute. Here it is.

We're on the latest release, nubank/matcher-combinators {:mvn/version "3.9.1"}.

(ns my-namespace
  (:require
   [matcher-combinators.test :refer [match? thrown-match?]]
   [matcher-combinators.matchers :refer [regex set-embeds]]
   [clojure.test :refer :all]))

(deftest A

  (time
    (is (match? #{{:e 17592186045796, :a 260, :v 17592186045797, :added? false}
                  {:e 17592186045794, :a 312, :v 17592186045799, :added? false}
                  {:e 17592186045794, :a 404, :v #inst "2024-03-26T00:49:40.964-00:00", :added? false}
                  {:e 17592186045799, :a 107, :v 17592186045787, :added? false}
                  {:e 17592186045794, :a 356, :v "TestName", :added? false}
                  {:e 17592186045799, :a 89, :v 17592186045583, :added? false}
                  {:e 17592186045799, :a 445, :v "the_name", :added? false}
                  {:e 17592186045794, :a 485, :v 17592186045763, :added? false}
                  {:e 17592186045796, :a 445, :v "Test", :added? false}
                  {:e 17592186045799, :a 396, :v 17592186045550, :added? false}}

                #{}))))
philomates commented 8 months ago

Hey, thanks for raising this.

Matching sets or in-any-order has very bad performance because the library tries every permutation in an attempt to find the smallest mismatch. There are workarounds (see https://github.com/nubank/matcher-combinators/issues/106#issuecomment-644034488). It might also be interesting to make this configurable, allowing the logic that searches for a minimal mismatch to be turned off, which would speed things up. In the short term it probably makes sense to document this slow behavior to help folks understand what is going on.
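To get a feel for why a ten-element set can be slow if every permutation is tried, as described above, here is a rough count. This is only arithmetic, not the library's code, and permutation-count is a hypothetical helper name.

```clojure
;; Hypothetical helper, not part of matcher-combinators:
;; counts the possible orderings of n distinct elements (n factorial).
;; *' is used so that larger n promotes to BigInt instead of overflowing.
(defn permutation-count [n]
  (reduce *' (range 1 (inc n))))

;; The expected set in the test above has 10 elements.
(permutation-count 10) ;; => 3628800
```

The count grows factorially, so adding even a few more elements to a set makes the worst case much worse.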

If you'd like to speed things up, you can use vectors instead of sets, so that the in-any-order matching that sets imply is not used. That is, turn (is (match? #{...} ...)) into (is (match? [...] (into [] ...))). This only works if the ordering of the actual value is deterministic, or if you sort before matching.
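A minimal sketch of the "sort before matching" idea, in plain Clojure. The helper name sorted-datoms is hypothetical, and it assumes every map has numeric :e and :a values and that each (:e, :a) pair is unique, which holds for the data in the test above.

```clojure
;; Hypothetical helper, not part of matcher-combinators:
;; puts a collection of maps into a deterministic order by sorting on
;; the :e and :a keys, then returns the result as a vector.
;; Assumes numeric :e/:a values and unique (:e, :a) pairs.
(defn sorted-datoms [datoms]
  (vec (sort-by (juxt :e :a) datoms)))

(sorted-datoms #{{:e 2 :a 1 :v "b"}
                 {:e 1 :a 5 :v "a"}
                 {:e 1 :a 3 :v "c"}})
;; => [{:e 1, :a 3, :v "c"} {:e 1, :a 5, :v "a"} {:e 2, :a 1, :v "b"}]
```

With matcher-combinators.test loaded, both sides could then be passed through the helper, as in (is (match? (sorted-datoms expected) (sorted-datoms actual))), so the comparison is positional rather than in-any-order.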

jasonjckn commented 8 months ago

@philomates

For my particular use case I only needed to check equivalence, so I went with the built-in (is (= seta setb)). But yeah, I'd encourage this project to find a solution; as you said, toggling that logic sounds good to me. I am also a heavy user of in-any-order in other use cases.

I actually wouldn't mind using (is (set/subset? A B)). The only "problem" is that matcher-combinators has far more readable test failure output than the built-in test functionality. Part of that is the minimal mismatch, but even with that disabled I would still rather have your test failure output.
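For reference, a small sketch of the built-in alternatives mentioned above, using only clojure.core and clojure.set. The data is made up for illustration, and set/difference is shown only as one possible way to see which elements differ; it is not a substitute for the library's mismatch output.

```clojure
(require '[clojure.set :as set])

;; Made-up example data, much smaller than the sets in the original test.
(def expected #{{:e 1 :a 260} {:e 2 :a 312}})
(def actual   #{{:e 1 :a 260} {:e 2 :a 312} {:e 3 :a 404}})

;; Strict equivalence: fails here because actual has an extra element.
(= expected actual)              ;; => false

;; Subset check: passes because every expected element is in actual.
(set/subset? expected actual)    ;; => true

;; Elements present in actual but not in expected.
(set/difference actual expected) ;; => #{{:e 3, :a 404}}
```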