nubank / matcher-combinators

Library for creating matcher combinator to compare nested data structures
Other
467 stars 23 forks source link

recover after missing/extra element in sequence #187

Closed philomates closed 6 months ago

philomates commented 1 year ago

addresses https://github.com/nubank/matcher-combinators/issues/177

before:

expected: (match? [1 2 3 4] [1 3 4])
  actual: [1 (mismatch (expected 2) (actual 3)) (mismatch (expected 3) (actual 4)) (missing 4)]

after:

expected: (match? [1 2 3 4] [1 3 4])
actual: [1 (expected 2) 3 4]
lread commented 1 year ago

Super cool @philomates!

I've played with this PR in a scratch project to get a feel for it when used in tests. I tried extra/missing leading/trailing, and nested. It all seems to hang together very well (with one minor comment to follow).

The terminology of missing and unexpected work very nicely and clearly conveys what is awry.

Here's my little playground:

(ns mc-recover.mc-recover-test
  (:require [clojure.test :as t :refer [deftest is testing]]
            [matcher-combinators.test]))

(deftest mc-recover
  ;; playing around with matcher-combinators seq recovery
  ;; very cool!
  ;; all assertions will fail.
  (is (match? [1 2] [1 2 3 4]) "extra trailing")
  (is (match? [1 2] [9 0 1 2]) "extra leading")
  (is (match? [1 2] [1 98 99 2]) "extra internal")
  (is (match? [1 2] [9 8 1 98 99 2 3 4]) "extra leading, internal, trailing")
  (is (match? [] [1 2 3]) "all extra")

  (is (match? [1 2 3] [1 2]) "missing trailing")
  (is (match? [1 2 3] [2 3]) "mising leading")
  (is (match? [1 99 2] [1 2]) "missing internal")
  (is (match? [0 1 2 99 98 97 3 4 5] [2 99 3]) "missing leading, internal, trailing")
  (is (match? [1 2 3] []) "all missing")

  (is (match? [1 82 3 4 2 7 9 10] [1 2 10 11]) "recover multiple mismatches")

  (is (match? [1 [2 3 4] 5] [1 [2 3] 5]) "nested mismatch")

  (is (match? [1 2 3] (list 1 3)) "mismatch on different seq types")

  (is (match? [:x :y [:a :b :c] :z] [:x [:a :b :c]]) "recovery works on nested element")

  (is (match? [#"some.*" :y 3 #"other.*" 5] ["somestring" :y 4 "nope" 5]) "other matchers still work")

  (is (match? {:a {:b [3 4 5]}} {:a {:b [3 4 5 6]}}) "nested mismatch in a map"))

And the results it produced with the cognitect test runner:

$ clojure -M:test
Running tests in #{"test"}

Testing mc-recover.mc-recover-test

FAIL in (mc-recover) (mc_recover_test.clj:9)
extra trailing
expected: (match? [1 2] [1 2 3 4])
  actual: [1 2 (unexpected 3) (unexpected 4)]

FAIL in (mc-recover) (mc_recover_test.clj:10)
extra leading
expected: (match? [1 2] [9 0 1 2])
  actual: [(unexpected 9) (unexpected 0) 1 2]

FAIL in (mc-recover) (mc_recover_test.clj:11)
extra internal
expected: (match? [1 2] [1 98 99 2])
  actual: [1 (unexpected 98) (unexpected 99) 2]

FAIL in (mc-recover) (mc_recover_test.clj:12)
extra leading, internal, trailing
expected: (match? [1 2] [9 8 1 98 99 2 3 4])
  actual: [(unexpected 9)
 (unexpected 8)
 1
 (unexpected 98)
 (unexpected 99)
 2
 (unexpected 3)
 (unexpected 4)]

FAIL in (mc-recover) (mc_recover_test.clj:13)
all extra
expected: (match? [] [1 2 3])
  actual: [(unexpected 1)
 (unexpected 2)
 (unexpected 3)]

FAIL in (mc-recover) (mc_recover_test.clj:15)
missing trailing
expected: (match? [1 2 3] [1 2])
  actual: [1 2 (missing 3)]

FAIL in (mc-recover) (mc_recover_test.clj:16)
mising leading
expected: (match? [1 2 3] [2 3])
  actual: [(missing 1) 2 3]

FAIL in (mc-recover) (mc_recover_test.clj:17)
missing internal
expected: (match? [1 99 2] [1 2])
  actual: [1 (missing 99) 2]

FAIL in (mc-recover) (mc_recover_test.clj:18)
missing leading, internal, trailing
expected: (match? [0 1 2 99 98 97 3 4 5] [2 99 3])
  actual: [(missing 0)
 (missing 1)
 2
 99
 (missing 98)
 (missing 97)
 3
 (missing 4)
 (missing 5)]

FAIL in (mc-recover) (mc_recover_test.clj:19)
all missing
expected: (match? [1 2 3] [])
  actual: [(missing 1) (missing 2) (missing 3)]

FAIL in (mc-recover) (mc_recover_test.clj:21)
recover multiple mismatches
expected: (match? [1 82 3 4 2 7 9 10] [1 2 10 11])
  actual: [1
 (missing 82)
 (missing 3)
 (missing 4)
 2
 (missing 7)
 (mismatch (expected 9) (actual 10))
 (mismatch (expected 10) (actual 11))]

FAIL in (mc-recover) (mc_recover_test.clj:23)
nested mismatch
expected: (match? [1 [2 3 4] 5] [1 [2 3] 5])
  actual: [1 [2 3 (missing 4)] 5]

FAIL in (mc-recover) (mc_recover_test.clj:25)
mismatch on different seq types
expected: (match? [1 2 3] (list 1 3))
  actual: (1 (missing 2) 3)

FAIL in (mc-recover) (mc_recover_test.clj:27)
recovery works on nested element
expected: (match? [:x :y [:a :b :c] :z] [:x [:a :b :c]])
  actual: [:x (missing :y) [:a :b :c] (missing :z)]

FAIL in (mc-recover) (mc_recover_test.clj:29)
other matchers still work
expected: (match? [#"some.*" :y 3 #"other.*" 5] ["somestring" :y 4 "nope" 5])
  actual: ["somestring"
 :y
 (mismatch (expected 3) (actual 4))
 (mismatch (expected #"other.*") (actual "nope"))
 5]

FAIL in (mc-recover) (mc_recover_test.clj:31)
nested mismatch in a map
expected: (match? {:a {:b [3 4 5]}} {:a {:b [3 4 5 6]}})
  actual: {:a {:b [3 4 5 (unexpected 6)]}}

Ran 1 tests containing 16 assertions.
16 failures, 0 errors.

The only case that surprised me slightly was this one:

FAIL in (mc-recover) (mc_recover_test.clj:21)
recover multiple mismatches
expected: (match? [1 82 3 4 2 7 9 10] [1 2 10 11])
  actual: [1
 (missing 82)
 (missing 3)
 (missing 4)
 2
 (missing 7)
 (mismatch (expected 9) (actual 10))
 (mismatch (expected 10) (actual 11))]

I was thinking that maybe we'd get a missing for 9, a match for 10, and an unexpected for 11. But maybe this is not practical? Dunno. What you've come up with is already a vast improvement over the existing diffs for sequences.

The formatting of actual, when actual is multiple lines is, I'm pretty sure, a pre-existing and separate issue.

lread commented 1 year ago

For comparison, that last example when diffed with deep-diff2:

❯ clj -Sdeps '{:deps {lambdaisland/deep-diff2 {:mvn/version "2.7.169"}}}'

Clojure 1.11.1
user=> (require '[lambdaisland.deep-diff2 :as ddiff])
nil
user=> (ddiff/pretty-print (ddiff/diff [1 82 3 4 2 7 9 10] [1 2 10 11]))
[1 -82 -3 -4 2 -7 -9 10 +11]
nil

Here's a screenshot for the colouring: image

It seems to behave the way I would have expected, maybe we can do the same after all.

philomates commented 1 year ago

Thanks for the comprehensive testing @lread

for

(is (match? [1 82 3 4 2 7 9 10] [1 2 10 11]) "recover multiple mismatches")

the current behavior gives 6 mismatches and 2 matches, for a total length of 8 elements

[...
(missing 7)
(mismatch (expected 9) (actual 10))
(mismatch (expected 10) (actual 11))]

the deepdiff version gives

[... 
(missing 7)
(missing 9)
10
(unexpected 11)]

which has 3 matches and 6 mismatches, for a total length of 9 elements. So it sort of depends on how you want to weigh matches vs mismatches vs total sequence size. I don't think it would be too difficult to adapt the behavior to be like deepdiff but I'm not sure yet if it is necessarily the right thing

lread commented 1 year ago

Yeah, good point, there there is no absolute right or wrong.

I prefer a diff which makes some attempt to re-sync after a mismatch.

But that might be my subjective preference.

It could be interesting to learn more about deep-diff2's rationale and intent with regards to re-sync. Deep-diff2 uses clj-diff, which describes its diffing algorithm use here.

I don't know much about diff algorithms, but I think an "optimal" one produces the smallest number of diffs, which seems related.

dchelimsky commented 1 year ago

I think that (missing 9) and (unexpected 11) is easier to grok. @philomates you said

I don't think it would be too difficult to adapt the behavior to be like deepdiff

We could

I'm happy with any of this, as eve without the deepdiff alignment, this is a lot better as/is. I think the best option is the last one, if you're good with that.

philomates commented 1 year ago

I'm fine with merging as is and doing a follow-up to improve the results to better match deepdiff. any thoughts on the performance considerations? https://github.com/nubank/matcher-combinators/pull/187#discussion_r1034627442