nubank / matcher-combinators

Library for creating matcher combinator to compare nested data structures
Other
458 stars 21 forks source link

Consider recovering after missing/extra element in sequence #177

Open lread opened 1 year ago

lread commented 1 year ago

Thanks!

First thanks so much for this library, I've used it with great pleasure on some cljdoc tests and am now proposing its use on clj-kondo tests.

Observation

matcher-combinators behaviour

While exploring matcher-combinators for clj-kondo tests, I noticed when an item is missing from a sequence, matcher-combinators does not seem to attempt to recover after the missing element.

Given test:

(deftest lee-test                                                                                                                                                                                                                
  (is (match? [1 2 3 4] [1 3 4]))) 

The mismatch is reported as:

FAIL in (lee-test) (java_test.clj:70)
expected: (match? [1 2 3 4] [1 3 4])
  actual: [1
 (mismatch (expected 2) (actual 3))
 (mismatch (expected 3) (actual 4))
 (missing 4)]

Snapshot for colours: image

Same idea for an unexpected extra element in a sequence:

(deftest lee-test                                                                                                                                                                                                                
  (is (match? [1 3 4] [1 2 3 4]))) 

Gives us:

FAIL in (lee-test) (java_test.clj:71)
expected: (match? [1 3 4] [1 2 3 4])
  actual: [1
 (mismatch (expected 3) (actual 2))
 (mismatch (expected 4) (actual 3))
 (unexpected 4)]

deep-diff2 behaviour

For comparison, let's use deep-diff2 to show what I mean by "recovering".

❯ clj -Sdeps '{:deps {lambdaisland/deep-diff2 {:mvn/version "RELEASE"}}}' 
Clojure 1.11.1
user=> (require '[lambdaisland.deep-diff2 :as ddiff])
nil
user=> (ddiff/pretty-print (ddiff/diff [1 2 3 4] [1 3 4]))
[1 -2 3 4]
nil

Snapshot to show colours: image We see that deep-diff2 sees 2 as missing and then continues to match 3 and 4.

It also handles extra elements like this:

user=> (ddiff/pretty-print (ddiff/diff [1 3 4] [1 2 3 4]))
[1 +2 3 4]

We see 2 as an unexpected addition but shows 3 and 4 as matching.

To Consider

Would it be viable and/or make sense for matcher-combinators to compare sequences the same way that deep-diff2 does?

This would mean the missing 2 would might be reported like so:

expected: (match? [1 2 3 4] [1 3 4])
  actual: [1
 (missing 2)
 3
 4]

And the extra 2 maybe like so:

expected: (match? [1 3 4] [1 2 3 4])
  actual: [1
 (unexpected 2)
 3
 4]

Next Steps

If the above seems like a good idea, would be happy to help in any way that makes sense.

philomates commented 1 year ago

I didn't know deep-diff2 did that; it seems like a very cool and useful feature!

I think this is definitely possible, especially given we have a notion of mismatch weights (for example the weight of sequence mismatch is the sum of the mismatch weights of each mismatching element). Currently we mostly use this in the in-any-order matcher to try to find the minimal mismatch. In in-any-order this leads to some pretty poor performance but maybe there is a way to do this missing/extraneous element detection for the more strict default sequence matching logic that doesn't incur much additional overhead. I'll try to look into it in the next days.

cheers and glad to hear you've been enjoying using the lib :)

lread commented 1 year ago

Thanks for the reply @philomates!

If matcher-combinators were to adopt this new behaviour for sequences, I don't see a need for it to support the current behaviour. Does that make sense to you too?

philomates commented 1 year ago

@lread I think https://github.com/nubank/matcher-combinators/pull/187 might cover the ideas you've listed here if you want to check it out

lread commented 1 year ago

Very exciting @philomates! I'll try it out shortly and provide feedback in the PR!