deep diff by editscript

guruma commented 1 year ago

I add a feature of deep diffing by using editscript library. I think It would be very useful especially for finding diffs between two big maps.

(require '[eftest.runner :refer [e=]])
(deftest deep-map-test
  (is (e= {:a 1 :b [1 2 3] :c {:a 1}}
          {:a 2 :b [2 3 4] :c {:a 2}})))

e= stands for 'editscript equal'

> lein test
expected: (e= {:a 1, :b [1 2 3], :c {:a 1}} {:a 2, :b [2 3 4], :c {:a 2}})
  actual: (not (e= {:a 1, :b [1 2 3], :c {:a 1}} {:a 2, :b [2 3 4], :c {:a 2}}))
    diff: ({:path [:a], :exp 1, :act 2}
           {:path [:b 0], :exp 1, :act 2}
           {:path [:b 2], :exp 3, :act 4}
           {:path [:c :a], :exp 1, :act 2})

please review and let me know what you think!

guruma commented 1 year ago

Build failed because the original branch depends on clojure 1.7.0 in project.clj Changing clojure 1.7.0 to clojure 1.9.0, on which editscript library depends for nat-int?, will make the build succeded

weavejester commented 1 year ago

Rather than changing how the 'pretty' reporter works, instead an 'editscript' reporter could be created instead.

Can you also show the difference between the editscript diff, and how its reported currently? Is editscript the best Clojure diffing library to use? Are there alternatives, and what do they look like?

guruma commented 1 year ago

Rather than changing how the 'pretty' reporter works, instead an 'editscript' reporter could be created instead.

I have no idea how much good an 'editscript' reporter is, because the 'pretty' reporter has lots of private functions which the report functions have to use.

Maybe It might be that I don't understand what you mean. Could you explain a little more about that?

Can you also show the difference between the editscript diff, and how its reported currently? Is editscript the best Clojure diffing library to use? Are there alternatives, and what do they look like?

Comparing Clojure Diff Libraries will help you understand the difference and you can check the benchmark of different diff libraries in the blog. There are two diffing algorithms that is used in editscript library. Editscript A is slow, but Editscript quick is fast. I use Editscript quick for comparing expected and actaul, and Editscript A for making diffs only when expected is not equal to actaul

Edit script is a way to quantify diffs between two strings, aka Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. But this is ultimately reduced to the problem of the shortest path between two vertices in a graph which can be solved optimally by A* algorithm.

If you want more informations, you can read ['Wu, S. et al., 1990, An O(NP) Sequence Comparison Algorithm, Information Processing Letters] on which editscript library is based.

weavejester commented 1 year ago

e= stands for 'editscript equal'

Why use e=? Why not diff automatically on failure?

I have no idea how much good an 'editscript' reporter is, because the 'pretty' reporter has lots of private functions which the report functions have to use.

What private functions would you need?

guruma commented 1 year ago

Why use e=? Why not diff automatically on failure?

I think that the original = need to be leaved for someone who need it.

What private functions would you need?

The problem is not what private function I need but the private functions themselves. The private functions prevent others from adding new multimethod for their own dispatch. For example, the following codes will show this.

(ns other-project.namespace
  (:require [clojure.test :refer :all]))

; this is possible
(defmethod clojure.test/assert-expr 'e= [msg form]
  (let [args (rest form)
        pred (first form)]
    `(let [values# (list ~@args)
           result# (apply ~pred values#)]
       (if result#
         (clojure.test/do-report {
                  :type :pass
                  :message ~msg
                  :expected '~form
                  :actual (cons ~pred values#)})
         (let [diffs# (apply make-diffs values#)]
           (clojure.test/do-report {
                  :type :fail-with-diffs
                  :message ~msg
                  :expected '~form
                  :actual (list '~'not (cons '~pred values#))
                  :diffs diffs#})))
       result#)))

When using only clojure.test, it is possible to add new multimethod. Because all function in clojure.test is not private as mentioned there as below.

;; Nothing is marked "private" here, so you can rebind things to plug ;; in your own testing or reporting frameworks.

(ns other-project.namespace
  (:require [clojure.test :as test]
            [eftest.runner :as ef :refer [e=]]))

; this is not possible
(defmethod report :fail-with-diffs [{:keys [message expected] :as m}]
  (test/with-test-out
    (test/inc-report-counter :fail)
    (print *divider*)
    (println (str (:fail *fonts*) "FAIL" (:reset *fonts*) " in") (testing-scope-str m))
    (when (seq test/*testing-contexts*) (println (test/testing-contexts-str)))
    (when message (println message))
    (when (= (first expected) 'e=)
      (editscript-equals-fail-report m))
    (print-output (capture/read-test-buffer))))

But it isn't possible to add new multimethod in eftest because the functions like testing-scope-str, print-output, are private.

weavejester commented 1 year ago

I think that the original = need to be leaved for someone who need it.

I think this design complects what you're testing (equality) with how it's being displayed (diffs). The current design of Eftest is that the reporters determine how the test output is presented to the user, not the tests themselves.

So if it showed a diff for failing data structures over a certain size, that would work better I think. Or if that behaviour could be forced via metadata on the test itself.

The problem is not what private function I need but the private functions themselves. The private functions prevent others from adding new multimethod for their own dispatch.

Sure, but you can wrap report method itself, like the progress reporter does to the pretty reporter. As far as I can see, you add one extra reporting method, so what private functions do we absolutely need, and what can be delegated to the pretty reporter?

weavejester / eftest

deep diff by editscript #90