jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

timeline/html checker might be broken since introducing history object #564

Open qvad opened 1 year ago

qvad commented 1 year ago

I recently updated Jepsen to 0.3.0 and started seeing this exception in tests that use the timeline/html checker.

     :checker   (independent/checker
                  (checker/compose
                    {:timeline (timeline/html)
                     :linear   (checker/linearizable
                                 {:model (multi-register {})})}))}))

2022-12-09 20:40:22,259{GMT}    WARN    [clojure-agent-send-off-pool-31] jepsen.checker: Error while checking history:
java.lang.ClassCastException: class clojure.lang.PersistentVector cannot be cast to class jepsen.history.Taskable (clojure.lang.PersistentVector and jepsen.history.Taskable are in unnamed module of loader 'app')
    at jepsen.checker.timeline$hiccup.invokeStatic(timeline.clj:182)
    at jepsen.checker.timeline$hiccup.invoke(timeline.clj:182)
    at jepsen.checker.timeline$html$reify__5307.check(timeline.clj:213)
    at jepsen.checker$check_safe.invokeStatic(checker.clj:86)
    at jepsen.checker$check_safe.invoke(checker.clj:79)
    at jepsen.checker$compose$reify__11881$fn__11883.invoke(checker.clj:102)
    at clojure.core$pmap$fn__8552$fn__8553.invoke(core.clj:7089)
    at clojure.core$binding_conveyor_fn$fn__5823.invoke(core.clj:2047)
    at clojure.lang.AFn.call(AFn.java:18)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

The workaround is to use the old timeline/html code.

aphyr commented 1 year ago

The release notes for 0.3.0 talk about this--it's likely the code in question is passing in a vector rather than a jepsen.history.History.

qvad commented 1 year ago

@aphyr why does checker/linearizable work then? Shouldn't it use the same input objects? The linearizable model code in my project didn't change during the upgrade.

aphyr commented 1 year ago

Without knowing the caller it's tough for me to say!

qvad commented 1 year ago

@aphyr, sorry, I'm still a noob in Clojure and Jepsen :)

What do you mean by the caller? As far as I understand, there is no need to change the client implementation, and the generator should also be OK. All other tests work fine with timeline/html, so the problem is somewhere in this test, but I don't know where to look.

:sz.multi-key-acid  (with-client multi-key-acid/workload (yugabyte.ysql.multi-key-acid/->YSQLMultiKeyAcidClient))

(defn workload
  [opts]
  (let [threads (:concurrency opts)]
    {:generator (ygen/with-op-index
                  (independent/concurrent-generator
                    (/ threads 2)
                    (range)
                    (fn [k]
                      (->> (gen/reserve (/ threads 4) r w)
                           (gen/stagger 1)
                           (gen/process-limit threads)))))
     :checker   (independent/checker
                  (checker/compose
                    {:timeline (timeline/html)
                     :linear   (checker/linearizable
                                 {:model (multi-register {})})}))}))

aphyr commented 1 year ago

I mean the code that's calling timeline/html's checker--it's not in the stacktrace you posted because we run those evaluations concurrently. This helps! I think it's a bug in independent/checker. I'm running out the door for a long drive here so I don't have time to write a full test, but I've made a quick patch in the main branch of Jepsen that might help. Give that a shot?

qvad commented 1 year ago

Got it, thank you! I've tried the new independent code and it works.