Open enigma opened 1 year ago
That's probably best understood as a checker bug--I'll bet you that part of the checker is trying to find e.g. the time between the first and last successful ops, and if there aren't any, it blows up, or something like that. Feel free to dig into the Kafka checker in Jepsen if you'd like to fix this one--I imagine it should be a simple check to make sure at least a handful of required ops actually exist.On Sep 13, 2023 05:37, Andrea Ambu @.***> wrote: If maelstrom gets a binary that ignores all the kafka commands, for example like this modified demo/clojure/echo.clj, then maelstrom fails with a NullPointerException rather than some useful errors. For other workflows, eg broadcast/g-counter, maelstroms fails with some timeout exceptions that are more understandable. Is this the correct way to fail for the kafka workflow? $ lein run -- test -w kafka --bin demo/clojure/lenient_echo.clj # see gist [...] INFO [2023-09-13 10:24:40,229] jepsen node n1 - maelstrom.net Shutting down Maelstrom network INFO [2023-09-13 10:24:40,233] jepsen test runner - jepsen.core Analyzing... INFO [2023-09-13 10:24:40,294] clojure-agent-send-off-pool-5 - jepsen.tests.kafka Wrote /Users/ndr/coding/maelstrom-src/maelstrom/store/kafka/20230913T102325.079+0100/consume-counts.edn WARN [2023-09-13 10:24:40,295] clojure-agent-send-off-pool-5 - jepsen.checker Error while checking history: java.lang.NullPointerException: Cannot invoke "java.lang.Number.doubleValue()" because "x" is null at clojure.lang.Numbers.divide(Numbers.java:3899) at jepsen.util$nanosGT_secs.invokeStatic(util.clj:386) at jepsen.util$nanos__GT_secs.invoke(util.clj:386) at clojure.core$update.invokeStatic(core.clj:6231) at clojure.core$update.invoke(core.clj:6223) at jepsen.tests.kafka$checker$reify19272.check(kafka.clj:2080) at jepsen.checker$check_safe.invokeStatic(checker.clj:86) at jepsen.checker$check_safe.invoke(checker.clj:79) at jepsen.checker$compose$reify10709$fn10711.invoke(checker.clj:102) at clojure.core$pmap$fn8552$fn8553.invoke(core.clj:7089) at clojure.core$binding_conveyor_fn$fn5823.invoke(core.clj:2047) at clojure.lang.AFn.call(AFn.java:18) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1623) INFO [2023-09-13 10:24:40,428] jepsen test runner - jepsen.core Analysis complete INFO [2023-09-13 10:24:40,434] jepsen results - jepsen.store Wrote /Users/ndr/coding/maelstrom-src/maelstrom/store/kafka/20230913T102325.079+0100/results.edn INFO [2023-09-13 10:24:40,456] jepsen test runner - jepsen.core {:perf {:latency-graph {:valid? true}, :rate-graph {:valid? true}, :valid? true}, :timeline {:valid? true}, :exceptions {:valid? true}, :stats {:valid? false, :count 67, :ok-count 0, :fail-count 11, :info-count 56, :by-f {:assign {:valid? false, :count 11, :ok-count 0, :fail-count 11, :info-count 0}, :crash {:valid? false, :count 7, :ok-count 0, :fail-count 0, :info-count 7}, :poll {:valid? false, :count 20, :ok-count 0, :fail-count 0, :info-count 20}, :send {:valid? false, :count 29, :ok-count 0, :fail-count 0, :info-count 29}}}, :availability {:valid? true, :ok-fraction 0.0}, :net {:all {:send-count 70, :recv-count 70, :msg-count 70, :msgs-per-op 1.0447761}, :clients {:send-count 70, :recv-count 70, :msg-count 70}, :servers {:send-count 0, :recv-count 0, :msg-count 0, :msgs-per-op 0.0}, :valid? true}, :workload {:valid? :unknown, :error "java.lang.NullPointerException: Cannot invoke \"java.lang.Number.doubleValue()\" because \"x\" is null\n at clojure.lang.Numbers.divide (Numbers.java:3899)\n jepsen.util$nanosGT_secs.invokeStatic (util.clj:386)\n jepsen.util$nanos__GT_secs.invoke (util.clj:386)\n clojure.core$update.invokeStatic (core.clj:6231)\n clojure.core$update.invoke (core.clj:6223)\n jepsen.tests.kafka$checker$reify19272.check (kafka.clj:2080)\n jepsen.checker$check_safe.invokeStatic (checker.clj:86)\n jepsen.checker$check_safe.invoke (checker.clj:79)\n jepsen.checker$compose$reify10709$fn10711.invoke (checker.clj:102)\n clojure.core$pmap$fn8552$fn8553.invoke (core.clj:7089)\n clojure.core$binding_conveyor_fn$fn5823.invoke (core.clj:2047)\n clojure.lang.AFn.call (AFn.java:18)\n java.util.concurrent.FutureTask.run (FutureTask.java:317)\n java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)\n java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)\n java.lang.Thread.run (Thread.java:1623)\n"}, :valid? false}
Analysis invalid! (ノಥ益ಥ)ノ ┻━┻
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you are subscribed to this thread.Message ID: @.***>
Thanks, I've shared a tentative fix at https://github.com/jepsen-io/jepsen/pull/580 - comments welcome!
If maelstrom gets a binary that ignores all the kafka commands, for example like this modified demo/clojure/echo.clj, then maelstrom fails with a NullPointerException rather than some useful errors.
For other workflows, eg broadcast/g-counter, maelstroms fails with some timeout exceptions that are more understandable.
Is this the correct way to fail for the kafka workflow?