jepsen-io / jepsen

A framework for distributed systems verification, with fault injection
6.81k stars 719 forks source link

Using quorum flag for Etcd still fails for linearizable test #93

Closed siddontang closed 7 years ago

siddontang commented 8 years ago

Hi @aphyr

I know that etcd has a quorum flag for reading linearizability, so I try to test it with following changes:

:read  (let [value (-> client
                                 (v/get k {:consistent? true :quorum? true})
                                 (json/parse-string true))]
                   (assoc op :type :ok :value value))

I also upgrade to use jepsen 0.0.7 and clojure 1.7.0 in the project.clj.

To my surprise, the test fails. I see the etcd source and find that it it uses the quorum flag, it will commit a read-only entry in the log and do the query after majority followers replicating it, so I think this mechanism is linearizable. Am I wrong, or my test is wrong?

Below is the failed test output

{:html {:valid? true},
 :linear
 {:valid? false,
  :configs
  ({:model {:value 1},
    :pending
    [{:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58}
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74}
     {:type :invoke,
      :f :read,
      :process 8,
      :time 8838654383,
      :value 2,
      :index 75}]}
   {:model {:value 1},
    :pending
    [{:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57}
     {:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58}
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74}
     {:type :invoke,
      :f :read,
      :process 8,
      :time 8838654383,
      :value 2,
      :index 75}]}),
  :final-paths
  ([{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57},
     :model {:value 0}}
    {:op
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74},
     :model {:value 3}}
    {:op
     {:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58},
     :model {:msg "can't CAS 3 from 4 to 2"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58},
     :model {:msg "can't CAS 1 from 4 to 2"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74},
     :model {:value 3}}
    {:op
     {:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57},
     :model {:value 0}}
    {:op
     {:type :ok,
      :f :read,
      :process 8,
      :time 8852473707,
      :value 2,
      :index 76},
     :model {:msg "can't read 2 from register 0"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57},
     :model {:value 0}}
    {:op
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74},
     :model {:value 3}}
    {:op
     {:type :ok,
      :f :read,
      :process 8,
      :time 8852473707,
      :value 2,
      :index 76},
     :model {:msg "can't read 2 from register 3"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74},
     :model {:value 3}}
    {:op
     {:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58},
     :model {:msg "can't CAS 3 from 4 to 2"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74},
     :model {:value 3}}
    {:op
     {:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57},
     :model {:value 0}}
    {:op
     {:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58},
     :model {:msg "can't CAS 0 from 4 to 2"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 3,
      :process 6,
      :time 8839541714,
      :index 74},
     :model {:value 3}}
    {:op
     {:type :ok,
      :f :read,
      :process 8,
      :time 8852473707,
      :value 2,
      :index 76},
     :model {:msg "can't read 2 from register 3"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57},
     :model {:value 0}}
    {:op
     {:type :ok,
      :f :read,
      :process 8,
      :time 8852473707,
      :value 2,
      :index 76},
     :model {:msg "can't read 2 from register 0"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :ok,
      :f :read,
      :process 8,
      :time 8852473707,
      :value 2,
      :index 76},
     :model {:msg "can't read 2 from register 1"}}]
   [{:op
     {:type :ok,
      :f :read,
      :process 0,
      :time 8833496421,
      :value 1,
      :index 73},
     :model {:value 1}}
    {:op
     {:type :invoke,
      :f :write,
      :value 0,
      :process 1,
      :time 6830095656,
      :index 57},
     :model {:value 0}}
    {:op
     {:type :invoke,
      :f :cas,
      :value [4 2],
      :process 3,
      :time 6831014019,
      :index 58},
     :model {:msg "can't CAS 0 from 4 to 2"}}]),
  :previous-ok
  {:type :ok,
   :f :read,
   :process 0,
   :time 8833496421,
   :value 1,
   :index 73},
  :op
  {:type :ok,
   :f :read,
   :process 8,
   :time 8852473707,
   :value 2,
   :index 76}},
 :valid? false}

lein test :only jepsen.etcd-test/register-test

FAIL in (register-test) (etcd_test.clj:46)
expected: (:valid? (:results test))
  actual: false
Not linearizable. Linearizable prefix was:

Followed by inconsistent operation:
        nil nil

-------------------------------------------------------------
Just prior to that operation, possible interpretations of the
linearizable prefix were:

lein test :only jepsen.etcd-test/register-test

ERROR in (register-test) (ArrayList.java:177)
Uncaught exception, not in assertion.
expected: nil
  actual: java.lang.NullPointerException: null
 at java.util.ArrayList.<init> (ArrayList.java:177)
    clojure.core$shuffle.invoke (core.clj:6981)
    jepsen.report$linearizability.invoke (report.clj:36)
    jepsen.etcd_test/fn (etcd_test.clj:47)
    clojure.test$test_var$fn__7670.invoke (test.clj:704)
    clojure.test$test_var.invoke (test.clj:704)
    clojure.test$test_vars$fn__7692$fn__7697.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars$fn__7692.invoke (test.clj:722)
    clojure.test$default_fixture.invoke (test.clj:674)
    clojure.test$test_vars.invoke (test.clj:718)
    clojure.test$test_all_vars.invoke (test.clj:728)
    clojure.test$test_ns.invoke (test.clj:747)
    clojure.core$map$fn__4553.invoke (core.clj:2624)
    clojure.lang.LazySeq.sval (LazySeq.java:40)
    clojure.lang.LazySeq.seq (LazySeq.java:49)
    clojure.lang.Cons.next (Cons.java:39)
    clojure.lang.RT.boundedLength (RT.java:1735)
    clojure.lang.RestFn.applyTo (RestFn.java:130)
    clojure.core$apply.invoke (core.clj:632)
    clojure.test$run_tests.doInvoke (test.clj:762)
    clojure.lang.RestFn.applyTo (RestFn.java:137)
    clojure.core$apply.invoke (core.clj:630)
    user$eval85$fn__144$fn__175.invoke (form-init5183894627931871576.clj:1)
    user$eval85$fn__144$fn__145.invoke (form-init5183894627931871576.clj:1)
    user$eval85$fn__144.invoke (form-init5183894627931871576.clj:1)
    user$eval85.invoke (form-init5183894627931871576.clj:1)
    clojure.lang.Compiler.eval (Compiler.java:6782)
    clojure.lang.Compiler.eval (Compiler.java:6772)
    clojure.lang.Compiler.load (Compiler.java:7227)
    clojure.lang.Compiler.loadFile (Compiler.java:7165)
    clojure.main$load_script.invoke (main.clj:275)
    clojure.main$init_opt.invoke (main.clj:280)
    clojure.main$initialize.invoke (main.clj:308)
    clojure.main$null_opt.invoke (main.clj:343)
    clojure.main$main.doInvoke (main.clj:421)
    clojure.lang.RestFn.invoke (RestFn.java:421)
    clojure.lang.Var.invoke (Var.java:383)
    clojure.lang.AFn.applyToHelper (AFn.java:156)
    clojure.lang.Var.applyTo (Var.java:700)
    clojure.main.main (main.java:37)

Ran 1 tests containing 2 assertions.
1 failures, 1 errors.
Tests failed.

Thanks you.

aphyr commented 8 years ago

On 01/06/2016 05:02 PM, siddontang wrote:

To my surprise, the test fails. I see the etcd source and find that it it uses the quorum flag, it will commit a read-only entry in the log and do the query after majority followers replicating it, so I think this mechanism is linearizable. Am I wrong, or my test is wrong?

Not sure, don't have time to look right now but this is certainly worth looking in to! I'm really sorry, just overwhelmed by email and work right now.

--Kyle

siddontang commented 8 years ago

Hope your looking in to later :-)

I will use Jepsen to test our distributed system later too. Thank you for this awesome project.

AkihiroSuda commented 8 years ago

https://github.com/aphyr/jepsen/commit/ffd745f34aa8f076ebb69f783b065ce31d04fb9f closed this

dzrw commented 7 years ago

Is this still open?