jepsen-io / jepsen

A framework for distributed systems verification, with fault injection

Elle checks :fail results? #489

Closed: yito88 closed this issue 3 years ago

yito88 commented 3 years ago

I'm working on a test with Elle to verify serializable transactions. I have a question about checking anomalies.

Does Elle check for anomalies involving :fail results? I tried Elle via Jepsen (jepsen.tests.cycle.append/test), but it reported anomalies involving :fail results. I'd like to ignore these :fail results when checking for anomalies.

{:clock {:valid? true},
 :stats
 {:valid? true,
  :count 1504,
  :ok-count 318,
  :fail-count 1184,
  :info-count 2,
  :by-f
  {:txn
   {:valid? true,
    :count 1504,
    :ok-count 318,
    :fail-count 1184,
    :info-count 2}}},
 :exceptions {:valid? true},
 :workload
 {:valid? false,
  :anomaly-types (:G1a :G1b :dirty-update),
  :anomalies
  {:G1a
   ({:op
     {:type :ok,
      :f :txn,
      :value [[:r 18 [1]] [:r 17 [3]] [:append 6 10] [:append 17 9]],
      :time 24868956077,
      :process 0,
      :index 101},
     :mop [:r 18 [1]],
     :writer
     {:type :fail,
      :f :txn,
      :value
      [[:append 14 6]
       [:append 18 1]
       [:r 17 nil]
       [:r 17 nil]
       [:r 17 nil]
       [:append 14 7]
       [:append 18 2]],
      :time 24441252230,
      :process 1,
      :error [:crud-error "rollback is toward non-prepared record"],
      :index 91},
     :element 1}
...
aphyr commented 3 years ago

Huh, I'm not sure I understand your question! Elle does look at :ok, :info, and :fail operations. It has to in order to draw meaningful inferences about potential anomalies. If you asked it to ignore some of these operations, it would look to Elle as if values had appeared out of thin air, and I suspect that would make Elle very confused.
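The "thin air" point can be illustrated with a minimal Python sketch. It encodes the two operations shown in the report above (process, time, and error fields omitted) as plain dicts, with micro-ops as (f, key, value) tuples; this encoding and the helper name writer_of are hypothetical, not Elle's API. Looking up who appended element 1 to key 18 finds the :fail transaction at index 91, which is the :writer Elle reports for this G1a entry. Once :fail operations are filtered out, the same lookup finds no writer at all.

```python
# Hypothetical, simplified encoding of two operations from the report above.
# Micro-ops are (f, key, value) tuples mirroring [:append k v] and [:r k [v ...]].
HISTORY = [
    {"index": 91, "type": "fail", "value": [
        ("append", 14, 6), ("append", 18, 1),
        ("r", 17, None), ("r", 17, None), ("r", 17, None),
        ("append", 14, 7), ("append", 18, 2)]},
    {"index": 101, "type": "ok", "value": [
        ("r", 18, [1]), ("r", 17, [3]),
        ("append", 6, 10), ("append", 17, 9)]},
]


def writer_of(history, key, element):
    """Return the operation that appended `element` to `key`, or None."""
    for op in history:
        for f, k, v in op["value"]:
            if f == "append" and k == key and v == element:
                return op
    return None


# With the full history, the read [:r 18 [1]] in op 101 has a known writer.
print(writer_of(HISTORY, 18, 1)["index"])  # 91

# Drop :fail operations and the same element has no writer: it "came from thin air".
committed_only = [op for op in HISTORY if op["type"] != "fail"]
print(writer_of(committed_only, 18, 1))  # None
```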

It looks like your history contains aborted reads (G1a), and if that's the case, all bets are off: you can't really trust anything the database tells you. You could tell Elle you don't care about catching aborted reads, I suppose, by providing :consistency-models [] to the checker, but then it won't check, well, anything. You could ask for it to look for... maybe just :G0 anomalies? Is that the only thing you care about? You've said you're trying to test for serializability, which makes me think that you definitely don't want to ignore these anomalies: they're serious serializability violations!
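The G1a entry in the report can be reproduced with a similarly simplified sketch: an :ok transaction reads an element whose writer is a :fail transaction. The function name aborted_reads and the dict encoding are hypothetical. The sketch assumes, as Elle's list-append workload is designed to, that each element is appended to a given key at most once, so (key, element) identifies a single writer. It ignores :info operations, version-order inference, and every other anomaly class Elle actually checks, so treat it as an illustration of the definition, not of Elle's implementation.

```python
# Hypothetical, simplified encoding of two operations from the report above.
HISTORY = [
    {"index": 91, "type": "fail", "value": [
        ("append", 14, 6), ("append", 18, 1),
        ("r", 17, None), ("r", 17, None), ("r", 17, None),
        ("append", 14, 7), ("append", 18, 2)]},
    {"index": 101, "type": "ok", "value": [
        ("r", 18, [1]), ("r", 17, [3]),
        ("append", 6, 10), ("append", 17, 9)]},
]


def aborted_reads(history):
    """Find reads in :ok operations that observe an element appended by a :fail operation.

    Assumes each element is appended to a given key at most once.
    """
    # Index every append by (key, element) so each read element maps to its writer.
    writers = {}
    for op in history:
        for f, k, v in op["value"]:
            if f == "append":
                writers[(k, v)] = op

    anomalies = []
    for op in history:
        if op["type"] != "ok":
            continue  # this sketch only inspects reads from committed transactions
        for mop in op["value"]:
            f, k, v = mop
            if f != "r" or v is None:
                continue
            for element in v:
                writer = writers.get((k, element))
                # A committed read of a write from an aborted transaction is G1a.
                if writer is not None and writer["type"] == "fail":
                    anomalies.append({"op": op["index"], "mop": mop,
                                      "writer": writer["index"], "element": element})
    return anomalies


for anomaly in aborted_reads(HISTORY):
    print(anomaly)
# {'op': 101, 'mop': ('r', 18, [1]), 'writer': 91, 'element': 1}
```

The result has the same shape as Elle's report (:op, :mop, :writer, :element). Note that the detection only works because the :fail operation is still in the history; with it filtered out, the writer lookup comes back empty and the anomaly silently disappears.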

yito88 commented 3 years ago

Thank you so much for your comment, and sorry for the late reply. I had misunderstood the result and found a bug in my test. After fixing the bug, the test worked as expected.