jepsen-io / knossos

Verifies the linearizability of experimentally accessible histories.
395 stars 31 forks source link

Unhelpful error message when trying to render failed portion of a history #36

Open mpalmer opened 4 years ago

mpalmer commented 4 years ago

Hi,

I'm using Knossos to verify linearizability on a history generated by a distributed system that is, for now, modelling a register. The checker is reporting that the history is not linearizable, but when I ask for a rendering of the problem sections, I get the error

Execution error (AssertionError) at knossos.linear.report/time-coords$fn (report.clj:184).
Assert failed: Expected index but got nil for op #knossos.op.Op{:process nil, :type nil, :f nil, :value nil, :index nil}

Given that I'm writing out a history "by hand", I'm assuming that I've done something wrong, but the current error being reported isn't helping me determine what exactly I'm doing wrong.

This is the history file I'm using, and this is what I'm doing to try and render it (which has worked with other broken histories in the past, so I'm pretty sure it's capable of working):

knossos.cli=> (require '[knossos.model :as model] '[knossos.competition :as competition])
nil
knossos.cli=> (require '[knossos.linear.report :as report])
nil
knossos.cli=> (def h (read-history "edn.txt"))
#'knossos.cli/h
knossos.cli=> (def a (competition/analysis (model/register) h))
Jun 25, 2020 5:51:51 PM clojure.tools.logging$eval1047$fn__1050 invoke
INFO: More than 1024 reachable models; not memoizing models for this search
#'knossos.cli/a
knossos.cli=> Jun 25, 2020 5:51:51 PM clojure.tools.logging$eval1047$fn__1050 invoke
INFO: More than 1024 reachable models; not memoizing models for this search
knossos.cli=> (report/render-analysis! h a "linear.svg")

Execution error (AssertionError) at knossos.linear.report/time-coords$fn (report.clj:185).
Assert failed: Expected index but got nil for op #knossos.op.Op{:process nil, :type nil, :f nil, :value nil, :index nil}

Any assistance appreciated.

aphyr commented 4 years ago

Huh! Yeah, presumably the history is malformed in some way, and you're right, that error message isn't helpful.

aphyr commented 4 years ago

Oh, I think I know what's up. You're asking for a register but not providing an initial value, so it defaults to nil. Then the first read is 0. I bet the renderer doesn't know how to handle the very first operation in the history going wrong--if I were to hazard a guess, it called map->Op on the nil :op field that starts off the final-path for the analysis:

{:valid? false,
 :configs
 ({:model #knossos.model.CASRegister{:value nil},
   :last-op nil,
   :pending
   [{:process 4, :type :invoke, :f :read, :value 0, :index 0}
    {:process 5, :type :invoke, :f :read, :value 0, :index 1}]}),
 :final-paths
 #{[{:op nil, :model #knossos.model.CASRegister{:value nil}}  ; Right here, see the nil op? That's cuz there WAS no op that generated the initial state.
    {:op {:process 4, :type :ok, :f :read, :value 0, :index 2},
     :model
     #knossos.model.Inconsistent{:msg "can't read 0 from register "}}]
   [{:op nil, :model #knossos.model.CASRegister{:value nil}}
    {:op {:process 5, :type :invoke, :f :read, :value 0, :index 1},
     :model
     #knossos.model.Inconsistent{:msg
                                 "can't read 0 from register "}}]},

I honestly don't know what to do off the top of my head, and I gotta take care of some other projects right now. The analysis is doing the right thing here--you're correct that it's the SVG renderer that's messing up. There probably needs to be an explicit check for when the first op is nil, and... when that happens, we could... generate a synthetic initial op for rendering purposes? Or refuse to render anything?

mpalmer commented 4 years ago

Aha! I can fairly easily make sure that the first operation in my stress tester is a write, if that'll make for a happy knossos. Thanks for the pointer (and for the software).

Addendum: I added a "fake" :f :write, :value 0 :invoke/:ok op pair at the top of my history, and it did indeed solve the problem -- or at least, a problem. The history is now detected as linearizable, so the "value came back wrong" was a false positive caused by a bad output conversion. I now know of several bugs to fix in my stress tester. Whee!

aphyr commented 4 years ago

You don't even need to do that! Take a look at the docs for (register)... It should take an initial value as an argument. :)

mpalmer commented 4 years ago

That was just my temporary "hack" to see if the history was otherwise linearizable. The fix in my stress tester was to stop (erroneously) turning nil into 0 when writing out the history file.