jepsen-io / knossos

Verifies the linearizability of experimentally accessible histories.
398 stars 31 forks source link

More documentation about getting started? #14

Closed mstewartgallus closed 7 years ago

mstewartgallus commented 7 years ago

Hello,

I am currently working on a concurrent spin-lock algorithm for many-core machines similar to the MCS and CLH queue lock algorithms but that can replace the queue with a stack (for more performance) or other datastructure. See https://sstewartgallus.com/git?p=qlock.git;a=tree for the code. I modelled the algorithm in TLA+ and checked it for a small amount of threads and found and fixed some deadlocks. However, TLA+ does not model weak memory orderings and I'm concerned my algorithm may not use acquire and release atomics appropriately.

To fix this problem, I created a mode where I write a history of operations to a log (although I need to put more work into making sure that writing to the log doesn't cause synchronization that would prevent memory reorderings) in a format like:

{:process 36, :type :invoke, :f :try_acquire, :value 0x7f8f9200e100}
{:process 36, :type :ok, :f :try_acquire, :value true}
{:process 40, :type :invoke, :f :pop, :value 0x7f8f9200e080}
{:process 40, :type :ok, :f :pop, :value 0x55bea84a3400}
{:process 41, :type :invoke, :f :release, :value 0x7f8f9200e100}
{:process 41, :type :ok, :f :release, :value (list)}
{:process 42, :type :invoke, :f :empty, :value 0x7f8f9200e080 }
{:process 42, :type :ok, :f :empty, :value true }
{:process 43, :type :invoke, :f :try_acquire, :value 0x7f8f9200e100}
{:process 43, :type :ok, :f :try_acquire, :value true}
{:process 44, :type :invoke, :f :pop, :value 0x7f8f9200e080}

As I understand it Knossos should let me check if such a log is linearizable and therefore experimentally determine that I am using weak memory orderings correctly. However, I am finding it very difficult to get started.

Is there any extra documentation about getting started using Knossos to check a history for linearizability? As far as I am aware most documentation focuses around Jepson which is heavily built around distributed and networked systems and doesn't just take a preexisting history.

aphyr commented 7 years ago

Yep! Once you've got that, slap a pair of brackets around it like so: [{...} {...} {...}] and you'll have a history. You can read that into memory with something like

https://github.com/jepsen-io/knossos/blob/master/test/knossos/core_test.clj#L29-L33

You'll need to write a model like https://github.com/jepsen-io/knossos/blob/master/src/knossos/model.clj#L89-L106 that models the single-threaded semantics of your data type--it has to understand what try_acquire and pop mean.

Then, call (knossos.linear/analysis model history). The knossos.wgl checker is faster but can't produce trace information to tell you what went wrong yet. You can also generate renderings of any faults found with knossos.linear.report.

mstewartgallus commented 7 years ago

Thank you very much.

I got started but I'm confused. The checker doesn't seem to check the :ok part of the history.

(ns lock-check.core
  (:gen-class)
  (:require  [knossos.core :refer :all]
         [knossos.model :as model]
         [knossos.linear]
         [clojure.java.io :as io]
         [clojure.edn :as edn])
  (:import (java.io PushbackReader)))

(defn read-history-2
  "Reads an EDN history file containing a single collection of operation maps."
  [f]
  (with-open [r (PushbackReader. (io/reader f))]
         (edn/read r)))

(defn get-mutex [rawMutexes ptr]
  ""
  (if (contains? rawMutexes ptr)
      (rawMutexes ptr)
      false))

(defn get-stack [stacks ptr]
  ""
  (if (contains? stacks ptr)
      (stacks ptr)
      (list)))

(defn get-node [nodes ptr]
  ""
  (if (contains? nodes ptr)
      (nodes ptr)
      false))

(defrecord Lock [rawMutexes stacks nodes]
  knossos.model.Model
  (step [r op]
    (let [value (:value op)]
      (condp = (:f op)
         :wait  (let [node (get-node nodes value)]
                 (Lock. rawMutexes stacks (assoc nodes value node)))
         :signal  (let [node (get-node nodes value)]
                   (if node
                   (model/inconsistent "already signaled")
                 (Lock. rawMutexes stacks (assoc nodes value true))))
         :push  (let [stack (get-stack stacks (first value))]
                 (Lock. rawMutexes (assoc stacks value (cons (second value) stack)) nodes))
         :pop  (let [stack (get-stack stacks value)]
                (if (empty? stack)
                (Lock. rawMutexes (assoc stacks value (list)) nodes)
                (Lock. rawMutexes (assoc stacks value (pop stack)) nodes)))
         :empty (let [stack (get-stack stacks value)]
                 (Lock. rawMutexes (assoc stacks value stack) nodes))
         :try_acquire (let [rawMutex (get-mutex rawMutexes value)]
                   (Lock. (assoc rawMutexes value true) stacks nodes))

         :release (let [rawMutex (get-mutex rawMutexes value)]
                   (if rawMutex
                   (model/inconsistent "not held")
                 (Lock. (assoc rawMutexes value false) stacks nodes))))))
  Object
  (toString [this] (.toString rawMutexes)))

(defn lock
  "A lock responding to :try_acquire, :pop, :release and :empty messages"
  []
  (Lock. {} {} {}))

(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (let [history (read-history-2 "log.edn")]
       (knossos.linear/analysis (lock) history)))
aphyr commented 7 years ago

OK! Did you... print out the analysis? What did it say? (try clojure.pprint/pprint)

mstewartgallus commented 7 years ago

Okay I figured out the problem. The value parameter isn't used for :invoke events. I have to save the value and pass into the :ok events.

I think I have got far enough along to close this issue.

aphyr commented 7 years ago

Ah, yeah, value should be the same between invocations and completions. You can leave it off the invocation if you don't know what the value will be, and Knossos will fill it in from the completion.

aphyr commented 7 years ago

(By the way, thank you! When I get more time I'd like to make more of this explicit, and offer guidance to users when they provide malformed histories instead of silently doing the wrong thing haha. Definitely a research project, not user-friendly OSS yet. :))

mstewartgallus commented 7 years ago

I should also thank you. I don't even begin to know how to write a (fast) linearizability checker. Knossos has saved a lot of time.

aphyr commented 7 years ago

You probably already figured this out, but just in case--Knossos models state transitions as a pure function (step! model op), not as a relation between inputs and outputs on the datatype (= result (step! model arg)). So things like a read, which would look like object.read() => some-value in a regular program, get expressed as object.read(expected-value). If you want to model something with different input and output, just like multi-arg functions, you can encode it as a tuple or map in the operation :value, say {:f :drop-eggs, :value {:count 2, :return-val :smashed}}