jepsen-io / knossos

Verifies the linearizability of experimentally accessible histories.
398 stars 31 forks source link

Process already running and :info messages. #31

Closed insumity closed 5 years ago

insumity commented 5 years ago

Assume you have this history.edn file:

[ {:process 1, :type :invoke, :f :cas, :value [2 0]}
  {:process 1, :type :info, :f :cas, :value [2 0]}
  {:process 1, :type :invoke, :f :read, :value nil}]

Running the above history against Knossos gives me the following result:

java.lang.RuntimeException: Process 1 already running #knossos.op.Op{:process 1, :type :invoke, :f :cas, :value [2 0], :index 0}, yet attempted to invoke #knossos.op.Op{:process 1, :type :invoke, :f :read, :value nil, :index 2} concurrently

It seems to me that after an info response, a process cannot do anything else. Looking at other histories in the knossos repository, it seems indeed the case that if a process gets an info message, it will never invoke any other operation after it. Is this indeed the case? Maybe, the creator of issue had a similar problem.

Reading Knossos concepts here:

Every logical transition in Knossos is represented by a pair of operations: an invoke, when the op begins, and a completion: either ok if the operation took place, fail if the operation did not take place, or info if you're not sure what happened, e.g. the operation timed out. an info response, seems like what is adequate here. We do not know whether the cas succeeded or not.

it seems to me, this does not have to be the case. For example, I could think of a client c getting an operation timed-out and hence logging an info message, however c might still want to invoke operations later on in an execution. The easy fix would be, that after every timed-out operation, client c starts logging operations (i.e., invocations and responses) with a different client identifier c'. Is there a better way to solve this issue in Knossos?

aphyr commented 5 years ago

however c might still want to invoke operations later on in an execution.

This would violate the assumption that processes are single-threaded; once a process has crashed, it can't do anything else, because that might mean the process does two things at once. You are, of course, free to spin up additional processes to take the place of crashed ones, as you suggest. That's exactly what Jepsen does. :)