Feature Discussion: Propagating more information forward than :ok and :fail allow

jkni commented 9 years ago

Hey!

Using and loving Jepsen and Knossos.

Lately, I've ran into a few situations where :ok and :fail do not allow the propagation of enough information forward into the checking phase.

As an example that will apply to most situations, a :fail-ed :cas actually provides more information than :fail makes available to the checker - it constrains the register to a value not equal to the compared value at some time between the :invoke and :fail times.

As a more prominent example, a :fail-ed :cas in Cassandra LWT gives us the value at read-time that did not match v in our :cas :value [v v']. It is possible for a :fail-ed :cas to be able to provide as much information to the checker as an :ok-ed :read in this circumstance.

I have two main ideas on strategies to allow this.

When completing the history in knossos, merge all new keys in the :ok-ed op into the :invoke op. Then, an enhanced-cas-register model could check keys on the :invoke and step the model appropriately. The main disadvantage of this is cognitive - it overloads the semantics of :ok.
When completing the history in knossos, allow functions to be provided to handle differ op :types. In this case, we might just rewrite the :fail :cas to an :ok :read. Again, the main disadvantage here is the cognitive overhead of rewriting history.

Thoughts on any approach you might be interested in integrating into Knossos/Jepsen? I'd PR.

aphyr commented 9 years ago

I think instead of either 1 or 2 we need to add an entirely new completion type; say, :illegal, which means "Instead of this operation not occurring, I want to assert that this transition was in fact illegal." You'll have to thread this through a fair amount of knossos, but I think it's a sensible approach, and, as you note, lets us detect bugs that knossos would otherwise allow.

aphyr commented 9 years ago

(specifically, we can't map :fail to :ok because knossos assumes that failed operations cannot have affected the state, and ok operations must affect the state)

jkni commented 9 years ago

(agreed - I spent a lot of time double-checking this on the Knossos internals)

I think :illegal nicely handles the general case of a :fail-ed :cas. I'm not sure how this handles the case of a :cas like Cassandra's, where we also should get a linearizable :read out of a :fail-ed :cas. Do you think this is enough of a special case that I should just hack around it in my tests?

aphyr commented 7 years ago

So thinking about this more, I think it miiiight make sense to perform a post-hoc transformation on the history which remaps the semantics of those illegal operations into ok/fail/indeterminate, like you've done in the cassandra tests--because, like you've said, you can handle things like "a failed CAS can be a read". Similarly, you can have negative senses for functions on your model too, like "cant-cas" which compares and sets UNLESS the values match; I think those are semantically equivalent and keep the knossos code small.

jepsen-io / knossos

Feature Discussion: Propagating more information forward than :ok and :fail allow #7