jepsen-io / knossos

Verifies the linearizability of experimentally accessible histories.
398 stars 31 forks source link

linearizability checker incorrectly validates non-linear history #10

Closed johanatan closed 7 years ago

johanatan commented 7 years ago

With 'register' model:

     1  0 :invoke :write  1
     2  0 :ok :write  1
     3  0 :invoke :write  3
     4  0 :ok :write  3
     5  0 :invoke :write  2
     6  0 :ok :write  2
     7  0 :invoke :read nil
     8  0 :ok :read 2
     9  0 :invoke :write  4
    10  0 :ok :write  4
    11  0 :invoke :read nil
    12  0 :ok :read 4
    13  0 :invoke :read nil
    14  0 :ok :read 4
    15  0 :invoke :write  4
    16  0 :info :write  4 :timeout
    17  1 :invoke :read nil
    18  1 :ok :read 4
    19  1 :invoke :write  0
    20  1 :info :write  0 :timeout
    21  2 :invoke :read nil
    22  2 :ok :read 4
    23  2 :invoke :write  0
    24  2 :info :write  0 :timeout
    25  3 :invoke :read nil
    26  3 :ok :read 0
    27  3 :invoke :read nil
    28  3 :ok :read 0
    29  3 :invoke :read nil
    30  3 :ok :read 0
    31  3 :invoke :read nil
    32  3 :ok :read 0
    33  3 :invoke :write  0
    34  3 :info :write  0 :timeout
    35  4 :invoke :read nil
    36  4 :ok :read 4
    37  4 :invoke :write  1
    38  4 :info :write  1 :timeout
    39  5 :invoke :read nil
    40  5 :ok :read 0

The reads on lines: 26, 28, 30, & 32 should have gotten 4 instead of 0. The read on line 40 should have gotten 4 instead of 0 although the interim read on line 36 correctly got 4.

history.txt

aphyr commented 7 years ago

Your history is linearizable, which is why Knossos claims it's linearizable.

At line 26, you have concurrent writes of 4 (from line 16), and two writes of 0 (lines 20 and 24). The value is 4 from lines 10 to 22; we use one crashed write of 0 to read 0 at line 26, and the crashed write of 4 to read 4 on line 36, then another crashed write of 0 (any of lines 20, 24, or 34) to read 0 on line 40.

johanatan commented 7 years ago

Sorry but how can you "use a crashed write" to read a value? If the write operation timed out, then it did not get committed to any datastore and thus would not be available for "use".

And, wouldn't you need to know if the distributed system under test would in fact offer up an uncommitted value for interim reads before you could say that it was linearizable or not? Certainly for a system that should not offer up an uncommitted write (i.e., a system tuned for consistency over availability), the above history should not be linearizable.

In an ideal world, we would tell knossos which type of system we are testing and it would adjust its strictness to suit.

aphyr commented 7 years ago

When a request times out because no acknowledgement was received by a client, how do you suppose that client knows whether the distributed system it's talking to has committed its operation?

johanatan commented 7 years ago

I don't. However, I think it would be most prudent for all involved parties (incl. the client & especially a post hoc analyzer) to assume the worst rather than the best. [Certainly there should be a switch on the analyzer to put it into one mode or the other].

aphyr commented 7 years ago

If the client cannot determine whether or not an operation has taken place, how is Knossos supposed to know not to use that operation?

johanatan commented 7 years ago

Knossos is currently making an assumption. I'm merely suggesting that it make the alternate assumption (configurable if you wish).

You can see from the log that the timeouts do occur: the question is how do we interpret that data. I would argue that, for a post hoc analyzer, assuming worst-case rather than best-case could be more helpful.

aphyr commented 7 years ago

On the contrary; Knossos does not assume a timeout indicates either success or failure, precisely because it could be either. What you're suggesting would lead Knossos to incorrectly declare legal histories illegal.

johanatan commented 7 years ago

Knossos does assume that a timed out write was committed. The fact of the matter is that it does not have enough information to know one way or the other. Only the person running it having knowledge of the system under test could say definitively. Therefore, better to raise a false flag than to say "nothing to see here".

And I'm not suggesting that it be hardcoded one way or the other: rather that it be configurable by the operator.

aphyr commented 7 years ago

Knossos does assume that a timed out write was committed.

I can see how this could be confusing. Knossos does not, in general, make any assumptions about timed out operations. It allows for both possibilities, because, as you've astutely observed, it is impossible for the client to determine after a timeout whether the system did or did not apply its requested operation. However, in this particular history Knossos does assume that some crashed operations did in fact succeed, because those assumptions are necessary to satisfy later constraints on the history.

For a quick overview of linearizability and how Jepsen+Knossos interpret known vs indeterminate oucomes, see https://www.youtube.com/watch?v=dE3KT7hHkKY. For more background on why you can't introduce a flag like you're suggesting, see the Two-Generals problem: https://en.wikipedia.org/wiki/Two_Generals%27_Problem

johanatan commented 7 years ago

'Two Generals' has nothing to do with the flag I am suggesting (and I'm quite familiar with it already). The flag informs post-hoc analysis. I've made a very clear case for why stricter analysis would be more useful. You do not seem to be willing to see it. That is your prerogative.

Further, your throwing up 'Two Generals' is very much akin to others throwing up 'Halting Problem' when any discussion of totality, computability &/or decidability is undertaken: it's intellectually [and conveniently] lazy.

aphyr commented 7 years ago

Oh, okay, this is good! So you know that two-generals implies that in an asynchronous network, it is impossible to reach consensus in the face of arbitrary communication failures. In this case, we'd like the client and the server to agree on whether or not a given operation took place, but because of a failure (e.g. a server crash, partition, or simply slowness), arbitrary communication failure is possible. Adding a flag like you're suggesting (where we make assumptions about the outcome of a timeout) would constitute a consensus algorithm which violates two-generals. That's why Knossos can't do what you're requesting.

I say this in the video too, but to handle this problem, we classify completions in three types: known successes (:ok), known failures (:fail), and indeterminate results (:info). Knossos reasons about those indeterminate results by holding the invocation open for all time, which is why it's legal to apply those operations later. To sum up, this is why your history is linearizable!

johanatan commented 7 years ago

To be clear, the flag I was suggesting does not introduce any additional assumption: it merely changes the assumption you're currently making in this particular case (and others like it) from an optimistic one into a pessimistic one.

I think it may help to keep in mind why people are using this tool to begin with: if the tool flags a few of these cases for manual inspection (even while perhaps still giving a final result of 'linearizable') then I think it would be a win. [As there are alternate explanations for how subsequent reads could have gotten the values they did].

Perhaps you could output from the analysis phase the indeterminate results that were held, and for how long, and how they were finally resolved?

aphyr commented 7 years ago

the flag I was suggesting does not introduce any additional assumption: it merely changes the assumption you're currently making in this particular case (and others like it) from an optimistic one into a pessimistic one.

For the fourth time: Knossos does not make an optimistic assumption. It specifically avoids making assumptions. Please stop claiming otherwise.

Maybe a concrete example would help? Imagine that your client asks a server to write 4. It sends a message to the server saying "please write 4." The server receives that message, writes 4, but responds slowly, causing the client to time out. You're suggesting we assume the write of 4 never occurred. This is clearly incorrect.

Conversely, if you assume the write of 4 did occur, it's also possible that the client sent "please write 4" to the server, but the server crashed (or some other failure in the system occurred) preventing that write from occurring. There is no flag you can add which will not, at some point, lie to you.

Knossos (and Jepsen) are intended to check these sorts of systems, where indeterminacy can and will occur. Like any post-hoc linearizability checker, Knossos can be at best sound but not complete. You're suggesting we give up soundness as well, and I I can't fathom why--why would you ever want to take correct results and... make them wrong some of the time? What do you have to gain from spurious nonlinearizability findings?

aphyr commented 7 years ago

I think it may help to keep in mind why people are using this tool to begin with

(If... you... somehow weren't aware of this, my full time job is distributed systems verification. That's... why I wrote this tool: for my job. I use it every day. I have a pretty good idea of what it's for.)

garyxia commented 7 years ago

" However, in this particular history Knossos does assume that some crashed operations did in fact succeed, because those assumptions are necessary to satisfy later constraints on the history."

It sounds like there is some assumption that a timed out op succeeded because that was forced upon by later data? Could knossos make the opposite assumption? I think that's what johanatan meant.

Blaisorblade commented 7 years ago

Found from Twitter & might have figured out the confusion. @aphyr is entirely right for his requirements, since Knossos tries to prove bugs, but @johanatan has other requirements (which might make sense or not) without realizing it—and what he last asked might be useful even if his requirements are unsatisfiable.

I don't buy at all the argument about "system tuned for consistency" — the above history is valid for such a system, and all the arguments by @aphyr apply.

A better argument might be "while producing the trace my system behaved like a (semi-)synchronous network, because it was a bunch of VMs with absurdly long timeouts, so the network was more reliable than the program and there was no partial failure".

In particular, @johanatan is assuming a complementary perspective where he'd like an unsound and complete tool, could do with an unsound and incomplete tool, and would even be happy with an explanation of Knossos's reasoning showing how the history was linearized. That could be useful to novices learning about linearization or confused by its results.

Emitting a linearization certificate (certificate as meant in NP problems) could also be useful to external validators.

For the fourth time: Knossos does not make an optimistic assumption. It specifically avoids making assumptions. Please stop claiming otherwise.

You're right, but @johanatan is asking a different question, which calls for a forall rather than exists.

As we know, a history H is linearizable if there exists a linearization compatible with H—hence, if there exists a way to resolve indeterminate operations consistent with the history. In a sense, I'm assuming the system is "innocent" (non-buggy) as long as I can, hence discarding other possible resolutions.

You could ask the converse question: are all resolutions consistent with the history, or is there a resolution that does not satisfy constraints? That seems closer to what @johanatan is asking. Is this different from asking "are there indeterminate operations"? If not, logging indeterminate operations might be the only sensible change.

Blaisorblade commented 7 years ago

Having read https://aphyr.com/posts/309-knossos-redis-and-linearizability, a counter-argument to my last post seems to be: Knossos is meant to be used for model checking, which implies generating tons of histories from a model searching for a counterexample, rather than looking at few histories by hand. I'm not sure whether @johanatan's assumptions are sensible.

aphyr commented 7 years ago

is there a resolution that does not satisfy constraints?

A typical history will have 10^30 illegal linearizations per operation. The answer is, in all but the most trivial cases, yes.

garyxia commented 7 years ago

Kyle, Is your 48 xeon-cores (it's really 24 core, HT, right?) 128 G system powerful enough for multi-register model? We had some single register model tests and they were all fine but multi-register analysis seems to take forever. We have a 24 physical cores 256g system,

aphyr commented 7 years ago

Edit: I had a bunch of screenshots from reddit user 'Johanatan' here, which are pretty awful. Johanatan's twitter is on the same spectrum (trolling feminists, black people, etc), so I assumed they both were his. He says those reddit comments aren't his, and it'd be irresponsible for me to claim a connection, so I've taken that down.

I've provided no shortage of good-faith, uncompensated support so far, but I can't waste any more time on this. I had a couple folks come tell me this week that he trolled their slack channels and they were forced to engage in moderation, and given his behavior so far, I'm inclined to agree. Dude's an asshole, and this is going nowhere productive. I've blocked johanatan, and his alt, and his other alt, and so on.