microsoft / CCF

Confidential Consortium Framework
https://microsoft.github.io/CCF/
Apache License 2.0
766 stars 207 forks source link

Document non-linearizability of read-only transactions in rare system conditions #6167

Closed lemmy closed 1 month ago

lemmy commented 1 month ago

Click here to interactively explore the 12-step counterexample documenting non-linearizability of read-only transactions in rare system conditions.


Related to https://github.com/microsoft/CCF/issues/5636

This PR adds a monolith version of our TLA+ Consistency spec to better document non-linearizability of read-only transactions in pathological system condition. A relevant counterexample can be interactively explored with tla-web. TLC's version of this counterexample follows:

tlc.sh MCMultiNodeReads.tla -config MCMultiNodeReadsNotLinearizable.cfg

TLC2 Version 2.18 of Day Month 20?? (rev: fa4968f)
Running breadth-first search Model-Checking with fp 115 and seed -6022815064121998177 with 1 worker on 10 cores with 7282MB heap and 64MB offheap memory [pid: 47650] (Mac OS X 13.6.6 aarch64, Homebrew 11.0.23 x86_64, MSBDiskFPSet, DiskStateQueue).
[...]
Starting... (2024-05-05 10:32:26)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-05-05 10:32:26.
Progress(11) at 2024-05-05 10:32:29: 468,645 states generated (468,645 s/min), 311,928 distinct states found (311,928 ds/min), 255,202 states left on queue.

Invariant AllCommittedObservedRoInv is violated.
The behavior up to this point is:
1: <Initial predicate>
/\ history = <<>>
/\ ledgerBranches = <<<<>>>>
2: <MCRwTxRequestAction line 9, col 5 to line 10, col 24 of module MCSingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0]>>
/\ ledgerBranches = <<<<>>>>
3: <MCRwTxRequestAction line 9, col 5 to line 10, col 24 of module MCSingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1]>>
/\ ledgerBranches = <<<<>>>>
4: <RwTxExecuteAction line 55, col 5 to line 67, col 28 of module SingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>>>
5: <MCRwTxResponseAction line 16, col 5 to line 17, col 25 of module MCSingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>>>
6: <MCStatusCommittedResponseAction line 20, col 5 to line 21, col 36 of module MCSingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>>>
7: <MCTruncateLedgerAction line 11, col 5 to line 12, col 27 of module MCMultiNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>, <<[tx |-> 0, view |-> 1]>>>>
8: <RwTxExecuteAction line 55, col 5 to line 67, col 28 of module SingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>, <<[tx |-> 0, view |-> 1], [tx |-> 1, view |-> 2]>>>>
9: <MCRwTxResponseAction line 16, col 5 to line 17, col 25 of module MCSingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus], [type |-> RwTxResponse, tx |-> 1, observed |-> <<0, 1>>, tx_id |-> <<2, 2>>]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>, <<[tx |-> 0, view |-> 1], [tx |-> 1, view |-> 2]>>>>
10: <MCStatusCommittedResponseAction line 20, col 5 to line 21, col 36 of module MCSingleNode>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus], [type |-> RwTxResponse, tx |-> 1, observed |-> <<0, 1>>, tx_id |-> <<2, 2>>], [type |-> TxStatusReceived, tx_id |-> <<2, 2>>, status |-> CommittedStatus]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>, <<[tx |-> 0, view |-> 1], [tx |-> 1, view |-> 2]>>>>
11: <MCRoTxRequestAction line 7, col 5 to line 8, col 24 of module MCSingleNodeReads>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus], [type |-> RwTxResponse, tx |-> 1, observed |-> <<0, 1>>, tx_id |-> <<2, 2>>], [type |-> TxStatusReceived, tx_id |-> <<2, 2>>, status |-> CommittedStatus], [type |-> RoTxRequest, tx |-> 2]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>, <<[tx |-> 0, view |-> 1], [tx |-> 1, view |-> 2]>>>>
12: <MCRoTxResponseAction line 11, col 5 to line 12, col 25 of module MCSingleNodeReads>
/\ history = <<[type |-> RwTxRequest, tx |-> 0], [type |-> RwTxRequest, tx |-> 1], [type |-> RwTxResponse, tx |-> 0, observed |-> <<0>>, tx_id |-> <<1, 1>>], [type |-> TxStatusReceived, tx_id |-> <<1, 1>>, status |-> CommittedStatus], [type |-> RwTxResponse, tx |-> 1, observed |-> <<0, 1>>, tx_id |-> <<2, 2>>], [type |-> TxStatusReceived, tx_id |-> <<2, 2>>, status |-> CommittedStatus], [type |-> RoTxRequest, tx |-> 2], [type |-> RoTxResponse, tx |-> 2, observed |-> <<0>>, tx_id |-> <<1, 1>>]>>
/\ ledgerBranches = <<<<[tx |-> 0, view |-> 1]>>, <<[tx |-> 0, view |-> 1], [tx |-> 1, view |-> 2]>>>>

Progress(12) at 2024-05-05 10:32:40: 2,362,986 states generated (9,753,657 s/min), 1,468,921 distinct states found (6,063,240 ds/min), 1,149,951 states left on queue.
2362986 states generated, 1468921 distinct states found, 1149951 states left on queue.
The depth of the complete state graph search is 12.
Finished in 14538ms at (2024-05-05 10:32:40)

Originally discovered by @heidihoward

heidihoward commented 1 month ago

Note this counterexample is also produced in the CI by https://github.com/microsoft/CCF/blob/main/tla/consistency/MCMultiNodeReadsNotLinearizable.cfg

lemmy commented 1 month ago

I tried to make the original specs work with tla-web for a few hours yesterday. At the end, Will confirmed that "modules really just aren't implemented fully yet, so [a bogus behavior of tla-web is] likely a bug related to that."

For now, we will have to custom-tailor a spec for tla-web. This includes:

achamayou commented 1 month ago

For now, we will have to custom-tailor a spec for tla-web.

If you're happy to maintain it, that's fine. But we are not able to maintain it in its current state. If we can't find a way to automatically keep it in sync, there's a high chance we'll have to remove it when it gets stale.