microsoft / CCF

Confidential Consortium Framework
https://microsoft.github.io/CCF/
Apache License 2.0
761 stars 205 forks source link

Fix read backfill off-by-one #6153

Closed achamayou closed 2 weeks ago

achamayou commented 3 weeks ago

Stumbled across this error while investigating the timeout in https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=84142&view=results

I incorrectly assumed the backfill logic was the same for Rw and Ro, but of course for Ro we need extend to the seqno itself, whereas for Rw it's to seqno - 1, since seqno is the write itself. Duh.

Does not have impact on the timeout.

lemmy commented 3 weeks ago
TLC2 Version 2.18 of Day Month 20?? (rev: 03c7bf4)
Running breadth-first search Model-Checking with fp 52 and seed -4373866203789575921 with 1 worker on 10 cores with 7282MB heap and 64MB offheap memory [pid: 63986] (Mac OS X 13.6.6 aarch64, Homebrew 11.0.23 x86_64, MSBDiskFPSet, DiskStateQueue).
[...]
Starting... (2024-04-26 13:14:12)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-04-26 13:14:13.
<<[worker |-> 0, queue |-> 43400, duration |-> 3, generated |-> 104551, diameter |-> 73, distinct |-> 104551, initial |-> 1], 43>>
Progress(73) at 2024-04-26 13:14:16: 104,551 states generated (104,551 s/min), 104,551 distinct states found (104,551 ds/min), 43,399 states left on queue.
<<[worker |-> 0, queue |-> 535322, duration |-> 63, generated |-> 1687602, diameter |-> 83, distinct |-> 1687602, initial |-> 1], 52>>
Progress(83) at 2024-04-26 13:15:16: 1,687,602 states generated (1,583,051 s/min), 1,687,602 distinct states found (1,583,051 ds/min), 535,321 states left on queue.
<<[worker |-> 0, queue |-> 236196, duration |-> 123, generated |-> 2569378, diameter |-> 87, distinct |-> 2569378, initial |-> 1], 55>>
Progress(87) at 2024-04-26 13:16:16: 2,569,378 states generated (881,776 s/min), 2,569,378 distinct states found (881,776 ds/min), 236,195 states left on queue.
<<[worker |-> 0, queue |-> 708588, duration |-> 183, generated |-> 4164066, diameter |-> 89, distinct |-> 4164066, initial |-> 1], 56>>
Progress(89) at 2024-04-26 13:17:16: 4,164,066 states generated (1,594,688 s/min), 4,164,066 distinct states found (1,594,688 ds/min), 708,587 states left on queue.
<<[worker |-> 0, queue |-> 1661758, duration |-> 243, generated |-> 5628880, diameter |-> 90, distinct |-> 5628880, initial |-> 1], 57>>
Progress(90) at 2024-04-26 13:18:16: 5,628,880 states generated (1,464,814 s/min), 5,628,880 distinct states found (1,464,814 ds/min), 1,661,757 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 303, generated |-> 7570193, diameter |-> 91, distinct |-> 7570193, initial |-> 1], 58>>
Progress(91) at 2024-04-26 13:19:16: 7,570,193 states generated (1,941,313 s/min), 7,570,193 distinct states found (1,941,313 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2840888, duration |-> 433, generated |-> 9523339, diameter |-> 92, distinct |-> 9523339, initial |-> 1], 59>>
Progress(92) at 2024-04-26 13:21:26: 9,523,339 states generated (1,953,146 s/min), 9,523,339 distinct states found (1,953,146 ds/min), 2,840,887 states left on queue.
<<[worker |-> 0, queue |-> 4002736, duration |-> 551, generated |-> 11266111, diameter |-> 92, distinct |-> 11266111, initial |-> 1], 59>>
Progress(92) at 2024-04-26 13:23:24: 11,266,111 states generated (1,742,772 s/min), 11,266,111 distinct states found (1,742,772 ds/min), 4,002,735 states left on queue.
<<[worker |-> 0, queue |-> 5150410, duration |-> 1459, generated |-> 12987622, diameter |-> 92, distinct |-> 12987622, initial |-> 1], 59>>
Progress(92) at 2024-04-26 13:38:31: 12,987,622 states generated (1,721,511 s/min), 12,987,622 distinct states found (1,721,511 ds/min), 5,150,409 states left on queue.
<<[worker |-> 0, queue |-> 6328258, duration |-> 1519, generated |-> 14754394, diameter |-> 92, distinct |-> 14754394, initial |-> 1], 59>>
Progress(92) at 2024-04-26 13:39:31: 14,754,394 states generated (1,766,772 s/min), 14,754,394 distinct states found (1,766,772 ds/min), 6,328,257 states left on queue.
<<[worker |-> 0, queue |-> 5704404, duration |-> 1579, generated |-> 15164388, diameter |-> 93, distinct |-> 15164388, initial |-> 1], 60>>
Progress(93) at 2024-04-26 13:40:31: 15,164,388 states generated (409,994 s/min), 15,164,388 distinct states found (409,994 ds/min), 5,704,403 states left on queue.
<<[worker |-> 0, queue |-> 5004835, duration |-> 1639, generated |-> 15514173, diameter |-> 93, distinct |-> 15514173, initial |-> 1], 60>>
Progress(93) at 2024-04-26 13:41:31: 15,514,173 states generated (349,785 s/min), 15,514,173 distinct states found (349,785 ds/min), 5,004,834 states left on queue.
<<[worker |-> 0, queue |-> 4304776, duration |-> 1699, generated |-> 15864203, diameter |-> 93, distinct |-> 15864203, initial |-> 1], 60>>
Progress(93) at 2024-04-26 13:42:31: 15,864,203 states generated (350,030 s/min), 15,864,203 distinct states found (350,030 ds/min), 4,304,775 states left on queue.
<<[worker |-> 0, queue |-> 3603538, duration |-> 1759, generated |-> 16214822, diameter |-> 93, distinct |-> 16214822, initial |-> 1], 60>>
Progress(93) at 2024-04-26 13:43:31: 16,214,822 states generated (350,619 s/min), 16,214,822 distinct states found (350,619 ds/min), 3,603,538 states left on queue.
<<[worker |-> 0, queue |-> 2899986, duration |-> 1819, generated |-> 16566597, diameter |-> 93, distinct |-> 16566597, initial |-> 1], 60>>
Checkpointing of run states/24-04-26-13-14-12.485
Checkpointing completed at (2024-04-26 13:44:32)
Progress(93) at 2024-04-26 13:44:32: 16,566,880 states generated (352,058 s/min), 16,566,880 distinct states found (352,058 ds/min), 2,899,421 states left on queue.
<<[worker |-> 0, queue |-> 2198120, duration |-> 1879, generated |-> 16917530, diameter |-> 93, distinct |-> 16917530, initial |-> 1], 60>>
Progress(93) at 2024-04-26 13:45:32: 16,917,530 states generated (350,650 s/min), 16,917,530 distinct states found (350,650 ds/min), 2,198,119 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 1939, generated |-> 17793056, diameter |-> 94, distinct |-> 17793056, initial |-> 1], 61>>
Progress(94) at 2024-04-26 13:46:32: 17,793,056 states generated (875,526 s/min), 17,793,056 distinct states found (875,526 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 1999, generated |-> 18724335, diameter |-> 94, distinct |-> 18724335, initial |-> 1], 61>>
Progress(94) at 2024-04-26 13:47:32: 18,724,335 states generated (931,279 s/min), 18,724,335 distinct states found (931,279 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 2059, generated |-> 20494338, diameter |-> 95, distinct |-> 20494338, initial |-> 1], 62>>
Progress(95) at 2024-04-26 13:48:32: 20,494,338 states generated (1,770,003 s/min), 20,494,338 distinct states found (1,770,003 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 2119, generated |-> 21654963, diameter |-> 96, distinct |-> 21654963, initial |-> 1], 63>>
Progress(96) at 2024-04-26 13:49:32: 21,654,963 states generated (1,160,625 s/min), 21,654,963 distinct states found (1,160,625 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 2179, generated |-> 22312114, diameter |-> 96, distinct |-> 22312114, initial |-> 1], 63>>
Progress(96) at 2024-04-26 13:50:32: 22,312,114 states generated (657,151 s/min), 22,312,114 distinct states found (657,151 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2125764, duration |-> 2239, generated |-> 22965467, diameter |-> 96, distinct |-> 22965467, initial |-> 1], 63>>
Progress(96) at 2024-04-26 13:51:32: 22,965,467 states generated (653,353 s/min), 22,965,467 distinct states found (653,353 ds/min), 2,125,763 states left on queue.
<<[worker |-> 0, queue |-> 2515740, duration |-> 2299, generated |-> 23915965, diameter |-> 97, distinct |-> 23915965, initial |-> 1], 63>>
Progress(97) at 2024-04-26 13:52:32: 23,915,965 states generated (950,498 s/min), 23,915,965 distinct states found (950,498 ds/min), 2,515,739 states left on queue.
<<[worker |-> 0, queue |-> 3390930, duration |-> 2359, generated |-> 25228750, diameter |-> 97, distinct |-> 25228750, initial |-> 1], 63>>
Progress(97) at 2024-04-26 13:53:32: 25,228,750 states generated (1,312,785 s/min), 25,228,750 distinct states found (1,312,785 ds/min), 3,390,929 states left on queue.
<<[worker |-> 0, queue |-> 4264002, duration |-> 2419, generated |-> 26538358, diameter |-> 97, distinct |-> 26538358, initial |-> 1], 63>>
Progress(97) at 2024-04-26 13:54:32: 26,538,358 states generated (1,309,608 s/min), 26,538,358 distinct states found (1,309,608 ds/min), 4,264,001 states left on queue.
<<[worker |-> 0, queue |-> 5139314, duration |-> 2479, generated |-> 27851326, diameter |-> 97, distinct |-> 27851326, initial |-> 1], 63>>
Progress(97) at 2024-04-26 13:55:32: 27,851,326 states generated (1,312,968 s/min), 27,851,326 distinct states found (1,312,968 ds/min), 5,139,313 states left on queue.
<<[worker |-> 0, queue |-> 6015222, duration |-> 2539, generated |-> 29165188, diameter |-> 97, distinct |-> 29165188, initial |-> 1], 63>>
Progress(97) at 2024-04-26 13:56:32: 29,165,188 states generated (1,313,862 s/min), 29,165,188 distinct states found (1,313,862 ds/min), 6,015,221 states left on queue.
<<[worker |-> 0, queue |-> 6377292, duration |-> 2599, generated |-> 31041881, diameter |-> 98, distinct |-> 31041881, initial |-> 1], 64>>
Progress(98) at 2024-04-26 13:57:32: 31,041,881 states generated (1,876,693 s/min), 31,041,881 distinct states found (1,876,693 ds/min), 6,377,291 states left on queue.
<<[worker |-> 0, queue |-> 6377292, duration |-> 2659, generated |-> 33297915, diameter |-> 98, distinct |-> 33297915, initial |-> 1], 64>>
Progress(98) at 2024-04-26 13:58:32: 33,297,915 states generated (2,256,034 s/min), 33,297,915 distinct states found (2,256,034 ds/min), 6,377,291 states left on queue.
<<[worker |-> 0, queue |-> 6377292, duration |-> 2719, generated |-> 35534670, diameter |-> 98, distinct |-> 35534670, initial |-> 1], 64>>
Progress(98) at 2024-04-26 13:59:32: 35,534,670 states generated (2,236,755 s/min), 35,534,670 distinct states found (2,236,755 ds/min), 6,377,291 states left on queue.
Progress(99) at 2024-04-26 13:59:46: 36,085,587 states generated (550,917 s/min), 36,085,586 distinct states found (550,916 ds/min), 6,377,292 states left on queue.
Progress(99) at 2024-04-26 13:59:46: 36,085,587 states generated (0 s/min), 36,085,587 distinct states found (1 ds/min), 6,377,293 states left on queue.
Progress(99) at 2024-04-26 13:59:46: 36,085,588 states generated (1 s/min), 36,085,588 distinct states found (1 ds/min), 6,377,294 states left on queue.
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 0.0
36085588 states generated, 36085588 distinct states found, 6377294 states left on queue.
The depth of the complete state graph search is 99.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 3 and the 95th percentile is 3).
Finished in 45min 35s at (2024-04-26 13:59:48)
lemmy commented 3 weeks ago

Trace is validated in no time with DFS :-)

markus@avocado [14:30:48] [~/src/TLA/_specs/MSFT/CCF/tla/consistency] [fa4029338]
-> % export TLC_OPTS="-Dtlc2.tool.queue.IStateQueue=StateDeque"

markus@avocado [14:30:52] [~/src/TLA/_specs/MSFT/CCF/tla/consistency] [fa4029338]
-> % tlc -note TraceMultiNodeReads -checkpoint 0               
TLC2 Version 2.18 of Day Month 20?? (rev: 03c7bf4)
Running breadth-first search Model-Checking with fp 6 and seed 1941814660562527685 with 1 worker on 10 cores with 7282MB heap and 64MB offheap memory [pid: 67804] (Mac OS X 13.6.6 aarch64, Homebrew 11.0.23 x86_64, MSBDiskFPSet, StateDeque).
[...]
Starting... (2024-04-26 14:30:54)
Implied-temporal checking--satisfiability problem has 1 branches.
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2024-04-26 14:30:55.
Progress(99) at 2024-04-26 14:30:55: 136 states generated (136 s/min), 136 distinct states found (136 ds/min), 37 states left on queue.
Progress(99) at 2024-04-26 14:30:55: 137 states generated (1 s/min), 137 distinct states found (1 ds/min), 38 states left on queue.
Progress(99) at 2024-04-26 14:30:55: 138 states generated (1 s/min), 138 distinct states found (1 ds/min), 39 states left on queue.
Progress(99) at 2024-04-26 14:30:55: 138 states generated, 138 distinct states found, 40 states left on queue.
Checking temporal properties for the complete state space with 138 total distinct states at (2024-04-26 14:30:55)
Finished checking temporal properties in 00s at 2024-04-26 14:30:55
Model checking completed. No error has been found.
  Estimates of the probability that TLC did not check all reachable states
  because two distinct states had the same fingerprint:
  calculated (optimistic):  val = 0.0
138 states generated, 138 distinct states found, 40 states left on queue.
The depth of the complete state graph search is 99.
The average outdegree of the complete state graph is 1 (minimum is 1, the maximum 3 and the 95th percentile is 3).
Finished in 00s at (2024-04-26 14:30:55)
lemmy commented 3 weeks ago

The state-space exploration stems from the non-determinism in RoTxResponseAction s.t. a read may be served from any ledger branch (replica?). A later read tx may be served from an older branch, and, thus, TLC has to construct all permutations of the ledger branches (observed).

lemmy commented 2 weeks ago

Moved to https://github.com/microsoft/CCF/pull/6154

achamayou commented 2 weeks ago

The state-space exploration stems from the non-determinism in RoTxResponseAction s.t. a read may be served from any ledger branch (replica?). A later read tx may be served from an older branch, and, thus, TLC has to construct all permutations of the ledger branches (observed).

Ah that's a very good point, I didn't think of observed, I only thought that the txids would constrain the branching (they do), and so there should be a single linear history. But you're right that observed varies with the seqno!