Closed achamayou closed 2 months ago
As @eddyashton pointed out, to work around the limitations of timestamps, we want to initially focus on the sequentially executing raft_scenarios.
The table shows the TLA+ actions and the updated (non-auxiliary) variables of ccfraft.tla. It indicates what state the trace locations have to be emitted.
Action | state | matchIndex' | messages' | log' | state' | configurations' | commitIndex' | commitsNotified' | currentTerm' | votedFor' | votesGranted' |
---|---|---|---|---|---|---|---|---|---|---|---|
Timeout | Follower, Candidate | x (Candidate) | x | x | x | ||||||
ChangeConfiguration | Leader | x | x | ||||||||
CheckQuorum | Leader | x (Follower) | |||||||||
ClientRequest | Leader | x (TypeEntry) | |||||||||
BecomeLeader | Candidate | x | x | x | x | ||||||
SignCommittableMessages | Leader | x (TypeSignature) | |||||||||
AdvanceCommitIndex | Leader | x | x | x | |||||||
RequestVote | Candidate | snd | |||||||||
NotifyCommit | RetiredLeader | snd | x | ||||||||
AppendEntries | Leader | snd | |||||||||
RcvDropIgnoredMessage | any | rcv | |||||||||
RcvUpdateTerm | any | rcv | x | x | x | ||||||
RcvAppendEntriesRequest | any | rcv | x | x | x | x | |||||
RcvRequestVoteRequest | any | rcv | x | ||||||||
RcvRequestVoteResponse | any | rcv | x | ||||||||
RcvAppendEntriesResponse | any | x | rcv | ||||||||
RcvUpdateCommitIndex | any | rcv | x | x |
@eddyashton suggested that post-processing the existing logging with a Python script is a better approach because it can also be used in production. This raises the question of whether debug and trace log statements can be included in trace validation, as they may not be included in a production build.
Started hacking at https://github.com/lemmy/CCF/tree/mku-gh5057
In the tests/raft_scenarios/4582.2
file, there is an issue where raft.h
allows a pending
node, named n1
, to become a follower
in response to receiving an AppendEntries
message. However, this transition occurs even if n1
's log is not up-to-date and is missing the new configuration that includes n1
. Consequently, n1
responds to the message with an AppendEntriesResponseType::FAIL
.
This behavior is not consistent with ccfraft.tla
, which specifies that a pending
node n1 can transition to a follower
state iff log[n1]
contains a configuration that includes n1
as a member.
Independently, ccfraft.tla
does not model AppendEntries to contain multiple log entires. For trace validation, this might have to change.
tests/raft_scenarios/replicate
reveals that ccfraft.tla
does not directly model the term update when a RaftMsgType::raft_request_vote
message's term is greater than the node's current term:
https://github.com/microsoft/CCF/blob/main/src/consensus/aft/raft.h#L1716-L1725 vs https://github.com/microsoft/CCF/blob/main/tla/ccfraft.tla#L679
Note that UpdateTerm
does not remove a message from messages
and that become_aware_of_new_term is the equivalent to the spec's UpdateTerm action. Thus, the spec models the above scenario by an UpdateTerm
action followed by a RcvRequestVoteRequest
action.
tests/raft_scenarios/replicate
reveals thatccfraft.tla
does not directly model the term update when aRaftMsgType::raft_request_vote
message's term is greater than the node's current term:https://github.com/microsoft/CCF/blob/main/src/consensus/aft/raft.h#L1716-L1725 vs https://github.com/microsoft/CCF/blob/main/tla/ccfraft.tla#L679
Note that
UpdateTerm
does not remove a message frommessages
and that become_aware_of_new_term is the equivalent to the spec's UpdateTerm action. Thus, the spec models the above scenario by anUpdateTerm
action followed by aRcvRequestVoteRequest
action.
Addressed with the help of https://github.com/tlaplus/tlaplus/tree/mku-cdot in https://github.com/lemmy/CCF/commit/4d253100a6b53c6729060747bc08bac9bd35ae7e
The replicate scenario in the tests/raft_scenarios/replicate
folder of the CCF repository exposes an issue in the ccfraft.tla
file, specifically in line 457. This problem prevents the newly elected leader from sending an empty AppendEntries
message after its election, as it is supposed to do. Currently, the only relevant enabled action in ccfraft.tla is ClientRequest
, which advances the leader's log beyond nextIndex
and enables AppendEntries
. However, waiting for the first client request before announcing the election is not correct.
The wording "Send up to 1 entry, constrained by the end of the log." of ccfraft!AppendEntries
indicates that it should be possible to send an empty AppendEntries
message. Removing line 457 does not cause any violations in the simulation (SIMccraft
) or exhaustive model checking (MCccfraft
and MCccraftWithReconfig
) but increases the specification's state space.
Depends on https://github.com/tlaplus/tlaplus/pull/805
Here is one more discrepancy found by partially matching tests/raft_scenarios/4582.2
. Action ccfraft!NoConflictAppendEntriesRequest
does not permit a node n
to become a follower if n
is not a member of any configuration. Yet, the implementation permits node 1 to become a follower even though the node is not a member of any configuration (note "configurations":[]
in corresponding log statement below obtained with https://github.com/lemmy/CCF/commit/fdf714070f9b5e97e0a65dc0b9a3edaa13f10ec1).
{"h_ts":"2023-04-03T18:31:13.652461Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"2165","msg":{"configurations":[],"event":{"component":"raft","function":"become_follower"},"leadership":"Follower","membership":"Active","node":"1","oldleadership":"none","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"}}}
Here is one more discrepancy found by partially matching
tests/raft_scenarios/4582.2
. Actionccfraft!NoConflictAppendEntriesRequest
does not permit a noden
to become a follower ifn
is not a member of any configuration. Yet, the implementation permits node 1 to become a follower even though the node is not a member of any configuration (note"configurations":[]
in corresponding log statement below obtained with lemmy@fdf7140).{"h_ts":"2023-04-03T18:31:13.652461Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"2165","msg":{"configurations":[],"event":{"component":"raft","function":"become_follower"},"leadership":"Follower","membership":"Active","node":"1","oldleadership":"none","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"}}}
I think the implementation behaviour is correct and necessary here. A node which joins without a snapshot starts with no knowledge of configurations, yet must be able to become Follower
(or init as a Follower
) and process incoming AppendEntries. Some other node considers them part of a configuration, and needs to bring them up to speed.
This directly contradicts a comment in ccfraft.tla
:
A node's configuration is never "empty"
I think we need to update the spec to match the implementation. Either permit the change to Follower
when the configuration is locally-unknown, or add an earlier transition to Follower
(which is what happens in practice). I wonder what the model thinks happens here, because I guess its maybe possible that once you've become follower, you become candidate and elect yourself primary, with an empty KV?
I believe we don't currently model joining from a snapshot, where the comment is true (after parsing a valid snapshot, a node has at least some configuration, likely stale).
Heads up: it seems that the trace validation has covered most of the "happy path", as yesterday's successful validation of tests/raft_scenarios/election
right out of the box!
election_while_reconfiguration
(and reconfiguration
) also detects the spec<>code divergence found in https://github.com/microsoft/CCF/issues/5057#issuecomment-1494791990.
{"h_ts":"2023-04-06T18:01:30.548291Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"522","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0}],"event":{"component":"raft","function":"add_configuration"},"leadership":"none","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":0,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"2023-04-06T18:01:30.548655Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"2031","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0}],"event":{"component":"raft","function":"become_candidate"},"leadership":"Candidate","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"2023-04-06T18:01:30.548842Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"2099","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0}],"event":{"component":"raft","function":"become_leader"},"leadership":"Leader","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"}}}
AppendEntries
followed by a non-empty AppendEntries
message:{"h_ts":"2023-04-06T18:01:30.549822Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"522","msg":{"configurations":[{"idx":0,"nodes":{"0":{"address":":"}},"rid":0},{"idx":1,"nodes":{"0":{"address":":"},"1":{"address":":"}},"rid":1}],"event":{"component":"raft","function":"add_configuration"},"leadership":"Leader","membership":"Active","node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"}}}
{"h_ts":"2023-04-06T18:01:30.549998Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1255","msg":{"event":{"component":"raft","function":"send_authenticated"},"leadership":"Leader","membership":"Active","mentries":0,"node":"0","paket":{"contains_new_view":false,"leader_commit_idx":0,"msg":0,"prev_term":0,"term":1,"term_of_idx":0},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"0"},"to":"1","type":1}}
{"h_ts":"2023-04-06T18:01:30.550149Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/test/driver.h","number":"62","msg":{"data":"eyJkYXRhIjoiZXlJd0lqcDdJbUZrWkhKbGMzTWlPaUk2SW4wc0lqRWlPbnNpWVdSa2NtVnpjeUk2SWpvaWZYMD0iLCJ0eXBlIjoicmVjb25maWd1cmF0aW9uIn0=","event":{"component":"ledger","function":"append"},"index":1,"node":"0","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"0"},"term":1}}
{"h_ts":"2023-04-06T18:01:30.550384Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1255","msg":{"event":{"component":"raft","function":"send_authenticated"},"leadership":"Leader","membership":"Active","mentries":1,"node":"0","paket":{"contains_new_view":false,"leader_commit_idx":0,"msg":0,"prev_term":0,"term":1,"term_of_idx":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"0"},"to":"1","type":1}}
AppendEntries
and transition to follower
state:{"h_ts":"2023-04-06T18:01:30.550543Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"923","msg":{"event":{"component":"raft","function":"recv_append_entries"},"from":"0","leadership":"none","membership":"Active","node":"1","paket":{"contains_new_view":false,"leader_commit_idx":0,"msg":0,"prev_term":0,"term":1,"term_of_idx":0},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":0,"last_idx":0,"my_node_id":"1"},"type":0}}
{"h_ts":"2023-04-06T18:01:30.550700Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/test/driver.h","number":"73","msg":{"event":{"component":"ledger","function":"truncate"},"index":0,"node":"1","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"}}}
{"h_ts":"2023-04-06T18:01:30.550852Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"2166","msg":{"configurations":[],"event":{"component":"raft","function":"become_follower"},"leadership":"Follower","membership":"Active","node":"1","oldleadership":"none","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"}}}
{"h_ts":"2023-04-06T18:01:30.551132Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1647","msg":{"event":{"component":"raft","function":"send_append_entries_response"},"leadership":"Follower","membership":"Active","node":"1","paket":{"last_log_idx":0,"msg":1,"success":0,"term":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"},"to":"0","type":1}}
AppendEntries
:{"h_ts":"2023-04-06T18:01:30.551263Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"923","msg":{"event":{"component":"raft","function":"recv_append_entries"},"from":"0","leadership":"Follower","membership":"Active","node":"1","paket":{"contains_new_view":false,"leader_commit_idx":0,"msg":0,"prev_term":0,"term":1,"term_of_idx":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":0,"my_node_id":"1"},"type":0}}
{"h_ts":"2023-04-06T18:01:30.551568Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"522","msg":{"configurations":[{"idx":1,"nodes":{"0":{"address":":"},"1":{"address":":"}},"rid":1}],"event":{"component":"raft","function":"add_configuration"},"leadership":"Follower","membership":"Active","node":"1","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"1"}}}
{"h_ts":"2023-04-06T18:01:30.551711Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/test/driver.h","number":"62","msg":{"data":"eyJkYXRhIjoiZXlJd0lqcDdJbUZrWkhKbGMzTWlPaUk2SW4wc0lqRWlPbnNpWVdSa2NtVnpjeUk2SWpvaWZYMD0iLCJ0eXBlIjoicmVjb25maWd1cmF0aW9uIn0=","event":{"component":"ledger","function":"append"},"index":1,"node":"1","state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"1"},"term":1}}
{"h_ts":"2023-04-06T18:01:30.551962Z","thread_id":"100","level":"info","tag":"tla","file":"../src/consensus/aft/raft.h","number":"1647","msg":{"event":{"component":"raft","function":"send_append_entries_response"},"leadership":"Follower","membership":"Active","node":"1","paket":{"last_log_idx":1,"msg":1,"success":0,"term":1},"state":{"cft_watermark_idx":0,"commit_idx":0,"current_view":1,"last_idx":1,"my_node_id":"1"},"to":"0","type":1}}
...
The problem is that "become_follower"
log event is triggered by the first, empty message, whereas the "add_configuration"
event by the follower is prompted by the subsequent message. Removing the conjunct (https://github.com/microsoft/CCF/blob/main/tla/ccfraft.tla#L796) that prevents a pending
node from becoming follower
despite its configurations
does not fully address the problem, because ccfraft!InitReconfigurationVars
does not accept initial states with an empty configuration. Related:
This directly contradicts a comment in
ccfraft.tla
:A node's configuration is never "empty"
I think we need to update the spec to match the implementation. Either permit the change to
Follower
when the configuration is locally-unknown, or add an earlier transition toFollower
(which is what happens in practice). I wonder what the model thinks happens here, because I guess its maybe possible that once you've become follower, you become candidate and elect yourself primary, with an empty KV?I believe we don't currently model joining from a snapshot, where the comment is true (after parsing a valid snapshot, a node has at least some configuration, likely stale).
~The requirement to prevent a node from becoming a follower without a configuration was introduced in https://github.com/microsoft/CCF/issues/4806 to keep it from becoming a primary/leader.~
Fully matched tests/raft_scenarios/check_quorum
, tests/raft_scenarios/reconnect
, and tests/raft_scenarios/reconnect_node
.
The requirement to prevent a node from becoming a follower without a configuration was introduced in https://github.com/microsoft/CCF/issues/4806 to keep it from becoming a primary/leader.
I don't think I understand the logical connection between preventing a node from becoming a follower without a config to becoming a leader without config. The latter is impossible in a real system, and should be impossible with the driver too. The former is necessary for every new joiner node (ie. every node except for the start node, which has a hardcoded config of itself).
A real joiner will either:
In either case, upon seeing an AppendEntries, the new joiner needs to transition to Follower
and cannot expect to know the current primary.
The requirement to prevent a node from becoming a follower without a configuration was introduced in #4806 to keep it from becoming a primary/leader.
I don't think I understand the logical connection between preventing a node from becoming a follower without a config to becoming a leader without config. The latter is impossible in a real system, and should be impossible with the driver too. The former is necessary for every new joiner node (ie. every node except for the start node, which has a hardcoded config of itself).
A real joiner will either:
- Have a stale config from a snapshot.
- Have no config at all.
In either case, upon seeing an AppendEntries, the new joiner needs to transition to
Follower
and cannot expect to know the current primary.
Thanks for the correction. I updated my previous comment.
Fully matched tests/raft_scenarios/bad_network
.
With a strengthened matching of AppendEntriesRequest
and AppendEntriesResponse
messages (below), tests/raft_scenario/bad_network
no longer validates. The problem is that the implementation's prev_idx
increments after messages are dropped and resend (line 30-33), while ccfraft.tla
does not seem to even model sent_idx
.
IsAppendEntriesRequest(msg, dst, src, logline) ==
/\ msg.mtype = AppendEntriesRequest
/\ msg.mtype = RaftMsgType[logline.msg.paket.msg + 1]
/\ msg.mdest = dst
/\ msg.msource = src
/\ msg.mterm = logline.msg.paket.term
/\ msg.mcommitIndex = logline.msg.paket.leader_commit_idx
/\ msg.mprevLogTerm = logline.msg.paket.prev_term
/\ Len(msg.mentries) = logline.msg.paket.idx - logline.msg.paket.prev_idx
/\ msg.mprevLogIndex = logline.msg.paket.prev_idx \* < NEW
IsAppendEntriesResponse(msg, dst, src, logline) ==
/\ msg.mtype = AppendEntriesResponse
/\ msg.mtype = RaftMsgType[logline.msg.paket.msg + 1]
/\ msg.mdest = dst
/\ msg.msource = src
/\ msg.mterm = logline.msg.paket.term
\* raft_types.h enum AppendEntriesResponseType
/\ msg.msuccess = (logline.msg.paket.success = 0) \* < NEW
/\ msg.mmatchIndex = logline.msg.paket.last_log_idx \* < NEW
A partial fix to the problem provides the following patch:
diff --git a/tla/ccfraft.tla b/tla/ccfraft.tla
index 1f8a1abd9..c7e0823fa 100644
--- a/tla/ccfraft.tla
+++ b/tla/ccfraft.tla
@@ -508,7 +508,11 @@ AppendEntries(i, j) ==
/\ \E b \in AppendEntriesBatchsize(i, j):
/\ InMessagesLimit(i, j, b)
/\ Send(msg(b))
- /\ UNCHANGED <<reconfigurationVars, commitsNotified, serverVars, candidateVars, leaderVars, logVars>>
+ \* Record the most recent index we have sent to this node.
+ \* (see https://github.com/microsoft/CCF/blob/main/src/consensus/aft/raft.h#L968)
+ \* Inc by 1 to account for the fact that we send the next index.
+ /\ nextIndex' = [nextIndex EXCEPT ![i][j] = max(@, lastEntry(b) + 1)]
+ /\ UNCHANGED <<reconfigurationVars, commitsNotified, serverVars, candidateVars, matchIndex, logVars>>
\* Candidate i transitions to leader.
BecomeLeader(i) ==
@@ -740,7 +744,7 @@ RejectAppendEntriesRequest(i, j, m, logOk) ==
/\ Reply([mtype |-> AppendEntriesResponse,
mterm |-> currentTerm[i],
msuccess |-> FALSE,
- mmatchIndex |-> 0,
+ mmatchIndex |-> m.mcommitIndex, \* See response_idx of send_append_entries_response in raft.h (https://github.com/microsoft/CCF/blob/main/src/consensus/aft/raft.h#L1321)
msource |-> i,
mdest |-> j],
m)
@@ -861,7 +865,7 @@ HandleAppendEntriesResponse(i, j, m) ==
/\ matchIndex' = [matchIndex EXCEPT ![i][j] = m.mmatchIndex]
\/ /\ \lnot m.msuccess \* not successful
/\ nextIndex' = [nextIndex EXCEPT ![i][j] =
- Max({nextIndex[i][j] - 1, 1})]
+ Max({m.mmatchIndex, matchIndex[i][j]}) + 1]
/\ UNCHANGED matchIndex
/\ Discard(m)
/\ UNCHANGED <<reconfigurationVars, messagesSent, commitsNotified, serverVars, candidateVars, logVars>>
Related PRs:
https://github.com/microsoft/CCF/issues/5676 https://github.com/microsoft/CCF/pull/5250 https://github.com/microsoft/CCF/pull/5250 https://github.com/microsoft/CCF/pull/5238 https://github.com/microsoft/CCF/pull/5233 https://github.com/microsoft/CCF/pull/5222 https://github.com/microsoft/CCF/pull/5201 https://github.com/microsoft/CCF/pull/5200 https://github.com/microsoft/CCF/pull/5193 https://github.com/microsoft/CCF/pull/5187 https://github.com/microsoft/CCF/pull/5154 https://github.com/microsoft/CCF/pull/5150 https://github.com/microsoft/CCF/issues/4473
(bookkeeping)
Further progress blocked by https://github.com/microsoft/CCF/issues/5220
I suspect the next steps after that will consist of:
It's unlikely that we will look at validation of end to end scenarios before we have completed these two items.
2. Improving the performance of the model checking to be able to run it with more interesting constraints
In-progress and pending work on TLC is tracked at https://github.com/tlaplus/tlaplus/pull/867
Now that #5919 is in, I was able to move one of the last two existing scenarios left that needed retirement in #5967.
is_new_follower
in the spec). The new candidate viability test in #5935 is pending the TLC change described above.Post-liveness fix, or perhaps as part of it, we should expand reconfiguration scenario coverage. I think that will bring us to a point where we have fairly decent trace validation, and we will want to look at model checking more thoroughly again.
All of the above except #5935 is now complete, validation work has moved to the following two issues:
This is not currently thought to require spec changes, because the approach adopted is to remove eager rollback and leave the NACK divergence with Raft unchanged.
I've removed end to end trace validation from this, let's plan and track it separately.
We have gotten to the point where all the scenarios pass in CI, including reconfiguration. We've gone from 15 scenarios (~1000 lines) when we opened this ticket (using a substantially different and simplified reconfiguration mechanism) to 28 now (~2300 lines) using realistic reconfiguration. With #6016 and #5973 we have also got to a point where we have no known issues in the implementation (*).
While it would be interesting to get formal coverage information from the spec, and there will always be opportunities to add new scenarios, I believe the current coverage is good enough to call this complete. I think we should now look at optimising the spec to improve model checking coverage.
(*) Which isn't to say we won't find more, or that there aren't optimisation opportunities.
Following a suggestion from @lemmy, we could set up trace validation for the CCF consensus implementation.