@carllin - Here are some collected thoughts from that item I spitballed to you earlier today
Following up on this, the PR authors determined that a consensus-breaking change was present in the PR. The issue has since been addressed, and I elaborate a little more and tie into some logs here. The TLDR of that comment is:

- A hard fork at slot X takes effect in slot X, and will not alter the state of any slots <= X.
- With the PR, however, state was altered in slots prior to X.
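As a tiny, self-contained sketch of that rule (DeclaredHardFork and affects() are hypothetical stand-ins, not Solana's HardForks API):

```rust
type Slot = u64;

/// A fork declared to take effect at slot X.
struct DeclaredHardFork {
    slot: Slot,
}

impl DeclaredHardFork {
    /// Should this fork influence the state/hash produced for `slot`?
    /// Per the rule above, slots at or before the fork slot stay untouched.
    fn affects(&self, slot: Slot) -> bool {
        slot > self.slot
    }
}

fn main() {
    let fork = DeclaredHardFork { slot: 40 };
    // Slots [1, 39] (and 40 itself) must be unchanged; the PR's bug was
    // precisely that hashes for slots [1, 39] changed anyway.
    assert!(!fork.affects(39));
    assert!(!fork.affects(40));
    assert!(fork.affects(41));
}
```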
The change in bank hashes had a few effects on the two nodes that revealed an inconsistency in the validator. Again for clarity, the PR made it such that the bank hashes changed for slots [1, 39].
Node A:

- Node A adopts state up through slot 39 (as seen in ReplayStage::compute_bank_stats()), and begins voting on slot 41 in pretty normal fashion.
- With the PR, votes for slots [1, 39] are dropped when creating the snapshot. As a result, Node A did NOT have any landed vote state to rebuild its local state from. Regardless, it starts voting with slot 41 like a brand new voting node would; nothing really noteworthy about this either.

Node B:
- Tower::make_check_switch_threshold_decision() exits early for Node B due to the below item for slots 41-46: https://github.com/solana-labs/solana/blob/de024bf98d3670867608dddc7022298ac1c5b08f/core/src/consensus.rs#L855-L857
- Node B refrains from voting until it is able to switch onto the new, correct fork. Once Node B is able to switch over to the correct fork, the cluster begins making roots and the test succeeds.
- With the PR, Node B also drops votes for slots [1, 39] when re-processing these blocks. The dropped votes (as reflected in the Bank) are at odds with its local tower state. Eventually, block 47 is recreated with a parent of 46; this means that 46 and 47 now show up in the ancestors and descendants maps created from BankForks that are passed to Tower. The chain for 47 is 40 ==> 45 ==> 46 ==> 47; 41-44 were created, but Node B built 45 off of 40. At this point in time, slot 43 is the heaviest bank that is passed to Tower::check_switch_threshold() through ReplayStage::select_vote_and_reset_forks(); the fork structure is sketched below.
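To make that fork structure and the ancestors/descendants maps concrete, here is a standalone sketch using plain HashMaps and the slot numbers from this scenario; the helper functions are hypothetical and are not the BankForks/ReplayStage code.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

/// Child -> parent links for the two forks in play:
/// 40 -> 41 -> 42 -> 43 -> 44 (assumed chain containing the heaviest bank, 43)
/// 40 -> 45 -> 46 -> 47       (the fork on which block 47 was recreated)
fn parents() -> HashMap<Slot, Slot> {
    [(41, 40), (42, 41), (43, 42), (44, 43), (45, 40), (46, 45), (47, 46)]
        .into_iter()
        .collect()
}

/// ancestors(slot) = every slot reachable by walking parent links upward.
fn ancestors(parents: &HashMap<Slot, Slot>) -> HashMap<Slot, HashSet<Slot>> {
    let mut out: HashMap<Slot, HashSet<Slot>> = HashMap::new();
    for &slot in parents.keys() {
        let mut set = HashSet::new();
        let mut cur = slot;
        while let Some(&parent) = parents.get(&cur) {
            set.insert(parent);
            cur = parent;
        }
        out.insert(slot, set);
    }
    out
}

/// descendants is just the ancestors relation inverted.
fn descendants(ancestors: &HashMap<Slot, HashSet<Slot>>) -> HashMap<Slot, HashSet<Slot>> {
    let mut out: HashMap<Slot, HashSet<Slot>> = HashMap::new();
    for (&slot, ancs) in ancestors {
        for &anc in ancs {
            out.entry(anc).or_default().insert(slot);
        }
    }
    out
}

fn main() {
    let parents = parents();
    let ancestors = ancestors(&parents);
    let descendants = descendants(&ancestors);

    // The stale last vote (47) has 46 as an ancestor ...
    assert!(ancestors[&47].contains(&46));
    // ... and, symmetrically, 47 shows up as a descendant of 46.
    assert!(descendants[&46].contains(&47));
    // The heaviest bank (43) is on a different fork from 46/47.
    assert!(!ancestors[&43].contains(&46));
    println!("ancestors(47) = {:?}", ancestors[&47]);
    println!("descendants(46) = {:?}", descendants[&46]);
}
```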
All this being stated, the flaw itself is that the tower from pre-restart is not invalidated. The extra digging was to check whether there was a logic bug in Tower's impl, which does not appear to be the case. So, it seems that ensuring the tower gets properly invalidated when a hard fork is passed to the validator is the course of action.
The panic: assert!(!last_vote_ancestors.contains(candidate_slot)); is essentially complaining that, given some last voted slot S and some candidate slot C, ancestors(S).contains(C).

However, for this to be true, it must also be true that descendants(C).contains(S), which is what this check https://github.com/AshwinSekar/solana/blob/24a3af8d81bd229ad7e8a7f82ccdf6f24256a206/core/src/consensus.rs#L887-L895 should have caught, because:

- it relies on the computed flag, which should be true for all frozen banks after compute_bank_stats in replay_stage runs https://github.com/AshwinSekar/solana/blob/24a3af8d81bd229ad7e8a7f82ccdf6f24256a206/core/src/replay_stage.rs#L3094 (should run on the voted bank right after startup).

So after reproducing the panic, I would:

- print out the descendants of C in https://github.com/AshwinSekar/solana/blob/24a3af8d81bd229ad7e8a7f82ccdf6f24256a206/core/src/consensus.rs#L887-L895, and check that S is in there
- if S is in the list of descendants, print out whether the computed flag is set
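As a self-contained illustration of that invariant and the suggested debug output, here is a sketch using plain maps rather than the real consensus.rs/ProgressMap types; the computed set is a hypothetical stand-in for per-slot fork stats.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

/// If ancestors(S) contains C, then descendants(C) should contain S, and every
/// frozen slot among descendants(C) should have had its stats computed once
/// compute_bank_stats has run. Print both so we can see which assumption broke.
fn debug_switch_candidate(
    last_voted_slot: Slot, // S
    candidate_slot: Slot,  // C
    ancestors: &HashMap<Slot, HashSet<Slot>>,
    descendants: &HashMap<Slot, HashSet<Slot>>,
    computed: &HashSet<Slot>, // slots whose fork stats have been computed
) {
    if ancestors[&last_voted_slot].contains(&candidate_slot) {
        let c_descendants = descendants.get(&candidate_slot);
        println!("descendants({candidate_slot}) = {c_descendants:?}");
        println!(
            "  contains last vote {last_voted_slot}: {}",
            c_descendants.map_or(false, |d| d.contains(&last_voted_slot))
        );
        for slot in c_descendants.into_iter().flatten() {
            println!("  slot {slot}: computed = {}", computed.contains(slot));
        }
    }
}

fn main() {
    // Toy data in the shape of this issue's scenario.
    let ancestors = HashMap::from([(47, HashSet::from([46, 45, 40]))]);
    let descendants = HashMap::from([(46, HashSet::from([47]))]);
    let computed = HashSet::from([40, 45]); // 46 just froze, 47 is not frozen
    debug_switch_candidate(47, 46, &ancestors, &descendants, &computed);
}
```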
> However, for this to be true, it must also be true that descendants(C).contains(S) ... should have caught, because:

Not sure if you read through my whole comment, but the third bullet ("With the PR, Node B also drops votes for slots ...") explains what happened (after doing some of the same steps that you suggested, like dumping descendants out). But in any case, addressing your specific items:
- last_voted_slot is 47 due to the bug I called out (tower that extends beyond the restart slot not getting invalidated)
- Tower::make_check_switch_threshold_decision() gets called with switch_slot = 43
- 46 is an element of descendants
- 46 does not yet have stats.computed = true (recall that 47 is not frozen, 46 JUST got frozen), so we do NOT hit the continue
- last_vote_ancestors contains 46, as I mentioned that 46 is 47's parent
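For reference, the same walkthrough restated as a tiny standalone program; the maps and the computed set below are hypothetical stand-ins for the real BankForks/ProgressMap state, populated with the slot numbers above.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

fn main() {
    let last_voted_slot: Slot = 47; // from the stale, un-invalidated tower
    let switch_slot: Slot = 43; // heaviest bank handed to check_switch_threshold

    // Ancestors derived from the forks 40->41->42->43->44 and 40->45->46->47.
    let ancestors: HashMap<Slot, HashSet<Slot>> = HashMap::from([
        (43, HashSet::from([42, 41, 40])),
        (46, HashSet::from([45, 40])),
        (47, HashSet::from([46, 45, 40])),
    ]);
    // Slots whose fork stats have been computed; 46 only just froze, 47 is not frozen.
    let computed: HashSet<Slot> = HashSet::from([40, 41, 42, 43, 44, 45]);

    let candidate_slot: Slot = 46;
    let last_vote_ancestors = &ancestors[&last_voted_slot];

    println!("switch_slot = {switch_slot}, last_voted_slot = {last_voted_slot}");
    // 46's stats are not computed yet, and (per the walkthrough above) the
    // filtering does not skip it, so the condition the assert forbids holds:
    assert!(!computed.contains(&candidate_slot));
    assert!(last_vote_ancestors.contains(&candidate_slot));
}
```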
> Not sure if you read through my whole comment, but the third bullet "With the PR, Node B also drops votes for slots ..." explains what happened (after doing some of the same steps that you suggested like dumping descendants out).

Ah I see, sorry, missed the key point there. The problem is 47 is the last vote in tower, but it was:
Hmm yeah the proper thing to do is probably to set the root to the hard fork slot in tower and clear everything in tower above that slot.
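A minimal sketch of that remediation, using a hypothetical SimpleTower rather than the real Tower API: on a hard-fork restart, move the tower's root up to the hard-fork slot and drop everything recorded above it.

```rust
type Slot = u64;

#[derive(Debug)]
struct SimpleTower {
    root: Slot,
    votes: Vec<Slot>, // voted slots, oldest first
}

impl SimpleTower {
    /// Invalidate any tower state from beyond the hard-fork slot.
    fn apply_hard_fork(&mut self, hard_fork_slot: Slot) {
        if self.root < hard_fork_slot {
            self.root = hard_fork_slot;
        }
        // Clear everything in the tower above the hard-fork slot.
        self.votes.retain(|&slot| slot <= hard_fork_slot);
    }
}

fn main() {
    // Illustrative values shaped like Node B's saved tower: root below the
    // restart slot (40), with votes extending past it up to 47.
    let mut tower = SimpleTower {
        root: 16,
        votes: vec![37, 38, 39, 41, 45, 46, 47],
    };
    tower.apply_hard_fork(40);
    assert_eq!(tower.root, 40);
    assert!(tower.votes.iter().all(|&slot| slot <= 40));
    println!("{tower:?}");
}
```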
This one is still a problem AFAIK; I think there is another layer of using --hard-fork with the validator being broken. Will type up a new issue in the agave repo and cross-link.
Problem
The author(s) of https://github.com/solana-labs/solana/pull/31957 reported hitting a failure on the below test that, at least on the surface, doesn't seem to be directly caused by the PR. https://github.com/solana-labs/solana/blob/dcd66534dda718160c80752711e6c3a8c6686992/local-cluster/tests/local_cluster.rs#L2217
In general, the test looks to be somewhat flaky, as the author mentioned seeing their branch fail on Linux but not on their Mac. I was able to reproduce the failure with very high reliability on a Linux machine, and was able to determine that it is failing due to hitting this assertion: https://github.com/solana-labs/solana/blob/dcd66534dda718160c80752711e6c3a8c6686992/core/src/consensus.rs#L881-L884
This test starts a local-cluster, makes several roots, stops the cluster, chooses a restart slot beyond the latest fork, and then restarts the cluster through wait-for-supermajority. In this test, the two nodes are restarted in different ways:
- Node A: a new snapshot is created at the restart slot.
- Node B: the hard fork is supplied via the new_hard_forks field of ValidatorConfig.

These two approaches are supposed to be equivalent; however, I discovered there is a subtle difference:
On Node A, BankForks look like this:

On Node B, BankForks looks like:

A key difference to note here is that these nodes have a different perspective of root. Technically, the cluster has not rooted slot 40, but by virtue of it being picked as the restart slot, it will become rooted. At this point, Node A considers 40 the latest root whereas Node B considers the latest root to be 16. This difference is important in this function https://github.com/solana-labs/solana/blob/077e29aa1eac997707a98a13e3cfb6000ba00ede/core/src/validator.rs#L1443-L1452 which then determines how post_process_restored_tower() functions: https://github.com/solana-labs/solana/blob/077e29aa1eac997707a98a13e3cfb6000ba00ede/core/src/validator.rs#L1469-L1496

From the logs, I determined that Node A was entering the above if statement whereas Node B was not; this follows from the first snippet, where we'll only return a Some(...) if wait_slot_for_supermajority == root_slot.

As further proof, here is the comparison of towers between the two nodes; note that Node B contains votes past the hard fork.
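A condensed sketch of that gating behavior; hard_fork_reset_slot() below is a hypothetical stand-in for the logic in the linked validator.rs snippet, not the actual code.

```rust
type Slot = u64;

/// Only produce a hard-fork "reset point" when the wait-for-supermajority slot
/// is exactly the root seen in BankForks.
fn hard_fork_reset_slot(
    wait_slot_for_supermajority: Option<Slot>,
    root_slot: Slot,
) -> Option<Slot> {
    match wait_slot_for_supermajority {
        Some(wait_slot) if wait_slot == root_slot => Some(wait_slot),
        _ => None,
    }
}

fn main() {
    // Node A: the restart slot (40) is also its root, so it takes the
    // tower-reset path in post_process_restored_tower().
    assert_eq!(hard_fork_reset_slot(Some(40), 40), Some(40));
    // Node B: still considers 16 the root, so no reset point is produced and
    // the stale tower survives the restart.
    assert_eq!(hard_fork_reset_slot(Some(40), 16), None);
}
```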
With all of that being said, I also see that Node B does not enter that case when the test passes with the tip of master. I believe there may be a race condition with what votes land / state is updated when the local cluster is stopped to pick a restart slot.
Here are some logs from several runs:
Proposed Solution
I'm still digging; there is a suspicion that some "cleanup" should occur on Node B but isn't, though that theory is currently contradicted by the logs from when the test succeeds with the tip of master. So, a few action items:
- Continue looking into core/src/consensus.rs and better understand the assertion that was triggered.
- Possibly make the wait_for_supermajority_slot a root in BankForks if it is not already (sketched below); this would make this slot considered a root if someone doesn't create a new snapshot with hard-fork. Whether we do this is probably dependent on the results of the above bullet.
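A sketch of what that second bullet could look like, with ForksView as a hypothetical stand-in for BankForks (the real set_root path involves more machinery than shown here).

```rust
type Slot = u64;

struct ForksView {
    root: Slot,
}

impl ForksView {
    fn root(&self) -> Slot {
        self.root
    }
    fn set_root(&mut self, new_root: Slot) {
        assert!(new_root >= self.root, "roots never move backwards");
        self.root = new_root;
    }
}

/// Promote the wait-for-supermajority slot to a root if it is not one already,
/// so a hard-fork restart without a freshly created snapshot still agrees with
/// the snapshot-based restart on what the root is.
fn maybe_root_wait_slot(forks: &mut ForksView, wait_for_supermajority_slot: Option<Slot>) {
    if let Some(wait_slot) = wait_for_supermajority_slot {
        if forks.root() < wait_slot {
            forks.set_root(wait_slot);
        }
    }
}

fn main() {
    // Node B's perspective from this issue: root 16, restart slot 40.
    let mut forks = ForksView { root: 16 };
    maybe_root_wait_slot(&mut forks, Some(40));
    assert_eq!(forks.root(), 40);
}
```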