solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0

Feature Proposal: Revert VoteStateUpdate and use a different method for fixing issue 20014 #27473

Closed bji closed 2 years ago

bji commented 2 years ago

Problem

VoteStateUpdate is a costly and complex solution to issue 20014 when simpler and more performant solutions are available.

Proposed Solution

A validator must still observe proper lockouts when switching from the switched-from fork to the switched-to fork. It does this by retaining its lockouts while "still on that fork, waiting for lockouts to expire", and only updating the local vote_state with the saved lockouts as it is casting its first vote on a slot on the new fork. The validator has already ensured that it is safe to vote on the new fork before my proposed change has any effect on its vote_state's lockouts, so it can't be any less safe, in terms of avoiding lockout violations, than the existing implementation.

bji commented 2 years ago

There are only two significant complexities in implementing my proposal:

  1. It must be possible for the tower to be informed that a vote on a given slot is a switch from one fork to another, and to be told, or to compute, the common ancestor (or, conversely, the earliest voted-on slot that is on the dead fork).

  2. It must be possible to save the history of slot-lockouts along with the tower in all places where the tower is persistently stored, and to load it back in when the tower is loaded. This would be easily achievable if the tower didn't have a frozen ABI, which I assume means that it cannot have serialized fields added to it. I would probably propose that the slot-lockouts history simply be appended to the existing serialized form of the tower in persistent storage (a sketch follows below), but I don't know if Solana Labs would like that idea.
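
A minimal sketch of the append idea, assuming serde/bincode for the extra payload; the Lockout stand-in and all names here are hypothetical, not from the codebase:

use std::collections::VecDeque;

use serde::{Deserialize, Serialize};

// Hypothetical stand-in; the real Lockout lives in the vote program.
#[derive(Clone, Serialize, Deserialize)]
pub struct Lockout {
    pub slot: u64,
    pub confirmation_count: u32,
}

// Saved history: for each voted-on slot, the lockout stack as it was
// right before that vote was applied.
pub type SlotLockoutsHistory = Vec<(u64, VecDeque<Lockout>)>;

// Append the history after the frozen-ABI tower bytes instead of adding
// a field to the tower itself, leaving the existing serialized form intact.
pub fn save_tower_with_history(
    frozen_tower_bytes: &[u8],
    history: &SlotLockoutsHistory,
) -> bincode::Result<Vec<u8>> {
    let mut out = frozen_tower_bytes.to_vec();
    out.extend(bincode::serialize(history)?);
    Ok(out)
}

// On load, deserialize the tower as today, then deserialize the history
// from whatever bytes remain after it.
pub fn load_history(bytes_after_tower: &[u8]) -> bincode::Result<SlotLockoutsHistory> {
    bincode::deserialize(bytes_after_tower)
}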

Otherwise, the implementation is really trivial (a minimal sketch follows below):

  a. Create a simple data structure that stores a copy of the VecDeque of lockouts as they were right before each slot was voted on by the local tower.
  b. On switching forks, remove all saved lockouts for all slots that were on the dead fork, and then use the lockouts from the highest remaining stored slot to replace the lockouts in the tower's vote_state.
  c. When a slot pops out of the back of the tower, remove any saved lockouts associated with that slot from the saved lockouts.
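
A minimal Rust sketch of steps a - c. All names here are hypothetical, and step (b) is implemented by restoring the snapshot saved for the earliest dead-fork slot, which matches the "reset to the lockouts immediately after the common-ancestor vote" description quoted later in this thread:

use std::collections::{BTreeMap, VecDeque};

#[derive(Clone)]
pub struct Lockout {
    pub slot: u64,
    pub confirmation_count: u32,
}

#[derive(Default)]
pub struct SavedLockouts {
    // (a) For each slot voted on, the tower's lockouts as they were
    // immediately before that vote was applied.
    by_slot: BTreeMap<u64, VecDeque<Lockout>>,
}

impl SavedLockouts {
    // (a) Record the pre-vote lockouts when casting a vote on `slot`.
    pub fn save(&mut self, slot: u64, lockouts: &VecDeque<Lockout>) {
        self.by_slot.insert(slot, lockouts.clone());
    }

    // (b) On a fork switch, drop the entries for every voted-on slot of
    // the dead fork. The snapshot saved for the earliest of those slots
    // is the state just after the last common-ancestor vote; restoring
    // it un-does every vote cast on the dead fork.
    pub fn on_switch_fork(&mut self, dead_fork_slots: &[u64]) -> Option<VecDeque<Lockout>> {
        let mut sorted = dead_fork_slots.to_vec();
        sorted.sort_unstable();
        let restored = sorted.first().and_then(|s| self.by_slot.get(s).cloned());
        for s in &sorted {
            self.by_slot.remove(s);
        }
        restored
    }

    // (c) When a slot pops out of the back of the tower, its saved
    // lockouts are no longer needed.
    pub fn on_slot_popped(&mut self, slot: u64) {
        self.by_slot.remove(&slot);
    }
}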

bji commented 2 years ago

Because the root is documented as being arbitrary within the local tower, I assume it's not necessary to save and restore the root_slot also in the tower's vote_state when switching forks.

bji commented 2 years ago

The rough maximum additional cost to store the slot-lockouts history is:

31 slots x 31 slots, which is 961 additional Lockout data structures stored. This will make the tower storage a lot larger, something to consider. It could be serialized in a compressed form, which would help the size but, depending on the cost of I/O versus compression, may not be the right trade-off.
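
Back-of-the-envelope, assuming (for illustration only) that a serialized Lockout is a u64 slot plus a u32 confirmation count:

// Tower depth is 31, so at most 31 snapshots of at most 31 lockouts each.
const TOWER_DEPTH: usize = 31;
// Assumed serialized size: u64 slot + u32 confirmation_count.
const LOCKOUT_BYTES: usize = 8 + 4;

fn main() {
    let lockouts = TOWER_DEPTH * TOWER_DEPTH; // 961
    let bytes = lockouts * LOCKOUT_BYTES; // 11,532 bytes, ~11 KiB uncompressed
    println!("{lockouts} lockouts, {bytes} bytes");
}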

AshwinSekar commented 2 years ago

I think that testnet may be experiencing performance problems just due to the increased load of handling VoteStateUpdate instead of Vote.

Is there any evidence to support this claim? The only findings I have are that block sizes have been increased which https://github.com/solana-labs/solana/pull/26092 should solve.

bji commented 2 years ago

I think that testnet may be experiencing performance problems just due to the increased load of handling VoteStateUpdate instead of Vote.

Is there any evidence to support this claim? The only findings I have are that block sizes have been increased which #26092 should solve.

No, I have none really; just an assumption, given that the processing of VoteStateUpdate looks way more complex than the processing of Vote, and that these transactions are the vast majority of transactions on testnet. Block sizes increasing due to the larger VoteStateUpdate transactions may also be contributory.

bji commented 2 years ago

Also -- can the lockouts from the tower just be taken from the actual Vote account for the validator at the bank of the common ancestor of the slot it's switching from and the slot it's switching to? Then nothing extra would need to be tracked at all.

AshwinSekar commented 2 years ago

This is accomplished most readily by having the tower save a copy of its lockouts right before each slot it votes on. Then when it switches forks, it finds the most recent slot voted on that is a common ancestor between the forks, and resets its vote_state's lockouts to the lockouts that existed immediately after the vote on that slot. Now it is in sync with the rest of the cluster, and applying the new slot to the tower for the new vote on the new fork does exactly what the rest of the cluster thinks it does.

There are many reasons why this doesn't work.

Also -- can the lockouts from the tower just be taken from the actual Vote account for the validator at the bank of the common ancestor of the slot it's switching from and the slot it's switching to? Then nothing extra would need to be tracked at all.

This is something that Carl and I discussed in the past - it is incredibly difficult. The issue is that it is impossible to trust the validity of this vote state, because you are disregarding any votes that were in transit waiting to land. Similarly, the definition of the common ancestor that you voted on becomes muddled here - what if that vote never landed? What if it landed out of order? Do you then use the block in which it landed? What if that block is on a different fork?

Let me know if this makes sense or you have any questions about the scenarios, happy to draw them out in depth if necessary.

AshwinSekar commented 2 years ago

If you're curious here's the initial discussion of the solution: discord://discord.com/channels/428295358100013066/489504501749907468/887686875777749032 Unfortunately I think the fire channel from last september has been axed so don't have any logs from there.

bji commented 2 years ago

If you're curious here's the initial discussion of the solution: discord://discord.com/channels/428295358100013066/489504501749907468/887686875777749032 Unfortunately I think the fire channel from last september has been axed so don't have any logs from there.

Thanks. Corrected link is:

https://discord.com/channels/428295358100013066/489504501749907468/887686875777749032

As far as I can tell, the discussion went straight from "there is a problem" to "what about syncing the cluster's tower state from the local tower state". I didn't see anyone suggest the opposite, which is what I am suggesting, and which I think must solve the same issue in a much more direct way.

Consider a validator V which went off and voted on a fork. The partition included only three validators, two leaders (A and B) and the validator in question (V), which was never a leader.

Now consider that in this situation V could end up with local lockouts that don't match the cluster because of the example that started issue 20014 (i.e. local votes on 1 - 2 - 3, 3 is on dead fork, V's local lockouts are wrong with respect to the live fork).

This is the crux of the problem.

However, consider that the partition meant that no one on the network saw V's vote on 3 other than A, B, and V.

In order for A and B not to slash V, V must wait until its lockouts on 3 expire before it can switch to 6 (a slot on the live fork). So it does.

But once it waits long enough, no one else in the cluster even knows that the 3 vote existed.

Therefore, the effect on the cluster is the exact same as if V never voted on 3.

Therefore, V can act as if it never voted on 3.

Therefore, V can reset its tower state to exactly what it was after the vote on 2, because that is what it means to have never voted on 3.

The only thing V has to do is wait until lockouts expire on 3 so as not to be slashed by A or B. Then it can reset to lockouts state immediately before 3 (which is the same as immediately after 2).

The cluster CANNOT see any difference between the situation where V went off and voted on a dead fork versus one in which it didn't. Therefore V cannot cause any problems by behaving like that's what happened, all the way up to resetting its tower state like I mentioned.

AshwinSekar commented 2 years ago

Therefore, V can act as if it never voted on 3.

By voting on 3 you increment the lockouts on 1 and 2; if there were, say, some other fork coming off of 0, you'd now have to wait longer in order to switch to it.

bji commented 2 years ago

Therefore, V can act as if it never voted on 3.

By voting on 3 you increment the lockouts on 1 and 2; if there were, say, some other fork coming off of 0, you'd now have to wait longer in order to switch to it.

Yeah, but I don't think you're understanding my proposal. You un-double the lockouts on 1 and 2 when switching forks away from the slot that caused you to double 1 + 2. That's what it means to "act as if it never voted on 3".

bji commented 2 years ago

From the block chain's perspective V never voted on 3 and never doubled 1 + 2. Therefore, when V leaves that fork, it returns to the state equal to "never voted on 3 and never doubled 1 + 2".

AshwinSekar commented 2 years ago

Anyone who viewed V's vote on 3 could then slash V for this switch from 0.

... unless you're suggesting that if you leave a fork all votes on that fork are invalid, in which case lockouts are meaningless as all switches would be valid. V "never voted" on 3 so it should be able to vote on all other forks as well.

bji commented 2 years ago

Anyone who viewed V's vote on 3 could then slash V for this switch from 0.

... unless you're suggesting that if you leave a fork all votes on that fork are invalid, in which case lockouts are meaningless as all switches would be valid. V "never voted" on 3 so it should be able to vote on all other forks as well.

What do you mean by "lockouts are meaningless"? They are not meaningless. They prevent V from switching forks too soon, and votes showing that V did so can be used to slash V. To be clear, the evidence would not all be on-chain. Some of it would have to come from validators who observed V's votes on the dead fork, and can prove what happened, by providing that evidence in the form of vote transactions submitted by the offending validator. Some of that evidence will not be a direct lineage of the surviving fork because some of those transactions were for slots that don't exist on the surviving fork. But the evidence can still be gathered and presented by validators who saw it happen.

However, the lockout doublings that were done by V on the dead fork because of its vote on 3, are meaningless with respect to the switched-to fork, since the 3 vote didn't happen from the perspective of the block chain.

AshwinSekar commented 2 years ago

When you switch, you provide a switching proof for validators that have observed both forks to verify and validate. In this case the lockouts on the blockchain are incorrect, as they were never doubled, so all validators would then have to maintain a separate vote_state to track your true lockouts.

bji commented 2 years ago

When you switch, you provide a switching proof for validators that have observed both forks to verify and validate. In this case the lockouts on the blockchain are incorrect, as they were never doubled, so all validators would then have to maintain a separate vote_state to track your true lockouts.

They already do maintain a separate vote_state to track your true lockouts. It's your vote account.

Also the lockouts on the block chain are correct, by definition. It's the local validator who is incorrect because it retained effects from votes on dead forks when switching to other forks.

AshwinSekar commented 2 years ago

so how do you suppose they're going to keep track of your true lockouts without a full voting history? You're suggesting pushing the problem of keeping track of accurate lockouts into another structure.

that's incorrect, the lockouts on the block chain are to encapsulate the voting behavior of the validator as closely as possible.

bji commented 2 years ago

so how do you suppose they're going to keep track of your true lockouts without a full voting history? You're suggesting pushing the problem of keeping track of accurate lockouts into another structure.

that's incorrect, the lockouts on the block chain are to encapsulate the voting behavior of the validator as closely as possible.

But they already do.

pub struct VoteState {
    // ...
    /// The validator's lockouts, oldest vote at the front of the deque.
    pub votes: VecDeque<Lockout>,
    // ...
}

VoteState is maintained by each validator, for every validator (including itself).

The local validator additionally has a degenerate version of VoteState stored in its Tower data structure, to track its lockouts including votes that it cast but hasn't yet seen land on chain.

AshwinSekar commented 2 years ago

No they don't; VoteState is stored on chain and computed from the bank during replay.

bji commented 2 years ago

No they don't; VoteState is stored on chain and computed from the bank during replay.

You're just talking about the mechanism whereby validators track the vote state of other validators. That doesn't change the fact that validators track/maintain the vote state of each validator.

Maybe we're just using different terminology and that is confusing things? I don't know.

In every Bank, the VoteState for every validator is stored. It includes the Lockouts for that validator. These Lockouts are computed as votes are replayed.

Therefore every validator, by virtue of having Banks with the exact same contents as every other validator, has the same Lockouts state at every slot for every validator. And that is the canonical, exact, and correct lockouts state for that validator at that slot.

AshwinSekar commented 2 years ago
        |- 3 - 4 - 5 - 6
        |
0 - 1 - 2
|       |- 12 - 13
|- 15

Scenario: validator V votes on 0, 1, 2, 3, 4, 5, then decides to switch to 12, and finally to 15. Validator V should be slashed because it violated lockouts when switching from 11 to 15 (1's lockout has not expired).

With your proposed solution, the local tower for V after voting on 2 is:

Slot | Lockout
--------------
0    | 8
1    | 4
2    | 2

Now assume that the votes for 0 - 5 all landed in 6.

The on chain vote state for V in bank 6

Slot | Lockout       
--------------
0    | 64
1    | 32
2    | 16
3    | 8
4    | 4
5    | 2

The vote state for V in bank 13, where the vote for 12 has landed:

Slot | Lockout
--------------
12   | 2

So with this information, how can I validate that slashing has occurred? The switching proof for 6 -> 12 is correct. Using the proposed vote states, the switching proof for 12 -> 15 is also correct.

You might say, let us keep a chain of pointers to forks that V has switched to and validate lockouts from all banks at the tip of these forks. Then the vote state on bank 6 would show us slashing. Suspending disbelief for a moment that we don't prune dead forks and have all this information readily available in memory, there are still holes here to fill.

We could easily change this example such that the votes for 0 - 5 landed not in 6 but in some descendant; now, should we use the vote_state at the time of the switch, or the vote_state at the current tip of the fork, when evaluating? Use the max? Sure, but what if some alternative fork was popped off - how do we merge these vote_states?

I'm saying you simply don't have enough information to detect lockout violations if you don't accurately track lockouts in the on-chain vote_state. You can propose other mechanisms, but the crux of it is that you must have enough information to know V's tower pre- and post-switch in order to slash. If you have this information, that means you are keeping an accurate tower for all other validators, and that is not possible without VoteStateUpdate or at least Vote.
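
For concreteness, here is a minimal sketch of the simplified lockout arithmetic used in the tables above (pop expired votes, push the new vote with confirmation count 1, then deepen earlier votes). It is illustrative only, not the production vote program:

use std::collections::VecDeque;

#[derive(Debug)]
struct Lockout {
    slot: u64,
    confirmation_count: u32,
}

impl Lockout {
    fn lockout(&self) -> u64 {
        1u64 << self.confirmation_count
    }
    fn last_locked_out_slot(&self) -> u64 {
        self.slot + self.lockout()
    }
}

// Apply a vote to the tower: pop lockouts that expired before `slot`,
// push the new vote, then deepen the confirmation counts of earlier
// votes that are now covered by deeper ones.
fn process_vote(votes: &mut VecDeque<Lockout>, slot: u64) {
    while votes.back().map_or(false, |v| v.last_locked_out_slot() < slot) {
        votes.pop_back();
    }
    votes.push_back(Lockout { slot, confirmation_count: 1 });
    let depth = votes.len();
    for (i, v) in votes.iter_mut().enumerate() {
        if depth > i + v.confirmation_count as usize {
            v.confirmation_count += 1;
        }
    }
}

fn main() {
    let mut votes = VecDeque::new();
    for slot in 0..=5 {
        process_vote(&mut votes, slot);
    }
    // Reproduces the bank-6 table: 0|64, 1|32, 2|16, 3|8, 4|4, 5|2.
    for v in &votes {
        println!("{:<4} | {}", v.slot, v.lockout());
    }
}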

bji commented 2 years ago

I'm sorry but I don't understand:

"Validator V should be slashed because it violated lockouts when switching from 11 to 15 (1's lockout has not expired)."

It didn't switch from 11 to 15; it switched from 12 to 15, and it never voted on 11. 11 was never in its lockouts at all.

I don't see a lockout violation in your example.

bji commented 2 years ago

OK wait a second, I think I got it. The lockout violation is because of the votes on 0, 1, 2, 3, 4, 5 -- the validator is locked onto a fork that includes slot 2 until at least slot 18, and onto one that includes slot 1 until at least slot 33. So voting on 15 violates both of those.

AshwinSekar commented 2 years ago

Sorry, s/11/12. Yeah, voting on 15 violates the lockouts on slots 2 and 1 from the initial fork.

bji commented 2 years ago

So the data needed to detect this is there. The VoteState for V in bank 6 will show that it is locked out on 1 and 2 at the time that it casts the vote for 15.

AshwinSekar commented 2 years ago

Yeah, but how do you get that vote state? With VoteStateUpdate it's all self-contained.

bji commented 2 years ago

Yeah, but how do you get that vote state? With VoteStateUpdate it's all self-contained.

Yeah, but how do you verify that the self-contained data is correct and not missing stuff?

If I have a validator that never even saw V's votes on 3 - 5 (because it never saw blocks 3 - 5), how do I know that it's lying if I next see a VoteStateUpdate from it that doesn't include the effects on lockouts of those votes?

AshwinSekar commented 2 years ago

If I never saw its votes on 3 - 5 then I can't verify a switching proof from 5 -> 12. If instead it submits a switching proof from 3 -> 12, then when the votes for 3 - 5 land, those votes violate lockout (there is no valid switching proof from 12 -> 3) and it will be slashed.

bji commented 2 years ago

But the votes on 3 - 5 never land; it's a dead fork. I may never see them.

What's this switching proof from 5 -> 12? Why would V include that if it's trying to pretend that it never voted on 3 - 5?

bji commented 2 years ago

It is going to pretend that it never voted on 3 - 5. So it's going to submit its vote for 5, then later submit its vote for 12, then later still submit whatever transactions it wants to that look like it never voted on 3 - 5 or 12.

AshwinSekar commented 2 years ago

If I vote on 3 - 5, at some point they will make it to a leader through the TPU or gossip. If you're saying that the votes never reach anyone, then it's the same as if it never voted.

AshwinSekar commented 2 years ago

Sure, even if it switches 2 -> 12 after having voted for 5, at some point the vote for 5 will make it to someone and be slashable.

bji commented 2 years ago

So if I receive bogus VoteStateUpdates and believe that V's voting sequence actually went 0 - 1 - 2 - 12 - 15, then later when I see the original VoteStateUpdate transactions for slots 3 through 5, I will not just reject the slots 3 through 5 VoteStateUpdate because it's clearly an old tx for old slots, but will recognize them as one part of a slashable event?

AshwinSekar commented 2 years ago

Yeah, because the vote for 5 will have the lockout for 1 at 32, which isn't possible with your current vote state update of 15.

AshwinSekar commented 2 years ago

So it doesn't matter if it happened before or after; it's still a slashable vote, not an old vote. If it's an old vote, then the lockouts should be consistent.

bji commented 2 years ago

OK, so I think I'm finally getting it, and I really appreciate your patience. I guess that slash detection using my methodology requires keeping lots of fork history around (I had assumed it was already always kept around anyway, but I guess not). But VoteStateUpdate allows any two vote events to be detected as slashable regardless of the order in which they arrive, and without requiring that the validator keep the fork data around.

This is basically what you were alluding to in your long comment, if I'm not mistaken.

AshwinSekar commented 2 years ago

Pretty much. I think you'd either have to keep all forks (and I'm not convinced that is actually enough) or do something similar to what mvines is doing and keep track of local towers by parsing vote transactions.

If you're doing the second method then you'd still have to deal with out-of-order votes, which I'm not sure your proposal handles.

As for processing concerns, I think you might be overestimating how long processing a VoteStateUpdate takes relative to a Vote. VoteStateUpdate also allows us to do some optimizations in gossip and banking_stage during times of high spam by dropping extraneous votes from validators (we only care about the latest, since votes are no longer incremental), which would also be a performance win.
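
A sketch of that dropping optimization, with hypothetical types standing in for the real gossip / banking_stage plumbing; the point is that a full-tower update makes anything but the newest update per validator extraneous:

use std::collections::HashMap;

// Hypothetical stand-ins for illustration.
type Pubkey = [u8; 32];

struct VoteStateUpdateTx {
    validator: Pubkey,
    last_voted_slot: u64,
    // ... full proposed tower, signature, etc.
}

// Keep only the newest VoteStateUpdate per validator; because each one
// carries the full tower (votes are no longer incremental), older ones
// can be dropped under spam load.
fn keep_latest(txs: Vec<VoteStateUpdateTx>) -> Vec<VoteStateUpdateTx> {
    let mut latest: HashMap<Pubkey, VoteStateUpdateTx> = HashMap::new();
    for tx in txs {
        match latest.get(&tx.validator) {
            Some(prev) if prev.last_voted_slot >= tx.last_voted_slot => {} // stale, drop
            _ => {
                latest.insert(tx.validator, tx);
            }
        }
    }
    latest.into_values().collect()
}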

bji commented 2 years ago

One point: VoteStateUpdate processing requires access to the VoteState of the vote account at the slot where the VoteStateUpdate tx is landing. check_update_vote_state_slots_are_valid() takes that vote state as an argument.

Therefore, your assertion that a VoteStateUpdate received later can detect a slashable event for a VoteStateUpdate received earlier must also require that the fork history of the fork on which the VoteStateUpdate is landing be available.

AshwinSekar commented 2 years ago

Slashing and switching proofs are yet to be implemented; I think there would be a separate pipeline to verify them. You don't need the dead fork to verify that type of lockout violation.

bji commented 2 years ago

"If you're doing the second method then you'd still have to deal with out of order votes which i'm not sure your proposal handles."

My proposal was only about keeping the local lockouts in sync with the cluster for every fork, not trying to detect lockouts. I thought that the original problem, as alluded to causing a crash on last September, had nothing to do with slash detection, but with validators having incorrectly doubled lockouts in their tower which then prevented them from switching forks when they should be able to, then leading to a mass failure to switch forks when enough validators got into this state.

If the goal is detecting lockouts then I have to think harder about how to do that without VoteStateUpdate, if possible.

bji commented 2 years ago

Slashing and switching proofs are yet to be implemented; I think there would be a separate pipeline to verify them. You don't need the dead fork to verify that type of lockout violation.

Yeah, but you need that just to verify the validity of the VoteStateUpdate tx itself, before you can even use it for anything. check_update_vote_state_slots_are_valid() requires it.

AshwinSekar commented 2 years ago

I might be incompetent in a lot of things but I think I can code a pathway using Option 🙃

Yeah, I was saying that with out-of-order votes, your solution doesn't keep local lockouts in sync with the cluster.

Then when it switches forks, it finds the most recent slot voted on that is a common ancestor between the forks, and resets its vote_state's lockouts to the lockouts that existed immediately after the vote on that slot. Now it is in sync with the rest of the cluster, and applying the new slot to the tower for the new vote on the new fork does exactly what the rest of the cluster thinks it does.

If you had out-of-order votes, this statement isn't true.

bji commented 2 years ago

I might be incompetent in a lot of things but I think I can code a pathway using Option 🙃

Yeah, I was saying that with out-of-order votes, your solution doesn't keep local lockouts in sync with the cluster.

Then when it switches forks, it finds the most recent slot voted on that is a common ancestor between the forks, and resets its vote_state's lockouts to the lockouts that existed immediately after the vote on that slot. Now it is in sync with the rest of the cluster, and applying the new slot to the tower for the new vote on the new fork does exactly what the rest of the cluster thinks it does.

If you had out-of-order votes, this statement isn't true.

I think the confusion comes down to this: we are using two different definitions of "lockouts". Yours is the lockouts implied by the validator's complete voting history, including votes that only ever landed on dead forks; mine is whatever the on-chain VoteState for the validator says at a given bank.

So when you say my proposal doesn't have correct lockouts when votes are received out of order, it's with respect to your definition of lockouts. With my definition of lockouts, vote order doesn't matter: resetting the lockouts state to that of the common ancestor when switching forks will keep my validator's lockouts in sync with the on-chain VoteState (by definition). But again, it will require exterior mechanisms for detecting slashing.

bji commented 2 years ago

Now the crux of the issue is, which is more efficient:

  • VoteStateUpdate verifying lots of state on every vote (which includes the work of check_update_vote_state_slots_are_valid(), a necessary prerequisite to ensure that invalid VoteStateUpdate transactions are rejected)
  • But VoteStateUpdate also makes slash detection super trivial

or

  • Vote being a super simple and cheap transaction, but
  • Detecting slashable votes requiring keeping history of all slots that have not achieved finality, and also handling out-of-order votes with regards to detecting lockout violations

I really feel like detecting slashing should be done off-validator. That really expensive processing doesn't need to be done by validators, who can readily just accept all votes and assume they are valid (like they are doing now). The threat of there being some watchdog group running slash detection, along with an actual "slash" transaction that accepts slash proofs and slashes if they are verified, would be enough to implement slashing, and it would take a lot of work (and complexity) out of validators.

AshwinSekar commented 2 years ago
        |- 3
0 - 1 - 2 
        |- 5 - 6

You vote on 0, 1, 2; the tower you save is:

Slot | Lockout
--------------
0    | 8
1    | 4
2    | 2

You vote on 3, and then later you switch to 5, so you restore the above tower and then apply your vote on top. Your local tower now looks like this:

Slot | Lockout
--------------
0    | 8
1    | 4
5    | 2

Turns out, when your votes landed on the 5 fork, the vote for 2 came before the vote for 1, so your on-chain state actually looks like this, and now you have a discrepancy:

Slot | Lockout
--------------
5    | 2
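
To make the discrepancy concrete, here is the same simplified tower arithmetic as the earlier sketch, plus an assumed rule that a landed vote for a slot at or below the last voted slot is ignored:

use std::collections::VecDeque;

#[derive(Debug)]
struct Lockout {
    slot: u64,
    confirmation_count: u32,
}

impl Lockout {
    fn lockout(&self) -> u64 {
        1u64 << self.confirmation_count
    }
}

// Simplified on-chain vote processing: drop old/out-of-order votes, pop
// expired lockouts, push the new vote, then deepen earlier votes.
fn apply_landed_vote(votes: &mut VecDeque<Lockout>, slot: u64) {
    if votes.back().map_or(false, |last| slot <= last.slot) {
        return; // old / out-of-order vote: dropped
    }
    while votes.back().map_or(false, |v| v.slot + v.lockout() < slot) {
        votes.pop_back();
    }
    votes.push_back(Lockout { slot, confirmation_count: 1 });
    let depth = votes.len();
    for (i, v) in votes.iter_mut().enumerate() {
        if depth > i + v.confirmation_count as usize {
            v.confirmation_count += 1;
        }
    }
}

fn main() {
    // The votes land on the 5 fork in the order 0, 2, 1, 5: the vote for
    // 1 arrives after the vote for 2 and is dropped as old, and voting 5
    // then expires both 2 (locked out through slot 4) and 0 (through 4).
    let mut on_chain = VecDeque::new();
    for slot in [0, 2, 1, 5] {
        apply_landed_vote(&mut on_chain, slot);
    }
    for v in &on_chain {
        println!("{:<4} | {}", v.slot, v.lockout()); // prints: 5 | 2
    }
}
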
AshwinSekar commented 2 years ago

Yeah, I'm open to the idea of slashing being done off-validator. VoteStateUpdate ensures that your local tower is in sync with the on-chain vote state and minimizes the risk of you accidentally submitting a slashable vote due to consensus issues.

Slashing aside, with out-of-order votes it's possible to have a huge divergence between your local tower and your on-chain vote state, which means you can't vote until you expire enough of your local tower (what happened last September).

bji commented 2 years ago
        |- 3
0 - 1 - 2 
        |- 5 - 6

You vote on 0, 1, 2; the tower you save is:

Slot | Lockout
--------------
0    | 8
1    | 4
2    | 2

You vote on 3, and then later you switch to 5, so you restore the above tower and then apply your vote on top. Your local tower now looks like this:

Slot | Lockout
--------------
0    | 8
1    | 4
5    | 2

Turns out, when your votes landed on the 5 fork, the vote for 2 came before the vote for 1, so your on-chain state actually looks like this, and now you have a discrepancy:

Slot | Lockout
--------------
5    | 2

You can't switch from 3 to 5. When you vote for 3, you double 2's lockout (and those beneath it), so you now have:

Slot | Lockout
--------------
0    | 16
1    | 8
2    | 4
3    | 2

That means that you can't vote on a non-descendant of 3 until after slot 5.

bji commented 2 years ago

So you vote on 6 instead. You reset to:

Slot | Lockout
--------------
0    | 8
1    | 4
2    | 2

Then apply 6:

Slot | Lockout
--------------
0    | 8
6    | 2

AshwinSekar commented 2 years ago

Sure, my bad. Make it slot 6 and make the initial pre-fork chain longer; still the same problem.

AshwinSekar commented 2 years ago

Now the crux of the issue is, which is more efficient:

  • VoteStateUpdate verifying lots of state on every vote (which includes the work of check_update_vote_state_slots_are_valid(), a necessary prerequisite to ensure that invalid VoteStateUpdate transactions are rejected)
  • But VoteStateUpdate also makes slash detection super trivial

or

  • Vote being a super simple and cheap transaction, but
  • Detecting slashable votes requiring keeping history of all slots that have not achieved finality, and also handling out-of-order votes with regards to detecting lockout violations

I really feel like detecting slashing should be done off-validator. That really expensive processing doesn't need to be done by validators, who can readily just accept all votes and assume they are valid (like they are doing now). The threat of there being some watchdog group running slash detection, along with an actual "slash" transaction that accepts slash proofs and slashes if they are verified, would be enough to implement slashing, and it would take a lot of work (and complexity) out of validators.

I think priority #1 is to make a vote program that can handle out-of-order votes. Then detecting slashing / lockout violations, etc. If you have a suggestion for something like that, I'm all ears.