solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
13.34k stars 4.35k forks source link

use `VoteState::deserialize_into()` in place of bincode everywhere #35101

Open 2501babe opened 9 months ago

2501babe commented 9 months ago

Problem

in #34829 and #34972 we implemented VoteState::deserialize_into(), a custom parser intended to be suitable for usage in a bpf context. we now want to use the new parser everywhere because it is 7-20x faster than bincode depending on the input

HANA custom vs bincode
* we win: 10000, they win: 0
* our max 22.422µs, their max 471.89µs
* our avg 5.76µs, their avg 35.956µs
* avg speedup 30.195µs

the advantage remains even if we compare our VoteState parser to bincode parsing to VoteStateVersions without calling convert_to_current()

HANA custom vs bincode, wins: 10000 vs 0, max: 22.147µs vs 96.273µs, avg: 4.16µs vs 38.144µs

it also uses half as much memory for V1_14_11 because conversion to Current happens during parsing

Proposed Solution

change the impl of VoteState::deserialize() from bincode to our new parser, and use the new VoteState::deserialize() wherever we can

based on my research into the (various) different mechanisms to deserialize VoteState, this is the basic situation:

originally i believed that we needed a feature gate for the vote program changes, because of a couple patterns where VoteStateVersions was decoded and then only later converted to Current. but it appears now that nothing important happens in between. so a feature gate comes down to "we expect the parser to behave identically, and its already in tower and blockstore, so should go all the way" vs "minimize the ungated change surface for the sake of prudence." hoping for input here

the new parser always produces an identical parse to bincode for well-formed inputs. the potential risk is that in some cases it behaves differently for certain malformed inputs that bincode would accept, particularly:

these differences should not affect anything because presumably maliciously crafted accounts not owned by the vote program should be rejected by an owner check, and inserting a prior voter with a zero epoch doesnt appear to be possible because the code is well-encapsulated. but given how extensive this change is, im trying to be maximally cautious

update: the only difference is now the is_empty check. at this point im leaning toward no feature gate

as a side note, the parser falls back to bincode for V0_23_5, but ive confirmed there are no such accounts (at least delivered by the getVoteAccounts rpc method) on devnet, testnet, or mainnet-beta

there are some complications revolving around is_uninitialized() in VoteStateVersions because an uninitialized V0_23_5 account converted to current becomes "initialized", because is_uninitialized() (presumably erroneously) only checks if voters is empty, convert_to_current() put the default pubkey/zero epoch "voter" in the btreemap, and changing convert_to_current() to not do this breaks other things

my current plan for testing the parts of the change that cannot be feature-gated is to cherrypick the new parser into a 1.17 branch and run a validator built off that codebase for several epochs to confirm correct behavior is adhered to


final update: ive decided to forgo the feature gate because a) the parser will already be in so many codepaths without it that it only really provides false security, and b) i made some changes to its behavior such that it should now succeed or fail exactly in line with bincode

2501babe commented 9 months ago

cc @joncinque, also cc @sakridge because jon mentioned you might be interested in this