carllin commented 2 years ago

Problem

Rewards calculations scales linearly with the number of stake accounts and can take multiple seconds or even longer as the number of stake accounts grows. Right now all of these computations happen on bank creation, which stalls progress

Proposal 1: Sharding the computation over multiple slots in an epoch

Similar to rent, this would involve computing in every slot rewards for a given range of pubkeys. We could take the write locks for the stake keys at the beginning of the slot, and run the computation overlapping the slot so that the work doesn't accrue a bunch of latency at the end of a slot

Downsides

Different accounts will see their rewards at different times in the epoch, which may not be the greatest user experience
Like rent, our index is not optimized for range queries, so finding for the stake accounts in a range may require scanning a large range of the sharded accounts index, which may stall other banking/replay operations touching the same range of sharded keys. This may not be a huge deal if we have a bunch of shards in our Accounts index.

Proposal 2: Long running background process that writes updates to a "future" slot

This proposal will require a lot more changes. The idea is to:

Prevent any modifications to stake accounts for the first N slots of an epoch, where N is based on the estimated time we think the rewards calculation will take.
Spin up a background computation that iterates over all the stake accounts and runs the rewards computation, writing the new stake accounts with rewards to a new AppendVec in AccountsDb called rewards
After the computation is done, also hash all of the stake accounts changes. Call this hash H
Any slot M that cross the boundary of slot N has to
- Wait for the background computation to finish
- We page in the AppendVec containing all the changes into a different storage map, different than the slot based storage map. The background computation will write into the account index for all stake accounts a special entry that signals they need to check this storage for updates if the slot they are looking up is > N, the boundary slot
- Incorporate the hash H of all these stake account changes into the bank hash for slot M

mvines commented 2 years ago

Different accounts will see their rewards at different times in the epoch, which may not be the greatest user experience

I feel this makes Proposal 1 a non-starter unfortunately. We can't have epoch rewards happening at different times for different stake/vote accounts, this'll likely break all downstream reward integrations as well as user expectations about when rewards are delivered.

ryoqun commented 2 years ago

Well, I pondered on this to some extent.

considering recent staking account growth and the nature of staking as a pivotal instrument for Solana's network. I'm inclined to my variant of Proposal 1, which avoids the different timings across staking accounts. However the following my idea delays the staking reward distribution by a epoch.

Proposal 1a.

Over N+1 epoch, firstly slowly compute the staking rewards for individual stake accounts. At the start of N+2 epoch, all staking rewards for N epoch is available (it will be reflected when touched by tx/rpc like rent; otherwise the distribution to untouched stake accounts are spread across the epoch, like rent again)

Pros

no perf related worries for inflation code again
- (at the cost of another hard req for ordered index..)
current inflation math won't be changed much (so less risk)
no more bank.stakes cache at the accountsdb write path (i never liked this... esp, pretty large mem usage these days. xD)
the pubkey prefix scanning code is pretty robust, thanks to the eager rent collection.
in the future, staking reward calc could be more complex (well, only vote_credits isn't ideal, i guess)

Cons

delayed inflation reward payout across board by 1 epoch
increased storage (3x) due to impl. (this will be hidden cost by not exposing this special casing to onchain rents for stake accounts)

Implementation details

Simply put, pipeline with shadow copies over epochs.

Step 0: (at epoch N): Create shadow copies of vote/stake accounts at pda(stake/vote account address, epoch, bank hash of last epoch's bank) at every writes of all kinds to the accounts (pda-ing requires special purpose index or ranged calculation but this is desirable against pubkey prefix distribution imbalance)
- I'm also thinking we also need this pda indirection for eager rent collection (= rent by pubkey prefix over an epoch)
- Also, I'm assuming these additional index don't pressure mem with the new disk based index mechanism?
Step 1: (at epoch N+1): calculate staking rewards using the last snapshot of accounts from Step 0 and write results to pda(stake account, epoch, "rewards") or something.
Step 2: (at epoch N+2): like rent, reflect the rewards into the original stake account when touched or by pubkey prefix
Step 3: (at epoch N+3): remove the temporary staking reward result account (again slowly)

Hope this uniform delay could be acceptable?

ryoqun commented 2 years ago

I wonder how's going with other pos blockchain's implementation, regarding staking reward calc./dist at their each epochs at the orders of million accounts.. I'll check it later if no one can shed light on this. :)

ryoqun commented 2 years ago

hmm, interesting

Polkadot: staking is limited, actually due to intensive computing cost like us:
- https://support.polkadot.network/support/solutions/articles/65000168057-how-do-i-stake-nominate-on-polkadot-
- https://polkadot.network/blog/polkadot-and-kusama-staking-changes/

ryoqun commented 2 years ago

more interesting, cardano are facing similar issue like us right now. let's get along with by overcoming the grow pains.... :)

https://github.com/input-output-hk/cardano-node/issues/2851

mvines commented 2 years ago

Of course the real solution is stake pools!

jstarry commented 2 years ago

I suggest that we do what most yield farming protocols do and use a redemption model which doesn't require writing to the accounts for each reward cycle. Balances are only updated when a user redeems their stake rewards for lamports. We could add a special runtime instruction for this which requires the user to pass all necessary historical stake data required for that delayed computation. All stake weighted parts of the protocol can continue making use of the stakes cache to determine the full delegated stake balance for each validator as if all stake accounts were redeemed.

jstarry commented 2 years ago

Sounds like lazy/manual redemption could work but there are a few big action items: 1) Determine what (if any) impact this has to validator operations and discuss with them 2) Propose how redemption will work for stake accounts since the program runtime doesn't support unbalanced instructions currently. At minimum I think we need a last_redeemed field for stake accounts for this to work.

ryoqun commented 2 years ago

late for the party.

I suggest that we do what most yield farming protocols do and use a redemption model which doesn't require writing to the accounts for each reward cycle.

first of all, this redemption idea sounds a nice inspiration from the defi. note that I'm still the defi newbie. my understanding of the defi redemption mechanism is still only high level one...

here's some random thoughts:

maybe we still need to run the heavy rewards calculation at epoch boundaries (now off-chain) for the core's various staked weighted functionality and redemption by impatient users)
- in that case, tx cost with stake::redempt should be handled differently (esp at the start of epoch like stalling if many?)
- how about composability/compatibility with (existing) staking pools?
rpc also needs to be updated? (also wallets and exchanges with staking support).
the lazy calc. will necessitate to retain (potentially large) staking calc data or latest (actual) balance of all stake accounts?
- maybe we charge extra cost for this?
slashing and total-/circulating supply calc also will need an update.

Propose how redemption will work for stake accounts since the program runtime doesn't support unbalanced instructions currently. At minimum I think we need a last_redeemed field for stake accounts for this to work.

Come to think of it, we actually had this redemption mechanism partially but removed it due to gaming concern (different SOL rewards depending on timing). SysvarRewards and the normal credits_observed should work without the painful stake account versioning maybe?

ryoqun commented 2 years ago

oh, another wild idea is start to pre-calculate next epoch's pending rewards distribution early and then just swap it (at the stake cache layer?) at the start of new epoch after fine-tuning last bits, regards the few last slots before finalization?

ryoqun commented 2 years ago

oh, another wild idea is start to pre-calculate next epoch's pending rewards distribution early and then just swap it (at the stake cache layer?) at the start of new epoch after fine-tuning last bits, regards the few last slots before finalization?

hhm, but this distinct 2 code path could be easily turned into a dos attack vector. ;)

brooksprumo commented 2 years ago

Can the stake rewards calculation be spread out over the whole epoch instead of occurring at the new epoch's first slot?

I think this might delay receiving the stake rewards by one epoch, but maybe that latency doesn't really matter assuming stake likely remains delegated for multiple epochs and doesn't need to move often?

joncinque commented 2 years ago

I'm very late to this party as well!

Propose how redemption will work for stake accounts since the program runtime doesn't support unbalanced instructions currently. At minimum I think we need a last_redeemed field for stake accounts for this to work.

How about going the lazy approach through a RewardsPool? But not the one that currently exists. All of the rewards go into some pool, either a pool per vote account or a general pool. A new redeem instruction on the stake program moves however much a stake account is entitled to from the pool and into the stake account.

SysvarRewards and the normal credits_observed should work without the painful stake account versioning maybe?

As long as it goes back far enough, we should be able to properly figure out the rewards, even multiple epochs later, correct? The credits_observed might actually be the saving grace here, since it can also tell us when we last received rewards.

how about composability/compatibility with (existing) staking pools?

Stake pools require some sort of "accounting" at the start of epochs anyway, so it would actually be really easy to tack on some additional redeem instruction for all of the stakes in the pool.

Are there any other programs that directly use stake accounts? Explorers and RPC providers could be updated to just do some math and show how much can be redeemed by the user.

carllin commented 2 years ago

@brooksprumo are your proposed solutions about rent? This issue was originally about the stakes rewards computation

brooksprumo commented 2 years ago

@brooksprumo are your proposed solutions about rent? This issue was originally about the stakes rewards computation

Oops, thanks! Moved over to the correct issue now: https://github.com/solana-labs/solana/issues/17178#issuecomment-1026229951 https://github.com/solana-labs/solana/issues/17178#issuecomment-1026230118

sakridge commented 2 years ago

@HaoranYi can you try to take a look?

HaoranYi commented 2 years ago

Sure. I will start looking at this.

HaoranYi commented 2 years ago

New to the party as well. Here is what I have learned.

Current stake reward processing

At the start of each epoch, the stake rewards are computed and credited to stake accounts through the following code path - which actually happens on the bank creation time (i.e. bank::new_from_parent) synchronously and block further bank processing.

stake_rewards dotuml

The problem is that, with the growing number of stake accounts, the reward computation time takes multiple seconds.

To address this issue, a few approaches have been proposed, which is summarized below.

Summary of stake_rewards proposals

sharding the reward computation and update over multiple slots
offload reward computation into a background threads and update later when rewards are available
precompute reward before epoch boundary
use a reward pool to store rewards, and provide new reward withdraw instruction to let the user withdraw reward later.

I have created a channel #stake_rwards on discord. Let's discuss which direction to approach this problem.

HaoranYi commented 2 years ago

joncinque The RewardsPool that you describe is some kind of on chain storage? Some thing like a map to store the available stakes reward for account? At the beginning of epoch, we update the on map with the new rewards. And later the stake account can use the "redeem" instructions to withdraw the rewards. Is that what you are proposing?

joncinque commented 2 years ago

The general concept is to store the reward lamports "somewhere", either in a new cluster-wide account or the vote account, and then be able to calculate on-chain during withdrawal how much a stake account is entitled to

HaoranYi commented 1 year ago

A related proposal for partitioned epoch reward can be found here. https://github.com/solana-labs/solana/pull/27455

solana-labs / solana

Proposals for long-running rewards computations #20384

Problem

Proposal 1: Sharding the computation over multiple slots in an epoch

Downsides

Proposal 2: Long running background process that writes updates to a "future" slot

Current stake reward processing

Summary of stake_rewards proposals