solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
12.99k stars 4.17k forks source link

get_block_time() can return results for unrooted blocks #33858

Open steviez opened 11 months ago

steviez commented 11 months ago

Problem

As observed in Discord by Trent, getBlockTime can return results for non-roots: https://discord.com/channels/428295358100013066/689412830075551748/1164770822532104222

The block time for all frozen banks (at this point, we don't know if a given Bank will be rooted or not) is sent to CacheBlockMetaService from ReplayStage: https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/core/src/replay_stage.rs#L2908

CacheBlockMetaService then pops the bank from the crossbeam channel https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/core/src/cache_block_meta_service.rs#L44 and inserts the time into Blockstore: https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/core/src/cache_block_meta_service.rs#L61-L63

On the RPC read side, data is fetched from the Blockstore if the slot is <= the latest root: https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/rpc/src/rpc.rs#L1316-L1339

And here is the implementation for Blockstore::get_block_time(). check_lowest_cleanup_slot() just ensures that the slot hasn't already been purged from the blockstore and besides that, this is a basic get: https://github.com/solana-labs/solana/blob/a3b0348649db5788e59237b5778405b2554704b2/ledger/src/blockstore.rs#L1965-L1969

Based on function name alone, we might expect check_blockstore_root() to error for non-rooted slots. However, this is not the case: https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/rpc/src/rpc.rs#L1008-L1028

Note that this function only performs checks if the result of blockstore.get_block_time() was an Err; if the function returns an Ok(...), then check_blockstore_root() is a noop.

Assuming the slot was found in blockstore, the bigtable query is skipped and the only remaining check within JsonRpcRequestProcessor::get_block_time() is here: https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/rpc/src/rpc.rs#L1030-L1051

Again, if the result from our original Blockstore::get_block_time() call was an Ok(...), then we get through this function without error.

So, with everything above stated, it is possible for a non-rooted slot to be returned by the getBlockTime RPC method. Note that getBlockTime differs from getBlock in that getBlock will only grab rooted blocks from the start: https://github.com/solana-labs/solana/blob/8260ffc1efae02dd9a1250d21786e6b005524de9/rpc/src/rpc.rs#L1103

Proposed Solution

getBlockTime does not include a commitment level parameter so it is arguably correct to return non-rooted blocks. However, given that non-rooted blocks are not loaded to bigtable, this creates an inconsistency in that a result may be available for a period of time (while that slot is still in local Blockstore) but then become unavailable.

So, I think the first thing to be hashed out is if getBlockTime should incorporate a commitment level like getBlock.

steviez commented 11 months ago

@t-nelson & @CriesofCarrots - I think I have figured things out such that we can get some closure on the recent discussion in Discord. Also, curious to hear your thoughts on if getBlockTime should be adjusted to incorporate a commitment level; I'm currently leaning that way but would like to reach agreement for doing the work

t-nelson commented 11 months ago

so it is arguably correct to return non-rooted blocks

this only really makes sense up 'til the point one of a block's siblings/cousins is rooted. afterwards, the block can never be canonical and should not be returned

i'm not opposed to adding commitment level here. ideally we just dereplicode getBlockTime maximally with getBlock

steviez commented 11 months ago

so it is arguably correct to return non-rooted blocks

this only really makes sense up 'til the point one of a block's siblings/cousins is rooted. afterwards, the block can never be canonical and should not be returned

Cool, I was stating that argument for the sake of completeness; I'm on board for not returning blocks that would be considered processed and that may not reach confirmed or finalized (note that getBlock does not accept processed so another win for consistency)

i'm not opposed to adding commitment level here. ideally we just dereplicode getBlockTime maximally with getBlock

🤝

CriesofCarrots commented 11 months ago

Thanks for analyzing this one, @steviez ! I don't know why I thought Blockstore would only return block times for rooted slots, when I haven't told it to do so 😅

I'm not opposed to adding commitment level here either. However, independent of that, I think we should make getBlockTime parallel getBlock even more closely, by replacing the call to Blockstore::get_block_time() with a call to a new Blockstore::get_rooted_block_time() method that looks like this:

    pub fn get_rooted_block_time(&self, slot: Slot) -> Result<Option<UnixTimestamp>> {
        datapoint_info!("blockstore-rpc-api", ("method", "get_block_time", String));
        let _lock = self.check_lowest_cleanup_slot(slot)?;
        if self.is_root(slot) {
            return self.blocktime_cf.get(slot);
        }
        Err(BlockstoreError::SlotNotRooted)
    }

In fact, I'm just going to PR that real quick (edit: #33871). Since it makes the logic more similar, it can only help the dereplicoding effort on the rpc.rs side.

However, given that non-rooted blocks are not loaded to bigtable, this creates an inconsistency in that a result may be available for a period of time (while that slot is still in local Blockstore) but then become unavailable.

Once the Blockstore call is changed, this inconsistency will change but not disappear. A result may be available while a slot is newer than the highest_super_majority_root boundary, and then disappear if the slot was not rooted.

steviez commented 11 months ago

I'm not opposed to adding commitment level here either.

Great, sounds like we have agreement. Additionally, I think we all agree that consistency with getBlock is desired, so I'll adhere to the same rules as getBlock in regards to commitment:

  • the default is finalized
  • processed is not supported.

In fact, I'm just going to PR that real quick

Sweet, thank you! That'll slim down the PR for adding commitment a little

Once the Blockstore call is changed, this inconsistency will change but not disappear. A result may be available while a slot is newer than the highest_super_majority_root boundary, and then disappear if the slot was not rooted.

Good call out and agreed that this will be the case when #33871 lands. Given that I do not plan to support processed, this should NOT be the case after commitment level has been properly plumbed