Refactoring analysis: Analyse the usage of block numbers

the-right-joyce commented 2 years ago

Research where block numbers are used (who needs to access block numbers and why)

Current status: https://hackmd.io/@_FY3-hvwQZ6cX_4n8zYUNA/rymYZLZlp

michalkucharczyk commented 1 year ago

`BlockId` refactoring status and next steps

1. Some of the core traits are already cleaned of BlockId; however, there are still some methods that are related to BlockId or block number. The summary of use cases is in the sections that follow.
2. Reworking the header function led to some call sequences that are far from perfect (example). Namely, some modules need to fetch a header by block number; the current PR works around this by calling the header(hash(n)) combo.
3. In some cases expect_header(block_id) is called to hide the combo (as expect_header has not been reworked yet).
4. To solve this "problem", the HeaderBackend trait could be extended with dedicated functions for fetching a header by block number:
   - header_for_number(NumberFor<Block>) -> Result<Option<Block::Header>>
   - expect_header_for_number(NumberFor<Block>) -> Result<Block::Header>
   - header(Block::Hash) -> Result<Option<Block::Header>>
   - expect_header(Block::Hash) -> Result<Block::Header>

   The new naming may also make it easier to find and refactor all block-number usages. I would follow this naming convention, if needed, for the rest of the methods (e.g. block, block_status). The implementation of the _for_number methods could make use of a new service which will move the number-hash mapping out of the DB.
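A minimal sketch of how the proposed `_for_number` methods could sit on top of the hash-based ones. This is not the actual `HeaderBackend` trait: `Hash`, `Number`, `Header`, `Error` and `InMemChain` are toy stand-ins invented for illustration, and the default method bodies simply give a name to the `header(hash(n))` combo.

```rust
use std::collections::HashMap;

// Toy stand-ins for `Block::Hash`, `NumberFor<Block>` and `Block::Header`.
pub type Hash = [u8; 4];
pub type Number = u64;

#[derive(Clone, Debug, PartialEq)]
pub struct Header {
    pub number: Number,
    pub hash: Hash,
    pub parent_hash: Hash,
}

#[derive(Debug, PartialEq)]
pub enum Error {
    UnknownBlock(String),
}

pub type Result<T> = std::result::Result<T, Error>;

pub trait HeaderBackend {
    /// Hash-based lookup: the primary API.
    fn header(&self, hash: Hash) -> Result<Option<Header>>;

    /// Canonical number -> hash mapping.
    fn hash(&self, number: Number) -> Result<Option<Hash>>;

    /// Like `header`, but a missing block is an error.
    fn expect_header(&self, hash: Hash) -> Result<Header> {
        self.header(hash)?
            .ok_or_else(|| Error::UnknownBlock(format!("hash {:?}", hash)))
    }

    /// Gives the `header(hash(n))` combo an explicit, greppable name.
    fn header_for_number(&self, number: Number) -> Result<Option<Header>> {
        match self.hash(number)? {
            Some(hash) => self.header(hash),
            None => Ok(None),
        }
    }

    /// Like `header_for_number`, but a missing block is an error.
    fn expect_header_for_number(&self, number: Number) -> Result<Header> {
        self.header_for_number(number)?
            .ok_or_else(|| Error::UnknownBlock(format!("number {}", number)))
    }
}

/// Hypothetical in-memory backend used only to exercise the trait.
#[derive(Default)]
pub struct InMemChain {
    headers: HashMap<Hash, Header>,
    canonical: HashMap<Number, Hash>,
}

impl InMemChain {
    pub fn insert_canonical(&mut self, header: Header) {
        self.canonical.insert(header.number, header.hash);
        self.headers.insert(header.hash, header);
    }
}

impl HeaderBackend for InMemChain {
    fn header(&self, hash: Hash) -> Result<Option<Header>> {
        Ok(self.headers.get(&hash).cloned())
    }

    fn hash(&self, number: Number) -> Result<Option<Hash>> {
        Ok(self.canonical.get(&number).copied())
    }
}

fn main() {
    let mut chain = InMemChain::default();
    chain.insert_canonical(Header { number: 1, hash: [1; 4], parent_hash: [0; 4] });

    // Both routes end up at the same header.
    let by_number = chain.expect_header_for_number(1).unwrap();
    let by_hash = chain.expect_header(by_number.hash).unwrap();
    assert_eq!(by_number, by_hash);

    // An unknown number is `None`, not an error, for the non-`expect` variant.
    assert!(chain.header_for_number(2).unwrap().is_none());
}
```

With this shape, every remaining block-number lookup goes through a method whose name ends in `_for_number`, so the usages are easy to find when the number-hash mapping is later moved elsewhere.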

Where the `BlockId` is currently used:

Still in client API:

trait Chain (common block validation for consensus)
- block_status
trait BlockIdTo
- to_hash
- to_number
trait HeaderBackend
- status
- expect_header
- block_hash_from_id
- block_number_from_id
trait BlockBackend
- block
- block_status
impl Client: Chain, BlockBackend, CallApiAt, HeaderBackend, BlockIdTo, BlockBuilderProvider
- code_at
- runtime_version_at
- block_status
trait ExecutionExtensions
- manager_and_extensions
- extensions

In `runtime_api()` subsystem:

Refactoring here is probably doable, but requires some adjustments in the runtime-related macros. The question is whether we want to get rid of the BlockId parameter to runtime functions; this may be inconvenient in some cases. Maybe we should introduce dual functions (following the _for_number approach) at the top-level runtime API.

trait GetRuntimeVersionAt
- runtime_version
trait ApiExt
- has_api
- has_api_with
- api_version
trait CallApiAt
- runtime_version_at
- state_at
WasmSubstitute
- matches (operates on block number)
- get
trait CallExecutor
- prove_execution
- contextual_call
- call
- runtime_version
trait LocalCallExecutor
- check_override

Other usages

Benchmark::run (from - to)
- measure_block
- consumed_weight
impl PowBlockImport
- check_inherents
trait PowAlgorithm
- verify
DB (something that we want to get rid of)
- API: remove_from_db, read_header, read_db
- HeaderBackend::status
- internal implementation: Backend::prune_block
InMem
- Blockchain::id (blockid -> hash)
- HeaderBackend::status
AuraVerifier
- check_inherents (_runtimeapi called)
BabeVerifier
- check_inherents (_runtimeapi called)
- check_and_report_equivocation
Beefy
- expect_validator_set
BlockBuilderProvider
- new_block
- new_block_at
BasicAuthorship Proposer
- impl uses BlockId
BlockRequestHandler
- handle_request / get_block_response

Network tests utils

PeerClient
- has_state_at
Peer
- generate_blocks_at
- push_blocks_at
- push_blocks_at_without_informing_sync
- push_blocks_at_without_announcing
- generate_tx_blocks_at

utils

impl BlockNumberOrHash
- parse
pub fn check_block

Transaction pool:

As transaction verification is done via runtime calls, the API for submitting also uses BlockId.

trait ChainApi
- validate_transaction
- block_id_to_number
- block_id_to_hash
trait TransactionPool
- submit_at
- submit_one
- submit_and_watch
impl Pool
- submit_at
- resubmit_at
- submit_one
- submit_and_watch
- prune_known
- prune
- prune_tags
- resolve_block_number
- verify
- verify_one
impl ValidatedPool
- resubmit_pruned
- fire_pruned
- clear_stale
impl FullChainApi
impl BasicPool
impl TestApi
- validate_transaction
- block_id_to_number
- block_id_to_hash
test::OffchainWorkers::submit_at

Where Block number is used

not completed yet

have_state_at(hash, number)

hash(n) usages


client/consensus/babe/src/lib.rs|1775 pub fn revert

client/db/src/lib.rs fn prune_blocks: 1753 fn prune_displaced_branches

client/finality-grandpa/src/warp_proof.rs|100 WarpSyncProof::generate

client/finality-grandpa/src/environment.rs|1249 pub(crate) fn finalize_block

client/rpc/src/chain/mod.rs|83 ChainBackend::block_hash

client/finality-grandpa/src/import.rs|366 GrandpaBlockImport::make_authorities_changes

client/service/src/client/client.rs|1036 Client::block_status

client/network/sync/src/lib.rs|804 ChainSync::on_block_data / AncestorSearch

primitives/blockchain/src/backend.rs|227 trait Backend::best_containing

primitives/blockchain/src/backend.rs|53 HeaderBackend::block_hash_from_id

kianenigma commented 1 year ago

What's the end goal here? to make everything block number? or hash?

bkchr commented 1 year ago

> What's the end goal here? to make everything block number? or hash?

Make as much as possible use hashes, as hashes are unique. We will not be able to do this for everything, as RPC or CLI functionality will always need to pass numbers. However, the "end goal" is that block numbers can be removed from the internal handling of the DB.
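A rough sketch of that split, assuming numbers are accepted only at the RPC/CLI edge and resolved to a hash once. Everything here is hypothetical: `BlockId` is a simplified stand-in for the real type, `parse_block_id` is not the actual `BlockNumberOrHash` parser, and `CanonicalIndex` only illustrates a number-hash mapping kept outside the block store.

```rust
use std::collections::HashMap;

// Toy stand-ins for `Block::Hash` and `NumberFor<Block>`.
pub type Hash = [u8; 4];
pub type Number = u64;

/// Simplified stand-in for `BlockId`: what an RPC or CLI caller may pass.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BlockId {
    Hash(Hash),
    Number(Number),
}

/// Hypothetical parser: `0x`-prefixed input is a hash, anything else a decimal number.
pub fn parse_block_id(input: &str) -> Result<BlockId, String> {
    if let Some(hex) = input.strip_prefix("0x") {
        // Exactly 8 hex digits for the 4-byte toy hash.
        if hex.len() != 8 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("invalid hash: {}", input));
        }
        let mut hash = [0u8; 4];
        for (i, byte) in hash.iter_mut().enumerate() {
            *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
                .map_err(|e| format!("invalid hash {}: {}", input, e))?;
        }
        Ok(BlockId::Hash(hash))
    } else {
        input
            .parse::<Number>()
            .map(BlockId::Number)
            .map_err(|e| format!("invalid number {}: {}", input, e))
    }
}

/// Number -> hash index for the canonical chain only.
#[derive(Default)]
pub struct CanonicalIndex {
    by_number: HashMap<Number, Hash>,
}

impl CanonicalIndex {
    pub fn set(&mut self, number: Number, hash: Hash) {
        self.by_number.insert(number, hash);
    }

    /// Resolve once at the boundary; code past this point only sees hashes.
    /// Note: a `BlockId::Hash` is passed through without an existence check.
    pub fn resolve(&self, id: BlockId) -> Option<Hash> {
        match id {
            BlockId::Hash(hash) => Some(hash),
            BlockId::Number(number) => self.by_number.get(&number).copied(),
        }
    }
}

fn main() {
    let mut index = CanonicalIndex::default();
    // Two forks may both contain a block number 5; only one is canonical.
    index.set(5, [0xaa; 4]);

    let from_number = index.resolve(parse_block_id("5").unwrap());
    let from_hash = index.resolve(parse_block_id("0xaaaaaaaa").unwrap());
    assert_eq!(from_number, Some([0xaa; 4]));
    assert_eq!(from_number, from_hash);

    // A non-canonical block at height 5 can only be addressed by its hash.
    assert_eq!(index.resolve(BlockId::Hash([0xbb; 4])), Some([0xbb; 4]));
}
```

The point of the sketch is that a number identifies a block only relative to one chain, while a hash identifies it on any fork, so internal code that works on hashes never has to ask which fork a number refers to.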

bkchr commented 1 year ago

> 4. To solve this "problem", the HeaderBackend trait could be extended with dedicated functions for fetching a header by block number:
>    - header_for_number(NumberFor<Block>) -> Result<Option<Block::Header>>
>    - expect_header_for_number(NumberFor<Block>) -> Result<Block::Header>
>    - header(Block::Hash) -> Result<Option<Block::Header>>
>    - expect_header(Block::Hash) -> Result<Block::Header>
>
>    The new naming may also make it easier to find and refactor all block-number usages. I would follow this naming convention, if needed, for the rest of the methods (e.g. block, block_status). The implementation of the _for_number methods could make use of a new service which will move the number-hash mapping out of the DB.

If this helps moving forward with this issue, it should be fine!

michalkucharczyk commented 1 year ago

BlockId refactoring status and next steps (2)

Where the block number is actually used

Some usages may still be missing, but the list should be better than the one in the previous comment:

header_metadata()
- fn tree_route
- ...?
hash(n) usages
- babe: pub fn revert
- grandpa: pub(crate) fn finalize_block
- GrandpaBlockImport::make_authorities_changes
- WarpSyncProof::generate
- Backend::prune_blocks
- ChainSync::on_block_data / AncestorSearch
- ForkBackend::expand_forks
- Benchmark::run(from - to): fn run
- conversion api: HeaderBackend::block_hash_from_id
- conversion api: ChainBackend::block_hash
have_state_at(hash, number)
- have_state_at definition: Backend::have_state_at
- Backend::revert
- fn Client::block_status
block_hash_from_id(BlockId)

Where the `BlockId` is currently used:

Still in client API:

Seems to be fine, just conversion functions.

trait BlockIdTo
- to_hash
- to_number
trait HeaderBackend
- block_hash_from_id
- block_number_from_id
- hash(number)
trait ExecutionExtensions
- manager_and_extensions
- extensions
trait IndexedBody
- block_indexed_body(number)

Implementation

DB (something that we may want to get rid of)
- API: remove_from_db, read_header, read_db
- internal implementation: Backend::prune_block
InMem
- Blockchain::id (blockid -> hash)

Other usages

trait PowAlgorithm
- verify (this could easily be changed: there is no implementation in Substrate to change, but it is a breaking change for users)

Network tests utils

PeerClient
- has_state_at
Peer
- generate_blocks_at
- push_blocks_at
- push_blocks_at_without_informing_sync
- push_blocks_at_without_announcing
- generate_tx_blocks_at

utils

impl BlockNumberOrHash
- parse
pub fn check_block

Transaction pool:

As transaction verification is done via runtime calls, the API for submitting also uses BlockId. Since the runtime API has been reworked, this should be easily doable.

trait ChainApi
- validate_transaction
- block_id_to_number
- block_id_to_hash
trait TransactionPool
- submit_at
- submit_one
- submit_and_watch
impl Pool
- submit_at
- resubmit_at
- submit_one
- submit_and_watch
- prune_known
- prune
- prune_tags
- resolve_block_number
- verify
- verify_one
impl ValidatedPool
- resubmit_pruned
- fire_pruned
- clear_stale
impl FullChainApi
impl BasicPool
impl TestApi
- validate_transaction
- block_id_to_number
- block_id_to_hash
test::OffchainWorkers::submit_at

paritytech / polkadot-sdk