Implement caching of RPC responses

reilabs / starknet-replay

CLI tool to replay Starknet transactions and profile libfuncs usage.

Apache License 2.0


The move from the Pathfinder database to the RPC protocol for querying Starknet data has slowed down the replayer.

One optimisation to speed up the replayer is to cache RPC responses, because some data is queried multiple times during transaction replay.
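A minimal sketch of the basic idea, assuming a hypothetical `fetch` closure that stands in for the real RPC call: the first request for a key goes to the server, and later requests for the same key are served from memory. `CachedRpc` and its hit/miss counters are illustrative names, not part of the replayer.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical memoising wrapper around one RPC endpoint.
/// `fetch` stands in for the real network request.
struct CachedRpc<K, V, F>
where
    F: FnMut(&K) -> V,
{
    fetch: F,
    entries: HashMap<K, V>,
    hits: u64,
    misses: u64,
}

impl<K, V, F> CachedRpc<K, V, F>
where
    K: Eq + Hash + Clone,
    V: Clone,
    F: FnMut(&K) -> V,
{
    fn new(fetch: F) -> Self {
        Self { fetch, entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the value for `key`, calling `fetch` only on the first request.
    fn get(&mut self, key: &K) -> V {
        if let Some(v) = self.entries.get(key) {
            self.hits += 1;
            return v.clone();
        }
        self.misses += 1;
        let v = (self.fetch)(key);
        self.entries.insert(key.clone(), v.clone());
        v
    }
}
```

This cache is unbounded; size and eviction still have to be decided per endpoint.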

These endpoints return a single value:

starknet_getClassHashAt
starknet_getStorageAt
starknet_chainId

These endpoints return a complex object:

starknet_getClass
starknet_getBlockWithTxHashes

The following endpoints are not worth caching because the result is used only once:

starknet_getNonce
starknet_getBlockWithReceipts

An appropriate cache design takes into account:

The cache size
The eviction policy once the cache is full
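One concrete combination of the two points above is a size measured in entries plus a least-frequently-used (LFU) eviction policy. The sketch below is std-only and hypothetical (`LfuCache` is not an existing type); eviction scans every entry, which is fine for illustration but not for large caches, where an existing crate such as Moka would be the better choice.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical bounded cache with LFU eviction.
struct LfuCache<K, V> {
    capacity: usize,
    // Each entry stores the value and how many times it has been read.
    entries: HashMap<K, (V, u64)>,
}

impl<K: Eq + Hash + Clone, V: Clone> LfuCache<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, entries: HashMap::new() }
    }

    /// Returns a copy of the value and bumps its use count.
    fn get(&mut self, key: &K) -> Option<V> {
        let (value, count) = self.entries.get_mut(key)?;
        *count += 1;
        Some(value.clone())
    }

    /// Inserts a value, evicting the least frequently used entry when full.
    fn insert(&mut self, key: K, value: V) {
        // Updating an existing key keeps its use count.
        if let Some(entry) = self.entries.get_mut(&key) {
            entry.0 = value;
            return;
        }
        if self.entries.len() >= self.capacity {
            // Lowest use count is evicted; ties are broken arbitrarily.
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, (_, count))| *count)
                .map(|(k, _)| k.clone());
            if let Some(victim) = victim {
                self.entries.remove(&victim);
            }
        }
        self.entries.insert(key, (value, 0));
    }
}
```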

starknet_getClass could use an LFU (least frequently used) policy, because a small percentage of the classes used in Starknet accounts for most of the calls. starknet_getBlockWithTxHashes could keep a read count and evict the data after 3 reads, because the block header is requested only three times. An alternative is to structure the code so that this endpoint is called only once. starknet_getClassHashAt and starknet_getStorageAt could also use an LFU policy. starknet_chainId always returns the same value for the whole transaction replay.
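The evict-after-three-reads idea for the block header can be sketched as a cache whose entries carry a read budget. This is a hypothetical helper (`ReadLimitedCache` is an illustrative name), assuming the number of reads per block is known in advance.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical cache whose entries are evicted after `max_reads` reads,
/// e.g. a block header known to be requested exactly three times.
struct ReadLimitedCache<K, V> {
    max_reads: u32,
    // Value plus the number of reads still allowed before eviction.
    entries: HashMap<K, (V, u32)>,
}

impl<K: Eq + Hash, V: Clone> ReadLimitedCache<K, V> {
    fn new(max_reads: u32) -> Self {
        assert!(max_reads > 0, "max_reads must be non-zero");
        Self { max_reads, entries: HashMap::new() }
    }

    fn insert(&mut self, key: K, value: V) {
        self.entries.insert(key, (value, self.max_reads));
    }

    /// Returns the value and removes the entry once its read budget is used up.
    fn get(&mut self, key: &K) -> Option<V> {
        let (value, remaining) = self.entries.get_mut(key)?;
        *remaining -= 1;
        let out = value.clone();
        if *remaining == 0 {
            self.entries.remove(key);
        }
        Some(out)
    }
}
```

If the assumed read count is wrong, a fourth request simply misses and falls back to the RPC server, so the policy degrades gracefully.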

Because each endpoint returns a different type of object, it could be better to have multiple caches, one per endpoint.
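A sketch of the one-cache-per-endpoint layout, where each map has its own key and value types. The types are simplified assumptions: `Felt` is a `u128` alias here (real Starknet field elements are 252-bit), and `RpcCaches` and `storage_at_or_fetch` are hypothetical names. Keys mirror the parameters each endpoint takes.

```rust
use std::collections::HashMap;

// Simplified stand-ins; the real replayer uses Starknet felt types.
type Felt = u128;
type BlockNumber = u64;

/// Hypothetical container holding one cache per endpoint.
#[derive(Default)]
struct RpcCaches {
    // starknet_getClassHashAt: (block, contract address) -> class hash
    class_hash_at: HashMap<(BlockNumber, Felt), Felt>,
    // starknet_getStorageAt: (block, contract address, storage key) -> value
    storage_at: HashMap<(BlockNumber, Felt, Felt), Felt>,
    // starknet_getClass: (block, class hash) -> serialized class definition
    class: HashMap<(BlockNumber, Felt), String>,
    // starknet_chainId: a single value for the whole replay
    chain_id: Option<String>,
}

impl RpcCaches {
    /// Returns the cached storage value, calling `fetch` only on a miss.
    fn storage_at_or_fetch(
        &mut self,
        block: BlockNumber,
        contract: Felt,
        key: Felt,
        fetch: impl FnOnce() -> Felt,
    ) -> Felt {
        *self.storage_at.entry((block, contract, key)).or_insert_with(fetch)
    }
}
```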

Investigate whether an existing crate is suitable for this task, so that we avoid rolling our own cache. Moka could be a good candidate to test.

[x] Replay a couple of blocks to confirm which requests are called multiple times
[x] Design a cache strategy
[ ] Implement the cache using existing crates if possible

These are all the RPC client endpoints:

starknet_block_number
starknet_get_class
starknet_get_block_with_tx_hashes
starknet_get_block_with_receipts
starknet_get_nonce
starknet_get_class_hash_at
starknet_get_storage_at
starknet_get_chain_id

Here is the caching analysis for each endpoint:

starknet_get_chain_id can be called once, and its result is valid for the whole replay.
starknet_block_number returns the most recent block number, and its result can be treated as valid for the whole replay. If a few new blocks are created during the replay, it can be assumed that the replayer would be re-executed to cover them anyway.
starknet_get_block_with_tx_hashes is called to retrieve the hash of the previous block.
starknet_get_block_with_receipts is called only once per block.
The remaining endpoints depend on block_number. When block x is replayed, the data is queried at the state of block x-1. This implies that data available at block x-2 is not guaranteed to be unchanged at block x-1.
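The analysis above splits the endpoints into replay-wide values and block-dependent values. A minimal sketch of both, assuming std `OnceLock` for the replay-wide values and closures standing in for the RPC calls; `chain_id`, `latest_block` and `storage_key` are hypothetical helpers, and `Felt` is simplified to `u128`. The key function encodes the rule that replaying block x reads state at x-1.

```rust
use std::sync::OnceLock;

type BlockNumber = u64;
type Felt = u128; // simplified stand-in for a Starknet field element

// Hypothetical replay-wide caches: each value is fetched at most once per run.
static CHAIN_ID: OnceLock<String> = OnceLock::new();
static LATEST_BLOCK: OnceLock<BlockNumber> = OnceLock::new();

/// `fetch` stands in for starknet_get_chain_id and runs only on the first call.
fn chain_id(fetch: impl FnOnce() -> String) -> &'static str {
    CHAIN_ID.get_or_init(fetch)
}

/// `fetch` stands in for starknet_block_number and runs only on the first call.
fn latest_block(fetch: impl FnOnce() -> BlockNumber) -> BlockNumber {
    *LATEST_BLOCK.get_or_init(fetch)
}

/// Replaying block `x` reads state at block `x - 1`, so that block number
/// is part of the cache key. Assumes block 0 is never replayed this way.
fn storage_key(replayed_block: BlockNumber, contract: Felt, key: Felt) -> (BlockNumber, Felt, Felt) {
    let state_block = replayed_block
        .checked_sub(1)
        .expect("block 0 has no previous state");
    (state_block, contract, key)
}
```

Because the state block is part of the key, a value cached while replaying block x never answers a query made while replaying block x+1, which is why hits are rare during parallel replay.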

For this reason, a cache of RPC responses is unlikely to get hits during parallel block replay, because a hit requires:

the replay of the previous block to have already completed
a transaction to query data computed in the previous block

The existing replayer runs at 7 s/block with a local RPC server and 34 s/block with the blastapi.io server, using 8 threads in both cases. This meets the requirements.

However, for future exploration, one strategy is to replay all even-numbered blocks first and then all odd-numbered blocks. This ensures that, when each block in the second group is replayed, its previous block has already been replayed.
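The even-then-odd schedule can be sketched as a partition of the block range, plus a check that every block in the second phase has its parent in the first. Both functions are hypothetical helpers. Note that only the second phase can benefit: each even block's parent is odd and not yet replayed, and if the range starts at an odd block, that block's parent lies outside the range.

```rust
type BlockNumber = u64;

/// Hypothetical two-phase schedule: even-numbered blocks first, then odd-numbered ones.
fn even_then_odd(start: BlockNumber, end_inclusive: BlockNumber) -> (Vec<BlockNumber>, Vec<BlockNumber>) {
    (start..=end_inclusive).partition(|b| b % 2 == 0)
}

/// Returns true if every block in `phase_two` has its parent in `phase_one`,
/// i.e. the parent's replay (and any cached state) is complete when it starts.
fn parents_replayed_first(phase_one: &[BlockNumber], phase_two: &[BlockNumber]) -> bool {
    phase_two
        .iter()
        .all(|b| b.checked_sub(1).map_or(false, |p| phase_one.contains(&p)))
}
```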

Implement caching of RPC responses #38