paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.87k stars 683 forks source link

High CPU usage calling `Backend::usage_info` #40

Open koute opened 2 years ago

koute commented 2 years ago

So I'm investigating this issue https://github.com/paritytech/substrate/issues/11503 and as part of the investigation I had to sync a parachain node. I've noticed that after syncing a few hundred thousand blocks it got really slow, between 0~2 blocks per second. So I've fired up the profiler and I saw that the usage_info call (more specifically, the malloc_size called from StateDb::memory_info) is consuming a whopping ~25% of CPU. Whether this is actually the culprit of the slow sync or not - I'm not sure, however burning 25% of the CPU on this seems quite excessive.

Here are the four places where it's called:

The UsageProvider::usage_info() returns a ClientInfo<Block> which is:

/// Client info
#[derive(Debug)]
pub struct ClientInfo<Block: BlockT> {
    /// Best block hash.
    pub chain: Info<Block>,
    /// Usage info, if backend supports this.
    pub usage: Option<UsageInfo>,
}

The Backend::usage_info returns a Option<UsageInfo>.

The two places in Cumulus don't actually need to call this as they don't actually use this information (they only care about the Info<Block> and don't use the returned UsageInfo), so that's 10% of the CPU time completely wasted.

I'm not yet sure why the underlying call to malloc_size consumes as much CPU time as it does; it might be a side effect of some more fundamental problem. Nevertheless the two calls in cumulus are at the very least completely useless and should not be made.

koute commented 2 years ago

This fixes the excessive CPU usage:

diff --git a/client/state-db/src/noncanonical.rs b/client/state-db/src/noncanonical.rs
index 13cf5825b..67167fcfc 100644
--- a/client/state-db/src/noncanonical.rs
+++ b/client/state-db/src/noncanonical.rs
@@ -35,10 +35,12 @@ const MAX_BLOCKS_PER_LEVEL: u64 = 32;
 #[derive(parity_util_mem_derive::MallocSizeOf)]
 pub struct NonCanonicalOverlay<BlockHash: Hash, Key: Hash> {
        last_canonicalized: Option<(BlockHash, u64)>,
+       #[ignore_malloc_size_of = "disabled"]
        levels: VecDeque<OverlayLevel<BlockHash, Key>>,
        parents: HashMap<BlockHash, BlockHash>,
        pending_canonicalizations: Vec<BlockHash>,
        pending_insertions: Vec<BlockHash>,
+       #[ignore_malloc_size_of = "disabled"]
        values: HashMap<Key, (u32, DBValue)>, // ref counted
        // would be deleted but kept around because block is pinned, ref counted.
        pinned: HashMap<BlockHash, u32>,

(But isn't really a proper fix since it just ignores those fields.)

With this the malloc_size completely disappears from the profile and the bps increases from ~0.5bps to ~2.5bps, so still painfully slow, but at least a little faster.

bkchr commented 2 years ago

We should probably remove info from usage_info(). From a first glance, usage_info().usage is only used in one place in the entire code base, in the informant. And even there we didn't need to call it most of the times, because the trace log is not enabled.

Nice find :)