Open koute opened 2 years ago
This fixes the excessive CPU usage:
diff --git a/client/state-db/src/noncanonical.rs b/client/state-db/src/noncanonical.rs
index 13cf5825b..67167fcfc 100644
--- a/client/state-db/src/noncanonical.rs
+++ b/client/state-db/src/noncanonical.rs
@@ -35,10 +35,12 @@ const MAX_BLOCKS_PER_LEVEL: u64 = 32;
#[derive(parity_util_mem_derive::MallocSizeOf)]
pub struct NonCanonicalOverlay<BlockHash: Hash, Key: Hash> {
last_canonicalized: Option<(BlockHash, u64)>,
+ #[ignore_malloc_size_of = "disabled"]
levels: VecDeque<OverlayLevel<BlockHash, Key>>,
parents: HashMap<BlockHash, BlockHash>,
pending_canonicalizations: Vec<BlockHash>,
pending_insertions: Vec<BlockHash>,
+ #[ignore_malloc_size_of = "disabled"]
values: HashMap<Key, (u32, DBValue)>, // ref counted
// would be deleted but kept around because block is pinned, ref counted.
pinned: HashMap<BlockHash, u32>,
(But isn't really a proper fix since it just ignores those fields.)
With this the malloc_size
completely disappears from the profile and the bps increases from ~0.5bps to ~2.5bps, so still painfully slow, but at least a little faster.
We should probably remove info
from usage_info()
. From a first glance, usage_info().usage
is only used in one place in the entire code base, in the informant
. And even there we didn't need to call it most of the times, because the trace log is not enabled.
Nice find :)
So I'm investigating this issue https://github.com/paritytech/substrate/issues/11503 and as part of the investigation I had to sync a parachain node. I've noticed that after syncing a few hundred thousand blocks it got really slow, between 0~2 blocks per second. So I've fired up the profiler and I saw that the
usage_info
call (more specifically, themalloc_size
called fromStateDb::memory_info
) is consuming a whopping ~25% of CPU. Whether this is actually the culprit of the slow sync or not - I'm not sure, however burning 25% of the CPU on this seems quite excessive.Here are the four places where it's called:
cumulus/client/consensus/common/src/parachain_consensus.rs
UsageProvider::usage_info
only forchain.finalized_hash
, 5.1% CPU usageUsageProvider::usage_info
only forchain.best_hash
, 5.6% CPU usagesubstrate/client/service/src/metrics.rs
Backend::usage_info
, actually uses theusage
, 5.4% CPU usagesubstrate/client/informant/src/lib.rs
Backend::usage_info
, actually uses theusage
, 5.4% CPU usageThe
UsageProvider::usage_info()
returns aClientInfo<Block>
which is:The
Backend::usage_info
returns aOption<UsageInfo>
.The two places in Cumulus don't actually need to call this as they don't actually use this information (they only care about the
Info<Block>
and don't use the returnedUsageInfo
), so that's 10% of the CPU time completely wasted.I'm not yet sure why the underlying call to
malloc_size
consumes as much CPU time as it does; it might be a side effect of some more fundamental problem. Nevertheless the two calls incumulus
are at the very least completely useless and should not be made.