Open crystalin opened 1 year ago
Additionally, it took something like 40h to generate the moonbeam storage benchmark
@cheme do you have an idea why the ParityDB time for read on the 110M keys DB is so much slower than Rocks when it is normally faster?
You can download a recent Moonbeam state: https://s3.console.aws.amazon.com/s3/object/alan-stuff?region=us-east-1&prefix=moonbeam-state-3631095.json.lz4 (10GB) if you want to check it
That is definitely not expected. I could imagine worse access on a big mmap'd memory region, but not in these proportions. I can also think of the data not being correctly built (there is a reindexing running in the background every N values, but this is flushed on exit/start).
"based on substrate 0.9.40": is that the substrate version (it looks old)? It would be interesting to have the parity-db version listed in the Cargo.lock (a version from a few months ago had an issue that could explain some bad behavior, cc @arkpar).
Edit: https://github.com/PureStake/moonbeam/blob/6ed87ceeb65db27a9b2ce7ff32b90d062540bd67/Cargo.lock#L8942 the parity-db version is 0.4.6, which does not have https://github.com/paritytech/parity-db/pull/206, but I don't expect it to be related.
I'm happy to cherry-pick some changes on top of it if you want to test a few things. You can also probably reproduce by using the snapshot I provided.
I would use the latest version of parity-db (cargo update -p parity-db), but then it would only really make sense when syncing the snapshot from scratch.
Something I am thinking of right now: did the memory consumption stay reasonable during the process (looking at the bench code, I suspect it could put many items in memory)?
Edit: I just realized the snapshot is in JSON format, so there is no need to resync.
Actually, it would be better to use a patched parity-db master, to include https://github.com/paritytech/parity-db/pull/211
Ok, I'll try that if I find time (also be aware that the benchmark took 40 hours, so I won't get results quickly)
I can't even import the snapshot on a 64GB server… do you use 128?
I've tried using warp sync on moonbeam. The sync went fine, although peak RAM usage was over 130GB. However the parachain is not finalizing blocks. Final block is still at zero. Is this a known issue? Unfinalized blocks are stored differently in the DB and this may affect performance.
As for possible performance issues, it could be affected by how the benchmark is implemented. RocksDB uses its own caching, while ParityDb relies on the OS cache. IIRC the benchmark warmup touches a few of the keys, and for RocksDB this causes a lot more data to be pre-cached.
@arkpar warp is not fully supported yet. We are still working on it. I also suspect the benchmark implementation is the reason for those unexpected values, but it is hard/long to verify
@arkpar were you able to reproduce? Let me know if I can help otherwise
I could not access the snapshot linked above. It requires AWS registration and asks for my credit card number. I've started regular sync instead and it looks like it will take 3-4 days.
@crystalin Could you test it with parity-db 0.4.10? cargo update -p parity-db should do it.
I'm running it now. This time I looked at the CPU load and IO load, and during the benchmark:
If the DB benchmark time is a major problem then we could add a flag to only read 10% or 1% of the total keys (randomly selected). That way you would have some preliminary results for faster iterating. Do you think that would help?
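A minimal sketch of what such a percentage flag could do internally, assuming the benchmark already holds the full list of keys. The `sample_keys` helper, the `percent` parameter and the small xorshift generator are hypothetical and only illustrate the idea; they are not part of the existing substrate benchmark code.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Keep each key with probability `percent / 100` (hypothetical helper).
/// The same seed always selects the same keys, so runs stay comparable.
fn sample_keys<K: Clone>(keys: &[K], percent: u64, seed: u64) -> Vec<K> {
    assert!(percent <= 100, "percent must be in 0..=100");
    // xorshift must not start from an all-zero state.
    let mut rng = XorShift64(seed.max(1));
    keys.iter()
        .filter(|_| rng.next_u64() % 100 < percent)
        .cloned()
        .collect()
}

fn main() {
    // Stand-in for the real storage keys.
    let keys: Vec<u32> = (0..100_000).collect();
    let sampled = sample_keys(&keys, 1, 42);
    println!("sampled {} of {} keys", sampled.len(), keys.len());
}
```

Sampling per key rather than taking the first N keys keeps the subset spread over the whole key space, which matters if access cost depends on where a key lives in the db.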
That could make sense yes a % flag
Warmup round just finished, I might get results this weekend (also, memory jumped to 95%).
I was able to run it (with substrate 0.9.43 and paritydb 0.4.10). It took 3 days to finish:
pub const ParityDbWeight: RuntimeDbWeight = RuntimeDbWeight {
read: 182_722 * constants::WEIGHT_REF_TIME_PER_NANOS,
write: 60_176 * constants::WEIGHT_REF_TIME_PER_NANOS,
};
(No improvement at all)
I looked a bit more into switching the chainspec loading to something that does not load the whole state in memory, but it is more work than I expected (it breaks a lot of the genesis build API, since we need to do multiple commits while using a streaming JSON parser), so I am postponing doing this myself for now. Still, I got a better understanding of the benchmarking process: it just uses the standard chainspec loading, which means that the full state is sent to parity-db and the bench then runs on a db that just had a lot of keys injected. So the db may still be doing one or two levels of table reindexing while the benchmark runs, which would explain the performance issue.
This can be checked by running "ls" on the db directory and looking at the files for the state column:
if it is still reindexing the state, there will be multiple files named paritydb/full/index_01_xx
with xx being the index size.
If this is the case, I do not have a simple way of ensuring the reindexing completes (changing the default index size in paritydb could be a hacky solution).
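The directory check described above can be sketched in a few lines of Rust. This assumes the index files follow the `index_01_xx` naming mentioned here, with the column number zero-padded to two digits; the helper names `index_files_for_column` and `column_is_reindexing` are hypothetical and not part of parity-db.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return the index file names belonging to one column, given every
/// file name found in the db directory.
/// Assumes the `index_<column>_<size>` naming with a two-digit column.
fn index_files_for_column<'a>(names: &[&'a str], column: u8) -> Vec<&'a str> {
    let prefix = format!("index_{:02}_", column);
    names
        .iter()
        .copied()
        .filter(|name| name.starts_with(prefix.as_str()))
        .collect()
}

/// More than one index file for a column suggests a reindex is still
/// in progress for that column.
fn column_is_reindexing(db_dir: &Path, column: u8) -> io::Result<bool> {
    let mut names = Vec::new();
    for entry in fs::read_dir(db_dir)? {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    let refs: Vec<&str> = names.iter().map(String::as_str).collect();
    Ok(index_files_for_column(&refs, column).len() > 1)
}

fn main() {
    // Directory from the comment above; pass another path as first argument.
    let dir = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "paritydb/full".to_string());
    match column_is_reindexing(Path::new(&dir), 1) {
        Ok(true) => println!("column 01 has several index files: still reindexing"),
        Ok(false) => println!("column 01 has at most one index file"),
        Err(err) => println!("could not read {}: {}", dir, err),
    }
}
```

Running this before starting the benchmark would at least show whether the measurement is taken on a db that is still being reindexed.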
The following change in substrate would allow flushing the logs but would not force all reindexing to finish.
--- a/bin/node/cli/src/command.rs
+++ b/bin/node/cli/src/command.rs
@@ -127,6 +127,8 @@ pub fn run() -> Result<()> {
),
#[cfg(feature = "runtime-benchmarks")]
BenchmarkCmd::Storage(cmd) => {
+ // load once first to ensure db is flushed.
+ new_partial(&config)?;
// ensure that we keep the task manager alive
let partial = new_partial(&config)?;
let db = partial.backend.expose_db();
but the db would also need to be kept open for a while, until everything is reindexed.
Maybe simply doing the bench in two steps:
Or implement a primitive that ensures all reindexing is finished in paritydb and use it before calling new_partial a second time (but it will not be very elegant, as the code at this level does not assume a specific db).
Thank you,
I think we did run the node with no connection (we often do for other profiling parts) before running the benchmark, but I can try again to see if that helps.
I think having substrate support running the storage benchmark on a subset of the state would probably be more effective in that case.
Yes I hope to get https://github.com/paritytech/polkadot-sdk/issues/146 to some newcomer to solve. Forwarded it to a PBA student now.
Outside of the storage benchmark, paritydb performance is also generally worse than rocksdb when the state is large (100M+ keys) and the node runs as an archive (I don't know how to measure the total number of keys in the db itself):
ParityDb:
2023-09-12T15:34:57.659Z utils:storage-query Queried 55384 keys @ 2769 keys/sec, 34 MB heap used
2023-09-12T15:35:02.659Z utils:storage-query Queried 82671 keys @ 3307 keys/sec, 46 MB heap used
2023-09-12T15:35:07.659Z utils:storage-query Queried 103743 keys @ 3458 keys/sec, 27 MB heap used
2023-09-12T15:35:12.659Z utils:storage-query Queried 130776 keys @ 3736 keys/sec, 21 MB heap used
2023-09-12T15:35:17.659Z utils:storage-query Queried 159459 keys @ 3986 keys/sec, 33 MB heap used
2023-09-12T15:35:22.659Z utils:storage-query Queried 184760 keys @ 4106 keys/sec, 18 MB heap used
RocksDb:
2023-09-12T15:36:44.978Z utils:storage-query Queried 520850 keys @ 17358 keys/sec, 30 MB heap used
2023-09-12T15:36:49.978Z utils:storage-query Queried 638850 keys @ 18249 keys/sec, 17 MB heap used
2023-09-12T15:36:54.979Z utils:storage-query Queried 784850 keys @ 19618 keys/sec, 15 MB heap used
2023-09-12T15:36:59.979Z utils:storage-query Queried 894850 keys @ 19882 keys/sec, 20 MB heap used
2023-09-12T15:37:04.981Z utils:storage-query Queried 975850 keys @ 19514 keys/sec, 24 MB heap used
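The keys/sec value in each log line appears to be a cumulative average since the start of the query (for example 184760 keys / 4106 keys/sec is about 45 seconds). To compare only the windows shown, the rate can be recomputed from the first and last cumulative counts. This is a small sketch of that arithmetic; `window_rate` is a hypothetical helper and the window lengths are rounded to whole seconds.

```rust
/// Average keys/sec between two cumulative "Queried N keys" samples.
fn window_rate(first_keys: u64, last_keys: u64, seconds: f64) -> f64 {
    (last_keys - first_keys) as f64 / seconds
}

fn main() {
    // First and last cumulative counts from the log excerpts above.
    // ParityDb: 15:34:57 to 15:35:22, about 25 seconds.
    let paritydb = window_rate(55_384, 184_760, 25.0);
    // RocksDb: 15:36:44 to 15:37:04, about 20 seconds.
    let rocksdb = window_rate(520_850, 975_850, 20.0);

    println!("ParityDb: {:.0} keys/sec", paritydb);
    println!("RocksDb:  {:.0} keys/sec", rocksdb);
    println!("RocksDb / ParityDb: {:.1}x", rocksdb / paritydb);
}
```

Over these windows RocksDb comes out roughly 4.4 times faster than ParityDb, in line with the gap the cumulative figures already suggest.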
Running the Storage Benchmark on 3 different networks with significant state size/content produces incoherent results. We have been using Moonbeam v0.32.1, which is based on substrate 0.9.40. The networks Alphanet and Moonriver have similar state/usage overall, but Moonbeam had a project that generated a huge amount of storage (all of the same size, 42 bytes IIRC).
As you can see, the Moonbeam read and write using paritydb are way off the expected results that we see in alphanet and moonriver.
Configuration of the disk is AWS gp3 | 1000 GiB | 3000 IOPS and each network/db has its own disk (total of 6 disks). The blocks and state are pruned to avoid using a huge amount of disk space.
Running the storage benchmark (on c6i.4xlarge AWS):
for each chain
Alphanet (~20M keys):
Moonriver (~30M keys state):
Moonbeam (~110M keys state):
Additionally to the paritydb numbers, we can also see that RocksDB average read is 50% on Moonbeam (110M keys) than Moonriver (30M keys), which might be related to the size of the data on Moonbeam being on average smaller than on Moonriver.
Details about the benchmark output can be found here: https://gist.github.com/crystalin/8e790a554b246e077c83ad04c04f330c