Dinonard opened this issue 2 months ago
Hey, this is basically a duplicate of https://github.com/paritytech/polkadot-sdk/issues/4.
The problem is that right now we keep the entire state in memory while downloading. For chains that have a big state (not sure what the state size of Astar is), this can lead to OOM on machines without enough main memory.
Hey @bkchr, I'm not sure this is the same as the issue you linked - the memory consumption spikes suddenly, it doesn't grow slowly over time.
Please check the image here. There are huge spikes that happen abruptly.
Hmm yeah. I still think it is related to having everything in memory.
Did you try running the node with gdb, to get a stacktrace when it OOMs?
We haven't but I'll ask our devops team to do that and I'll post the traces here.
Ty.
Does this help? threads.log
We've used this command to generate it:
sudo gdb --batch --quiet -ex "thread apply all bt full" -p <...>
@Dinonard but this is not from the point when it OOMs. You need to run the node with gdb attached the whole time.
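For reference, one way to keep gdb attached for the node's entire lifetime (rather than attaching after the fact) is to launch the process under gdb. This is only a command sketch - the binary name and flags are placeholders for the actual node command:

```shell
# Start the node under gdb so it is attached from the first instruction;
# when the process dies, dump full backtraces of all threads.
# "./astar-collator --chain astar" is a placeholder for the real command.
gdb -q \
  -ex "set pagination off" \
  -ex "run" \
  -ex "thread apply all bt full" \
  --args ./astar-collator --chain astar
```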
TBH I checked the logs myself and couldn't see anything wrong, but AFAIK this was run immediately after the node started. Let me get back to you.
I've run it on the same server as before with gdb now properly encapsulating the entire service.
And it's weird:
Thread 12 "tokio-runtime-w" received signal SIGPIPE, Broken pipe.
Sorry about the missing symbols but I haven't used GDB with any Rust app before.
I figured that building with the debug flag (-g) would include the debug info (and symbols?) in the binary itself, but it doesn't seem to have helped.
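For a Rust binary built with Cargo, debug symbols in an optimized build are usually controlled by a profile setting rather than a bare -g flag (assuming the node is built with cargo build --release):

```toml
# Cargo.toml: keep DWARF debug info in release builds so gdb can
# resolve symbols in backtraces.
[profile.release]
debug = true
```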
Just to share my experience with state sync on a chain that has a huge state: the subcoin node crashed the first time while importing the downloaded state at a certain height. I then reran it with the same command, syncing the state at the same height, with gdb attached. Unfortunately (or fortunately :P), it didn't crash with gdb. I observed that this successful state import consumed almost my entire memory (my machine has 128 GiB) at the peak.
2024-08-19 16:35:01.747 INFO tokio-runtime-worker sync: State sync is complete (5438 MiB), restarting block sync.
diff --git a/substrate/primitives/trie/src/lib.rs b/substrate/primitives/trie/src/lib.rs
index ef6b6a5743..e0a2cf3b30 100644
--- a/substrate/primitives/trie/src/lib.rs
+++ b/substrate/primitives/trie/src/lib.rs
@@ -296,23 +296,30 @@ where
V: Borrow<[u8]>,
DB: hash_db::HashDB<L::Hash, trie_db::DBValue>,
{
- {
+ // {
let mut trie = TrieDBMutBuilder::<L>::from_existing(db, &mut root)
.with_optional_cache(cache)
.with_optional_recorder(recorder)
.build();
+ tracing::info!("====================== Collecting delta");
let mut delta = delta.into_iter().collect::<Vec<_>>();
+ tracing::info!("====================== Finished Collecting delta: {}", delta.len());
delta.sort_by(|l, r| l.0.borrow().cmp(r.0.borrow()));
+ tracing::info!("====================== Sorted delta");
- for (key, change) in delta {
+ tracing::info!("====================== Starting to write trie, mem usage: {:.2?}GiB", memory_stats::memory_stats().map(|usage| usage.physical_mem as f64 / 1024.0 / 1024.0 / 1024.0));
+ for (index, (key, change)) in delta.into_iter().enumerate() {
match change.borrow() {
Some(val) => trie.insert(key.borrow(), val.borrow())?,
None => trie.remove(key.borrow())?,
};
}
- }
+ tracing::info!("====================== Finished writing delta to trie, mem usage: {:.2?}GiB", memory_stats::memory_stats().map(|usage| usage.physical_mem as f64 / 1024.0 / 1024.0 / 1024.0));
+ drop(trie);
+ // }
+ tracing::info!("====================== End of delta_trie_root, mem usage: {:.2?}GiB", memory_stats::memory_stats().map(|usage| usage.physical_mem as f64 / 1024.0 / 1024.0 / 1024.0));
Ok(root)
}
2024-08-29 10:52:53 ====================== Collecting delta
2024-08-29 10:52:55 ====================== Finished Collecting delta: 104674943
2024-08-29 10:52:56 ====================== Sorted delta
2024-08-29 10:52:57 ====================== Starting to write trie, mem usage: Some(26.23)GiB
2024-08-29 10:54:09 ====================== Finished writing delta to trie, mem usage: Some(76.36)GiB
2024-08-29 11:00:28 ====================== End of delta_trie_root, mem usage: Some(90.53)GiB
I added some logging for the memory usage in the block import pipeline. It turned out that https://github.com/subcoin-project/polkadot-sdk/blob/13ca1b64692b05b699f49e729d0522ed4be730b9/substrate/primitives/trie/src/lib.rs#L285 is the culprit. The memory usage surged from 26 GiB to 76 GiB after the trie was built. Importing the same state does not always succeed: when it crashes due to OOM, "Finished writing delta to trie..." is printed and then it crashes, so "End of delta_trie_root" is never printed. Any clue for further investigation and a potential fix? @bkchr
Constructing the entire trie from the state at a specific height in memory seems to be the primary cause of the OOM issue. This is a critical design flaw, in my opinion, especially since the chain state will continue to grow over time. While Polkadot may be fine for now, it will inevitably face the same problem in the long run if we don't address this. Please prioritize pushing this issue forward. @bkchr
> Constructing the entire trie from the state at a specific height in memory seems to be the primary cause of the OOM issue.
Yeah, sounds reasonable. I need to think about how to improve this, but yeah, this should not happen ;)
Description of bug
As the title suggests, using warp sync on a parachain causes OOM, crashing the client.
We've had this problem on Astar for a few months, and have recently uplifted to polkadot-sdk version v1.9.0, but are still seeing the problem. There are no telling traces in the log; it just explodes at some point. There's an issue opened in our repo, https://github.com/AstarNetwork/Astar/issues/1110, with steps to reproduce as well as images of resource consumption before the crash.
We haven't been able to find similar issues or discussion related to the topic.
Steps to reproduce
Run latest Astar node as described in the linked issue.