shunsukew opened 2 years ago
Did you also change --pool-kbytes?
@bkchr Thank you for the comment. No, I didn't. Does that mean the default value is used?
--pool-kbytes <COUNT>
Maximum number of kilobytes of all transactions stored in the pool [default: 20480]
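For context on why both flags matter: the pool is bounded both by a transaction count and by a total byte budget, so raising only --pool-limit still leaves the default 20480 KiB cap from --pool-kbytes in place. Below is a minimal illustrative sketch of that interaction; the type and field names are mine, not Substrate's actual internals.

```rust
// Illustrative only: hypothetical names, not Substrate's real types.
// The point is that the pool has two independent bounds, and the byte
// bound keeps its default unless --pool-kbytes is raised as well.
struct PoolLimit {
    max_count: usize, // roughly what --pool-limit controls
    max_bytes: usize, // roughly what --pool-kbytes (given in KiB) controls
}

impl PoolLimit {
    fn is_exceeded(&self, tx_count: usize, total_bytes: usize) -> bool {
        // Hitting either bound is enough for the pool to start limiting.
        tx_count > self.max_count || total_bytes > self.max_bytes
    }
}

fn main() {
    // --pool-limit 65536 combined with the default --pool-kbytes 20480:
    let limit = PoolLimit { max_count: 65_536, max_bytes: 20_480 * 1024 };
    // ~21k transactions averaging ~1 KiB each already blow the byte budget,
    // long before the 65536 count limit is reached.
    assert!(limit.is_exceeded(21_000, 21_000 * 1024));
    println!("byte budget reached before the count limit");
}
```

If the real pool behaves along these lines, the ~20k figure reported below may simply be where the byte budget starts to bite; whether that is connected to the memory growth is a separate question.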
@koute could you maybe look into this?
Sure; I'm on it.
The issue doesn't seem to reproduce on a normal Kusama node (or maybe it just needs to be synced from scratch; I haven't checked yet); however, I think I've managed to reproduce it on the newest astar-collator (I haven't let it run until memory exhaustion, but the memory does look like it's growing). I'm profiling it to see why.
@shunsukew For reference, can you provide the exact command line you've used to launch your node?
So I do see the memory usage increasing, but nowhere near as fast as in the screenshots posted by @shunsukew. I'll leave it running overnight (and if it doesn't reproduce I'll maybe try spamming it with fake transactions); still, it would be nice if there were a way to reproduce the behaviour from the original issue, as that would make it a lot easier to investigate.
In the meantime I've also noticed that the Astar node uses the system allocator rather than jemalloc like Polkadot does; this is not good, and it might contribute to the problem. (I could check if I knew exactly how to reproduce it.) I've put up a PR enabling jemalloc for your node: https://github.com/AstarNetwork/Astar/pull/653
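For anyone who wants to try the same change locally before the PR is merged, here's a minimal sketch of switching a Rust binary to jemalloc with the tikv-jemallocator crate; I haven't checked whether the PR does it exactly this way, so treat it as an assumption about the approach rather than a description of the PR.

```rust
// Assumes Cargo.toml contains: tikv-jemallocator = "0.5"
use tikv_jemallocator::Jemalloc;

// Route all heap allocations in this binary through jemalloc instead of
// the system allocator.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // From this point on, every allocation (including this Vec) uses jemalloc.
    let buffer: Vec<u8> = vec![0u8; 1024 * 1024];
    println!("allocated {} bytes via jemalloc", buffer.len());
}
```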
Hi @koute thank you very much for the PR!
Below are tests made on a collator node with this simple command (before and after the change, made at ~19:15):
/usr/local/bin/astar-collator --collator --rpc-cors all --name collator --base-path /var/lib/astar --state-cache-size 0 --prometheus-external --pool-limit 65536 --port 30333 --chain astar --parachain-id 2006 --telemetry-url 'wss://telemetry.polkadot.io/submit/ 0'
I think the node has to be fully synced to reproduce.
The previously reported data was from a public node (archive mode).
Metrics on the same time frame:
Transaction queue
RAM (32 GB total) increases fast but doesn't get totally full from the beginning.
CPU consumption doesn't change much but gets higher.
The number of peers becomes unstable.
Network traffic increases hugely; the node is sending an incredible amount of data.
I will test your PR as the next step.
@koute @bLd75 Thank you for the PR and the additional information.
Description of bug
A Substrate node with a large transaction pool limit configured (e.g. --pool-limit 65536, larger than the default pool limit) consumes the whole memory (32 GB) of the machine once the pooled transaction count hits around 20k. Memory usage grows rapidly and reaches 100% of the 32 GB. Are there any potential issues around the transaction pool, such as a memory leak?
Case 1. Transaction pool 20k (2022-05-22 21:50:00 ~ 2022-05-22 23:00:00 UTC+8). Graphs: transaction pool, Mem, CPU.
Once memory usage hits 100%, the machine becomes unreachable.
Case 2. Default transaction pool limit (2022-05-22 23:20:00 ~ 2022-05-22 23:40:00 UTC+8). Graphs: transaction pool, Mem, CPU.
(Machine spec) CPU-optimized machine (fast CPU), 16 vCPU, 32 GB Mem, general-purpose SSD (16 KiB IOPS, 250 MiB/s throughput).
Steps to reproduce
Set the pool limit (--pool-limit) to more than 20k, i.e. well above the default, and get 19k+ transactions into the pool. (I did this by running an Astar node and syncing blocks with peers as of 2022/05/23.)
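To watch the pool and memory behaviour while reproducing, one option is to poll the node's Prometheus endpoint (port 9615 by default when --prometheus-external is passed) and log the transaction-pool metrics over time. The sketch below uses only the Rust standard library; the filter string and the exact metric names it will match are assumptions, so check them against what your node actually exports.

```rust
// Rough monitoring sketch: fetch /metrics every 30 seconds and print any
// non-comment metric line that mentions transactions. Endpoint and metric
// names are assumptions; adjust them to your node's actual output.
use std::io::{Read, Write};
use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    loop {
        let mut stream = TcpStream::connect("127.0.0.1:9615")?;
        stream.write_all(b"GET /metrics HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")?;

        let mut body = String::new();
        stream.read_to_string(&mut body)?;

        for line in body.lines() {
            // e.g. a ready-transactions gauge; the name is an assumption.
            if line.contains("transactions") && !line.starts_with('#') {
                println!("{line}");
            }
        }
        println!("---");
        sleep(Duration::from_secs(30));
    }
}
```

Correlating those numbers with the RAM graphs should make it clearer whether memory tracks the pool size or keeps growing after the pool stops filling.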