Open alexggh opened 3 weeks ago
Thanks Alex for raising this! 🙏
We also had this issue with addresses, although the memory consumed here is on the order of GiB (https://github.com/paritytech/polkadot-sdk/pull/5998).
A few places come to mind to look at next:
I would start by looking at litep2p and then move on to the substrate code.
Initial testing with debug logs built on top of branch lexnv/holistic-litep2p-test-dhtandpeerset
shows memory leaks in litep2p:
I will leave my node running overnight and follow up with patches.
We need metrics to filter out potential leaks (i.e. state tracking that grows monotonically is a concern).
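As a rough illustration of the kind of metric meant here, the sketch below samples the length of internal collections and flags the ones that never shrink over a window of samples. It is not litep2p or substrate code: `GrowthTracker`, `record`, and `suspects` are hypothetical names, and a real node would more likely export these sizes as gauges to the existing metrics pipeline.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of litep2p or substrate): stores periodic
/// size samples for named collections so that growth can be inspected.
#[derive(Default)]
pub struct GrowthTracker {
    samples: HashMap<String, Vec<usize>>,
}

impl GrowthTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record the current length of the collection tracked under `name`.
    pub fn record(&mut self, name: &str, len: usize) {
        self.samples.entry(name.to_string()).or_default().push(len);
    }

    /// Return the names whose last `window` samples never decrease and end
    /// strictly higher than they start. These are leak suspects, not proof.
    pub fn suspects(&self, window: usize) -> Vec<String> {
        let mut out: Vec<String> = self
            .samples
            .iter()
            .filter(|(_, s)| window >= 2 && s.len() >= window)
            .filter(|(_, s)| {
                let recent = &s[s.len() - window..];
                recent.windows(2).all(|w| w[0] <= w[1]) && recent[0] < recent[window - 1]
            })
            .map(|(name, _)| name.clone())
            .collect();
        // Sort so the result does not depend on HashMap iteration order.
        out.sort();
        out
    }
}
```

Calling `record` from a periodic tick for each map or set of interest, then checking `suspects` over a long enough window, would separate state that plateaus from state that only ever grows.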
We have around three separate leaks:
For more details and an explanation of the edge cases in which the leaks occur, see:
Lower severity memory leaks in the ping and identify protocols:
Looking over the dashboards for our Kusama validators, the memory on the node that is running litep2p seems to be constantly increasing. It is now at 12 GiB, while all other nodes are around 3-4 GiB and constant.
https://grafana.teleport.parity.io/goto/Uoh4CQmHg?orgId=1
cc: @paritytech/networking