Open OlegMazurov opened 6 months ago
Two approaches:
An OOM with these symptoms was observed in a performance network when testing v0.51.2.
Relevant log messages from the teacher node:
2024-06-22 00:12:40.462 292643 INFO RECONNECT <<platform-core: SyncProtocolWith1 3 to 1>> ReconnectTeacher: Starting reconnect in the role of the sender {"receiving":false,"nodeId":3,"otherNodeId":1,"round":302201} [com.swirlds.logging.legacy.payload.ReconnectStartPayload]
...
2024-06-22 01:03:38.788 301244 INFO RECONNECT <<platform-core: SyncProtocolWith1 3 to 1>> TeachingSynchronizer: sending tree rooted at com.swirlds.virtualmap.internal.merkle.VirtualRootNode with route [0 -> 32 -> 1] -- last RECONNECT log message
...
2024-06-22 01:52:55.573 308662 ERROR EXCEPTION <platformForkJoinThread-9> PlatformBuilder: Uncaught exception on thread Thread[#250,platformForkJoinThread-9,5,platform]: java.lang.OutOfMemoryError: Java heap space
Description
The reconnect connection breaks because of a problem on the learner side. Some time later, the teacher node dies with an OutOfMemoryError (Java heap space).
Steps to reproduce
The issue was observed with a reconnect testing framework running in single-node mode. Further investigation is needed to determine whether it can also affect real networks.
Additional context
The OOM is caused by an FCQueue for expirable transaction records that is never released. See also #11364.
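To illustrate the kind of leak described here, below is a minimal sketch of a teacher that releases its reservation on the state even when the connection to the learner breaks mid-transfer. This is not the platform's actual code: `ReservedState`, `LearnerConnection`, and `teach` are hypothetical names made up for this example, and the reference counter only stands in for whatever keeps the FCQueue reachable.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class TeacherReleaseSketch {

    /** Hypothetical stand-in for a reference-counted state (e.g. one that holds an FCQueue). */
    static final class ReservedState {
        private final AtomicInteger reservations = new AtomicInteger();

        void reserve() {
            reservations.incrementAndGet();
        }

        void release() {
            if (reservations.decrementAndGet() < 0) {
                throw new IllegalStateException("released more often than reserved");
            }
        }

        int reservationCount() {
            return reservations.get();
        }
    }

    /** Hypothetical stand-in for the connection to the learner. */
    interface LearnerConnection {
        void sendTree(ReservedState state) throws IOException;
    }

    /**
     * Sends the state to the learner. The reservation is dropped in the
     * finally block, so a broken connection cannot leave the state pinned
     * in memory. Returns true if the transfer completed.
     */
    static boolean teach(ReservedState state, LearnerConnection connection) {
        state.reserve();
        try {
            connection.sendTree(state);
            return true;
        } catch (IOException e) {
            // The learner went away; report failure to the caller.
            return false;
        } finally {
            state.release();
        }
    }

    public static void main(String[] args) {
        ReservedState state = new ReservedState();
        // Simulate the learner breaking the connection during the transfer.
        boolean completed = teach(state, s -> {
            throw new IOException("learner closed the connection");
        });
        System.out.println("completed=" + completed
                + ", reservations=" + state.reservationCount());
        // prints "completed=false, reservations=0"
    }
}
```

If the `release()` call sat after `sendTree` without the `finally`, the failed transfer would leave the reservation count at 1, which is the general shape of the leak described above.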
Hedera network
other
Version
v0.47.0-SNAPSHOT
Operating system
Linux