This is to track the issue of peer leaks and CPU spikes in some Hyperbahn nodes.
During the on-call, I observed a spike in CPU usage and connection FDs in some Hyperbahn nodes. The memory dumps suggested that we are leaking TChannelPeer objects (~130k of them).
The suspicion is that this is related to the aggressive outgoing peer selection logic. However, the number of leaked connections doesn't account for the number of leaked peers. One possible cause: when a non-ephemeral connection is closed, its peer never gets deleted.
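To make the hypothesis concrete, here is a minimal sketch of the suspected leak pattern (hypothetical names, not the actual TChannel API): a peer table that reaps a peer when its last connection closes, but only if the peer is ephemeral, so non-ephemeral peers with no remaining connections accumulate indefinitely.

```javascript
// Hypothetical peer table illustrating the suspected leak
// (names and structure are assumptions, not TChannel internals).
class PeerTable {
  constructor() {
    this.peers = new Map(); // hostPort -> { connections, ephemeral }
  }

  addPeer(hostPort, ephemeral) {
    this.peers.set(hostPort, { connections: 0, ephemeral: ephemeral });
  }

  onConnectionOpen(hostPort) {
    this.peers.get(hostPort).connections += 1;
  }

  onConnectionClose(hostPort) {
    const peer = this.peers.get(hostPort);
    peer.connections -= 1;
    // Suspected bug: only ephemeral peers are deleted when their
    // last connection closes; non-ephemeral peers linger forever.
    if (peer.connections === 0 && peer.ephemeral) {
      this.peers.delete(hostPort);
    }
  }
}

const table = new PeerTable();
table.addPeer('10.0.0.1:4040', false); // non-ephemeral peer
table.onConnectionOpen('10.0.0.1:4040');
table.onConnectionClose('10.0.0.1:4040');
// The peer still sits in the table with zero connections -- the leak.
console.log(table.peers.size); // 1
```

If this is the mechanism, it would also explain why the leaked-connection count doesn't match the leaked-peer count: connections are cleaned up on close, while their parent peers are not.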
I would consider this issue open until we have more clarity on the cause.
cc: @jcorbin @Raynos @kriskowal