Open broadbentg opened 1 year ago
The rate of connection drops seems to be much higher when backfilling states using --reconstruct-historic-states. Perhaps because of increased net traffic?
Reconstructing historic states requires no network traffic, but it does impose some disk and CPU load.
Description
Events are monitored with this API: https://ethereum.github.io/beacon-APIs/?urls.primaryName=v1#/Events
A connection is established and events are monitored. However, after some time ranging from 48 minutes to 26 hours, Lighthouse terminates the connection. For example:
failed after a little more than two hours. A packet capture with tcpdump shows that Lighthouse terminates the connection:
This has been seen repeatedly (including in packet captures) with curl, some C test code, and Python. In each case Gnosis Chain was used. Several times, three test processes were run simultaneously. They were not cut off at the same time; the drops appear to happen at random. No related messages were found in the log files.
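For anyone trying to reproduce this, the following is a minimal sketch of how a test client might decode the event stream. It is not the test code used above. The helper name `parse_sse` is hypothetical, and it assumes each event's `data:` payload is a single JSON object, as the Beacon API events endpoint documents.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[bytes]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, decoded_json_data) pairs from raw SSE lines.

    Hypothetical helper: handles only the `event:` and `data:` fields.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        if line == "":
            # A blank line terminates one event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it could be passed straight to `parse_sse`. When the server closes the connection, the iteration ends or raises, which is the point where the drop described here becomes visible to the client.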
Version
The problem has been seen with two pre-built binary x86 versions:

Lighthouse v3.3.0-bf533c8
BLS library: blst-portable
SHA256 hardware acceleration: false
Specs: mainnet (true), minimal (false), gnosis (true)

Lighthouse v3.4.0-38514c0
BLS library: blst-modern
SHA256 hardware acceleration: false
Specs: mainnet (true), minimal (false), gnosis (true)

The problem is seen under Ubuntu on x86_64 virtual hardware:

Distributor ID: Ubuntu
Description: Ubuntu 22.10
Release: 22.10
Codename: kinetic
Present Behaviour
Event monitor TCP connections do not stay up indefinitely.
Expected Behaviour
Event monitor TCP connections should stay up until the client terminates them.
Steps to resolve
A simple workaround is to re-open the TCP connection and continue monitoring events. However, an event could be missed while the client reconnects.
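The workaround could be sketched roughly as below. This is an illustration under assumptions, not a fix. It assumes only the `head` topic is subscribed, that Lighthouse's HTTP API is reachable on `localhost:5052`, and that each `head` event's JSON carries a `slot` field. The names `open_stream`, `head_slots`, and `watch` are hypothetical. The loop reconnects whenever the stream ends and reports how many slots passed without a `head` event, so the caller can tell when something may have been missed.

```python
import http.client
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Assumption: a local Lighthouse node with the HTTP API on port 5052.
EVENTS_URL = "http://localhost:5052/eth/v1/events?topics=head"


def open_stream(url: str = EVENTS_URL) -> Iterable[bytes]:
    """Open the SSE endpoint and return the response, iterable by line."""
    req = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    return urllib.request.urlopen(req)


def head_slots(lines: Iterable[bytes]) -> Iterator[int]:
    """Yield the slot number from each `data:` line of a head-event stream."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if line.startswith("data:"):
            yield int(json.loads(line[len("data:"):])["slot"])


def watch(connect: Callable[[], Iterable[bytes]] = open_stream,
          max_reconnects: Optional[int] = None,
          delay: float = 1.0) -> Iterator[Tuple[int, int]]:
    """Yield (slot, gap) pairs, reconnecting whenever the stream ends.

    gap counts the slots with no head event since the previous event.
    """
    last = None
    attempts = 0
    while max_reconnects is None or attempts <= max_reconnects:
        stream = None
        try:
            stream = connect()
            for slot in head_slots(stream):
                gap = 0 if last is None else max(0, slot - last - 1)
                yield slot, gap
                last = slot
        except (OSError, http.client.HTTPException):
            pass  # connection refused, reset, or cut short: retry below
        finally:
            close = getattr(stream, "close", None)
            if close is not None:
                close()
        attempts += 1
        time.sleep(delay)  # brief pause before re-opening the connection
```

A non-zero gap does not prove that an event was lost, because a slot can also be empty when no block was proposed. It only marks the slots worth checking, for example by querying the block headers endpoint for each slot in the gap.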