Closed corpetty closed 3 years ago
This issue means we are leaking sockets, but there are really 2 issues in here. This exception should not be raised by poll().
It's either that or reaching the maximum number of open file descriptors stays hidden. Which is better?
@stefantalpalaru nope, it will be raised by the appropriate procedure that creates sockets. For example, connect() should return this error, but definitely not poll(). So all exceptions raised by poll() are bugs.
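To illustrate the point above, here is a minimal Python sketch (not chronos code, just the same OS behaviour seen from another language). It lowers the soft file descriptor limit for the current process and opens sockets until the OS refuses. The function name `exhaust_descriptors` and the limit of 64 are made up for this example. The error surfaces in the call that creates the socket, which is where one would expect to handle it.

```python
import errno
import resource
import socket


def exhaust_descriptors(limit=64):
    """Lower the soft FD limit, then open sockets until the OS refuses.

    Returns the errno carried by the OSError that socket creation raises.
    The original limits are restored before returning.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    held = []
    try:
        while True:
            # Each socket consumes one descriptor; the loop ends with OSError.
            held.append(socket.socket(socket.AF_INET, socket.SOCK_DGRAM))
    except OSError as exc:
        return exc.errno
    finally:
        for s in held:
            s.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    code = exhaust_descriptors()
    print(code == errno.EMFILE, errno.errorcode.get(code))
```

On Linux, EMFILE is errno 24, the same code that shows up as "(24) Too many open files" in the logs in this thread.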
CC @sinkingsugar we are leaking FDs
Never mind, it's a chronos problem. I'm working on a fix.
Additional segfault, when lowering the open file descriptor limit further:
prlimit -n50 make SCRIPT_PARAMS="--skipGoerliKey" witti
ERR 2020-06-05 15:17:17+02:00 Transport getMessage error topics="discv5" tid=23736 file=protocol.nim:413 exception=TransportOsError msg="(11) Resource temporarily unavailable"
peers: 6 ❯ epoch: 2149, slot: 18/32 (68786) ❯ finalized epoch: 2 (00247c0b) ETH: 0 Traceback (most recent call last, using override)
/mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(614) signalHandler
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
socket(unix): Too many open files
socket(unix): Too many open files
socket(unix): Too many open files
socket(unix): Too many open files
socket: Too many open files
DBG 2020-06-05 15:17:17+02:00 UPnP topics="nat" tid=23736 file=nat.nim:48 msg="Miniupnpc Socket error"
peers: 6 ❯ epoch: 2149, slot: 18/32 (68786) ❯ finalized epoch: 2 (00247c0b) ETH: 0
Same command as above, but with --nat=none added to the beacon_node command line in "scripts/connect_to_testnet.nims". That allowed it to live long enough to finalise 25 epochs. It still died with:
DBG 2020-06-05 15:35:10+02:00 Exception in poll() topics="beacnde" tid=1981 file=beacon_node.nim:718 err="(24) Too many open files" exc=TransportOsError
ERR 2020-06-05 15:35:10+02:00 Transport getMessage error topics="discv5" tid=1981 file=protocol.nim:413 exception=TransportOsError msg="(11) Resource temporarily unavailable"
peers: 9 ❯ epoch: 2152, slot: 11/32 (68875) ❯ finalized epoch: 25 (1ec0174a) ETH: 0 Traceback (most recent call last, using override)
/mnt/sda3/storage/CODE/status/nim-beacon-chain-clean/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(614) signalHandler
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Later edit: the port redirections were probably still there. They were not deleted at the end of the last run, because the UPnP client could not open a new socket to the router.
The FD leak was introduced by https://github.com/status-im/nim-chronos/commit/d6d0084333b5d6d91b2d710b7c0542d2cd8c4c6f and fixed in https://github.com/status-im/nim-chronos/commit/bedd1ded5edc3bfb6877f7025ca4b21f62492ffe .
So part 2 of this issue was fixed, part 1 fixes are pending.
@corpetty this issue is not a blocker for you anymore, but I will close it only after I introduce fixes for part 1.
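For anyone who wants to check for a descriptor leak like this one in their own code, a rough Python sketch follows. It assumes a Linux-style /proc/self/fd (or /dev/fd) directory is available. The helper names `open_fd_count` and `fd_growth` are invented for this example and are not part of chronos or nim-libp2p.

```python
import os
import socket


def open_fd_count():
    """Count descriptors open in this process via /proc/self/fd or /dev/fd."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            return len(os.listdir(path))
    raise RuntimeError("no per-process fd directory found on this platform")


def fd_growth(action, rounds=20):
    """Run action() repeatedly and return how many descriptors were gained."""
    before = open_fd_count()
    for _ in range(rounds):
        action()
    return open_fd_count() - before


if __name__ == "__main__":
    leaked = []
    # Leaky: sockets are kept alive and never closed.
    print(fd_growth(lambda: leaked.append(socket.socket())))
    # Clean: every socket is closed right away.
    print(fd_growth(lambda: socket.socket().close()))
    for s in leaked:
        s.close()
```

A count that keeps growing across rounds points at a leak, while a stable count suggests descriptors are being released.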
The first part (no exception in poll) will be handled by https://github.com/status-im/nim-libp2p/pull/384 instead of https://github.com/status-im/nim-libp2p/pull/247
https://github.com/status-im/nim-libp2p/pull/384 was merged, and picked up by nimbus-eth2.
After about 4-5 hrs of uptime, I get the following ad nauseam.
Unfortunately, I was not able to see where it started, as I was AFK and came back to it.