status-im / nimbus-eth2

Nim implementation of the Ethereum Beacon Chain
https://nimbus.guide

SEGV on graceful shutdown #2233

Closed: kdeme closed this issue 3 years ago

kdeme commented 3 years ago
NOT 2021-01-13 11:54:43.275+01:00 Graceful shutdown                          topics="beacnde" tid=347646 file=nimbus_beacon_node.nim:825
DBG 2021-01-13 11:54:43.275+01:00 Closing discovery node                     topics="discv5" tid=347646 file=protocol.nim:921 node=<redacted>
...
DBG 2021-01-13 11:54:44.561+01:00 Block processed                            tid=347646 file=eth2_processor.nim:218 local_head_slot=402023 store_speed=4.1885 block_slot=402023 store_block_duration=228ms716us400ns update_head_duration=10ms34us265ns overall_duration=238ms750us665ns blockRoot=eab2adcf
DBG 2021-01-13 11:54:44.561+01:00 Exception in secure handler during incoming upgrade topics="libp2p switch" tid=347646 file=switch.nim:246 msg="Stream EOF!" conn=5ffed1734853672756e1f33d
peers: 1 ❯ finalized: 99636f80:12561 ❯ head: 555eec3d:12563:2 ❯ time: 12589:25 (402873) ❯ sync: Dwwwwwwwww:1:0.7407:0.1235:01h59m (402016) ETH: 63.612520978
libbacktrace error: no debugging symbols available. Compile with '--debugger:native'.
Traceback (most recent call last, using override)
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Observed on dde973e2d432482414ac7565618f912a99feccc5 (unstable). I haven't tested with stable, but it might occur there too.

A first quick investigation indicates that it happens in https://github.com/status-im/nimbus-eth2/blob/dde973e2d432482414ac7565618f912a99feccc5/beacon_chain/eth2_network.nim#L1173, more specifically when running https://github.com/status-im/nim-libp2p/blob/87be2c7f1f4a161a6a2fb2cccc20062c9595f01f/libp2p/transports/tcptransport.nim#L155. At least, commenting that line out makes the SEGV go away.

The SEGV does not always occur. In particular, it does not occur when the "Eth2Node.stop(): timeout reached" message is logged, which are the cases where not all of the shutdown code runs (for example, closing the transports).

Fixing this SEGV is important because the databases are closed only after the network has been closed, so a crash during network shutdown means the databases are never closed cleanly. Alternatively, we could just move or remove the network shutdown for now.
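To illustrate the "move the network shutdown" option, here is a minimal, language-neutral sketch in Python (not Nim, and not the actual nimbus-eth2 code). All names in it (FakeDatabase, FakeNetwork, graceful_shutdown) are hypothetical. The idea is simply to close the databases first, so that a crash or timeout during network shutdown can no longer prevent them from being closed. A real SEGV cannot be caught by an exception handler, which is why the ordering matters rather than error handling.

```python
import asyncio


class FakeDatabase:
    """Hypothetical stand-in for the node's databases."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeNetwork:
    """Hypothetical stand-in for the network layer; stop() takes stop_delay seconds."""

    def __init__(self, stop_delay):
        self.stop_delay = stop_delay
        self.stopped = False

    async def stop(self):
        await asyncio.sleep(self.stop_delay)
        self.stopped = True


async def graceful_shutdown(db, network, timeout):
    # Close the databases first so that a failure in network shutdown
    # cannot leave them open.
    db.close()
    try:
        # Bound the network shutdown with a timeout, similar in spirit to
        # the "timeout reached" case described above.
        await asyncio.wait_for(network.stop(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

With this ordering, the databases end up closed whether or not the network shutdown completes. The trade-off is that anything still writing to the databases during network shutdown would have to be stopped beforehand.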

Edit: testing this with gdb shows that the SEGV happens in securedHandler.

kdeme commented 3 years ago

Testing with only this commit applied fixes the SEGV: https://github.com/status-im/nim-libp2p/pull/502/commits/d25ded77bc96eab7e6548dd779c1236eaeccdf61

sinkingsugar commented 3 years ago

It's merged into unstable now.