status-im / nimbus-eth2

Nim implementation of the Ethereum Beacon Chain
https://nimbus.guide

nimbus spontaneously crashes with "database disk image is malformed" #6425

Open marmarek opened 1 month ago

marmarek commented 1 month ago

Describe the bug

After about a month of uptime, the Nimbus beacon node crashed and now refuses to start, complaining that the database is malformed. This happened on two separate hosts about 1h apart.

To Reproduce

Steps to reproduce the behavior:

  1. Platform details (OS, architecture): Linux amd64, Debian 12, but with vanilla kernel 6.6.31
  2. Branch/commit used: one instance was 24.5.1 (running for a long time before), the other one was 24.6.0 (first start after update)
  3. Commands being executed: nothing on the beacon node, but it happened shortly after restarting the validator client (a separate process that does not share the datadir)
  4. Relevant log lines:
crash message from 24.5.1

```
[2024-07-15 00:10:38] [2895575.688327] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.170+00:00 Database checkpointed topics="beacnde" dur=5s30ms452us652ns
[2024-07-15 00:10:38] [2895575.688629] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.170+00:00 Slot end topics="beacnde" slot=9514850 nextActionWait= nextAttestationSlot=9514851 nextProposalSlot=-1 syncCommitteeDuties=none head=d2399b5f:9514850
[2024-07-15 00:10:38] [2895575.692065] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.174+00:00 Missed multiple heartbeats topics="libp2p gossipsub" heartbeat=GossipSub delay=4s102ms91us48ns hinterval=700ms
[2024-07-15 00:10:38] [2895575.706602] nimbus_beacon_node[816]: INF 2024-07-15 00:10:38.188+00:00 Slot start topics="beacnde" head=d2399b5f:9514850 delay=3s188ms688us570ns finalized=297337:f5b6ecaa peers=159 slot=9514851 sync=synced epoch=297339
[2024-07-15 00:10:42] [2895579.923160] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-15 00:10:42] [2895579.923291] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2393) _ZN18nimbus_beacon_node4mainE
[2024-07-15 00:10:42] [2895579.923363] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2316) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-15 00:10:42] [2895579.923435] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2217) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-15 00:10:42] [2895579.923508] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1979) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.923579] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1926) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.923649] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-15 00:10:42] [2895579.923720] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.923789] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/gossip_processing/block_processor.nim(593) _ZN10storeBlock55storeBlock
[2024-07-15 00:10:42] [2895579.923854] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/block_clearance.nim(269) _ZN22addHeadBlockWithParent22addHeadBlockWithParentE3refIN17block_pools_types27ChainDAGRefcolonObjectType_EE3varIN16signatures_batch13BatchVerifierEEN5deneb17SignedBeaconBlockE3refIN9block_dag24BlockRefcolonObjectType_EE4bool4procI3refIN9block_dag24BlockRefcolonObjectType_EEN5deneb24TrustedSignedBeaconBlockE3refIN17block_pools_types24EpochRefcolonObjectType_EEN7helpers19FinalityCheckpointsEE.constprop.0
[2024-07-15 00:10:42] [2895579.923950] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(1800) _ZN14blockchain_dag11updateStateE3refIN17block_pools_types27ChainDAGRefcolonObjectType_EE3varIN5forks23ForkedHashedBeaconStateEEN8block_id11BlockSlotIdE4bool3varIN4base10StateCacheEE
[2024-07-15 00:10:42] [2895579.924049] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(726) _ZN14blockchain_dag8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN7presets13RuntimeConfigE7MDigestI6staticI3intEEN9constants4SlotE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924127] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1303) _ZN15beacon_chain_db8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN5forks13ConsensusForkE7MDigestI6staticI3intEE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924205] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1291) _ZN8getState8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE7MDigestI6staticI3intEE3varIN5deneb11BeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924276] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1208) _ZN29getStateOnlyMutableValidators29getStateOnlyMutableValidatorsE9openArrayIN4base23ImmutableValidatorData2EE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN5deneb11BeaconStateEE4procIE
[2024-07-15 00:10:42] [2895579.924352] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(743) _ZN8getSZSSZ8getSZSSZE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN25beacon_chain_db_immutable37DenebBeaconStateNoImmutableValidatorsEE
[2024-07-15 00:10:42] [2895579.924430] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(872) _ZN6expect6expectE6ResultI4bool6stringE6string.constprop.0
[2024-07-15 00:10:42] [2895579.924506] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(376) _ZN17raiseResultDefect17raiseResultDefectE6string6string
[2024-07-15 00:10:42] [2895579.924579] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(329) _ZN6system18rawWriteStackTraceE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:42] [2895579.924657] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:42] [2895579.924727] nimbus_beacon_node[816]: [[reraised from:
[2024-07-15 00:10:42] [2895579.924800] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-15 00:10:42] [2895579.924868] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2393) _ZN18nimbus_beacon_node4mainE
[2024-07-15 00:10:42] [2895579.924930] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2316) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-15 00:10:42] [2895579.924992] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2217) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-15 00:10:42] [2895579.925072] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1979) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.925135] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1926) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.925196] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-15 00:10:42] [2895579.925260] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-15 00:10:42] [2895579.925321] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/gossip_processing/block_processor.nim(416) _ZN10storeBlock55storeBlock
[2024-07-15 00:10:42] [2895579.925391] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:42] [2895579.925453] nimbus_beacon_node[816]: ]]
[2024-07-15 00:10:42] [2895579.925529] nimbus_beacon_node[816]: [[reraised from:
[2024-07-15 00:10:42] [2895579.925591] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-15 00:10:42] [2895579.925652] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2393) _ZN18nimbus_beacon_node4mainE
[2024-07-15 00:10:43] [2895580.536214] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2316) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-15 00:10:43] [2895580.536521] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2217) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-15 00:10:43] [2895580.536686] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1979) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:43] [2895580.536873] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1926) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-15 00:10:43] [2895580.537089] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-15 00:10:43] [2895580.537240] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-15 00:10:43] [2895580.537577] nimbus_beacon_node[816]: /home/user/nimbus-eth2/beacon_chain/gossip_processing/block_processor.nim(416) _ZN10storeBlock55storeBlock
[2024-07-15 00:10:43] [2895580.537750] nimbus_beacon_node[816]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-15 00:10:43] [2895580.537943] nimbus_beacon_node[816]: ]]
[2024-07-15 00:10:43] [2895580.538129] nimbus_beacon_node[816]: Error: unhandled exception: working database (disk broken/full?): database disk image is malformed [ResultDefect]
[2024-07-15 00:10:43] [2895581.018401] systemd[1]: nimbus_beacon_node.service: Main process exited, code=exited, status=1/FAILURE
[2024-07-15 00:10:43] [2895581.018822] systemd[1]: nimbus_beacon_node.service: Failed with result 'exit-code'.
[2024-07-15 00:10:43] [2895581.019117] systemd[1]: nimbus_beacon_node.service: Consumed 1w 16h 56min 45.200s CPU time.
[2024-07-15 00:10:43] [2895581.324030] systemd[1]: nimbus_beacon_node.service: Scheduled restart job, restart counter is at 1.
[2024-07-15 00:10:43] [2895581.324302] systemd[1]: Stopped nimbus_beacon_node.service - Nimbus Beacon Node (Ethereum consensus client).
[2024-07-15 00:10:43] [2895581.324681] systemd[1]: nimbus_beacon_node.service: Consumed 1w 16h 56min 45.200s CPU time.
```

logs from 24.6.0

```
[2024-07-14 23:07:48] [ 23.735591] nimbus_beacon_node[1115]: INF 2024-07-14 23:07:48.265+00:00 Loading block DAG from database topics="beacnde" path=/home/nimbus/shared_mainnet_0/db
[ 25.489865] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-14 23:07:50] [ 25.490017] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-14 23:07:50] [ 25.490132] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [ 25.490227] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2241) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-14 23:07:50] [ 25.490352] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(552) _ZN4init4initE8typeDescI3refIN11beacon_node26BeaconNodecolonObjectType_EEE3refIN12bearssl_rand15HmacDrbgContextE EN4conf14BeaconNodeConfEN16network_metadata19Eth2NetworkMetadataE
[2024-07-14 23:07:50] [ 25.490473] nimbus_beacon_node[1115]: _ZN6system18rawWriteStackTraceE3varI3seqIN6system15StackTraceEntryEEE(371) /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim
[2024-07-14 23:07:50] [ 25.490597] nimbus_beacon_node[1115]: _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE(754) _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [ 25.490725] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(233) _ZN18nimbus_beacon_node12loadChainDagEN4conf14BeaconNodeConfEN7presets13RuntimeConfigE3refIN15beacon_chain_db29B eaconChainDBcolonObjectType_EEN11beacon_node8EventBusE3refIN17validator_monitor16ValidatorMonitorEE3OptI7MDigestI6staticI3intEEE
[2024-07-14 23:07:50] [ 25.490829] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(1104) _ZN4init4initE8typeDescI3refIN17block_pools_types27ChainDAGRefcolonObjectType_EEEN7presets13 RuntimeConfigE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE3refIN17validator_monitor16ValidatorMonitorEE3setIN6extras10UpdateFlagEE6string4procIN5forks30ForkedTrustedSignedBeaconBlockEE4procIN17block_pools_types20HeadChangeInfoO bjectEE4procIN17block_pools_types15ReorgInfoObjectEE4procI3refIN17block_pools_types27ChainDAGRefcolonObjectType_EEN17block_pools_types22FinalizationInfoObjectEEN11vanity_logs10VanityLogsEN30block_pools_types_light_client21LightClientDataCo nfigE
[2024-07-14 23:07:50] [ 25.490943] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/consensus_object_pools/blockchain_dag.nim(752) _ZN29getStateOnlyMutableValidators29getStateOnlyMutableValidatorsE9openArrayIN4base23Immutabl eValidatorData2EE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN5deneb11BeaconStateEE4procIE
[2024-07-14 23:07:50] [ 25.491059] nimbus_beacon_node[1115]: Q(1303) _ZN15beacon_chain_db8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN5forks13ConsensusForkE7MDigestI6staticI3intEE3varIN5forks23Forke dHashedBeaconStateEE4procIE
[2024-07-14 23:07:50] [ 25.491138] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(1291) _ZN8getState8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE7MDigestI6staticI3intEE3varIN5deneb11B eaconStateEE4procIE
[2024-07-14 23:07:50] [ 25.491240] nimbus_beacon_node[1115]: _ZN17raiseResultDefect17raiseResultDefectE6string6string(1208) _ZN14blockchain_dag8getStateE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EEN7presets13RuntimeConfigE7MD igestI6staticI3intEE5SliceIN9constants4SlotEE3varIN5forks23ForkedHashedBeaconStateEE4procIE
[2024-07-14 23:07:50] [ 25.491320] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(743) _ZN8getSZSSZ8getSZSSZE3refIN7kvstore26KvStoreRefcolonObjectType_EE9openArrayI5uInt8E3varIN25beacon_chain_db_immutable 37DenebBeaconStateNoImmutableValidatorsEE
[2024-07-14 23:07:50] [ 25.491400] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(872) _ZN6expect6expectE6ResultI4bool6stringE6string.constprop.0
[2024-07-14 23:07:50] [ 25.491477] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(376) /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim
[2024-07-14 23:07:50] [ 25.491558] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(329) /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim
[2024-07-14 23:07:50] [ 25.491636] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntry EEE
[2024-07-14 23:07:50] [ 25.491713] nimbus_beacon_node[1115]: [[reraised from:
[2024-07-14 23:07:50] [ 25.491793] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-14 23:07:50] [ 25.491866] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-14 23:07:50] [ 25.491936] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-14 23:07:50] [ 25.492030] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2241) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-14 23:07:50] [ 25.492156] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(552) _ZN4init4initE8typeDescI3refIN11beacon_node26BeaconNodecolonObjectType_EEE3refIN12bearssl_rand15HmacDrbgContextE EN4conf14BeaconNodeConfEN16network_metadata19Eth2NetworkMetadataE
[2024-07-14 23:07:50] [ 25.492261] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [ 25.492330] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(896) _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [ 25.492433] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntry EEE
[2024-07-14 23:07:50] [ 25.492528] nimbus_beacon_node[1115]: ]]
[2024-07-14 23:07:50] [ 25.492600] nimbus_beacon_node[1115]: [[reraised from:
[2024-07-14 23:07:50] [ 25.492666] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-14 23:07:50] [ 25.492748] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-14 23:07:50] [ 25.492812] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-14 23:07:50] [ 25.492910] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2241) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-14 23:07:50] [ 25.493010] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(552) _ZN4init4initE8typeDescI3refIN11beacon_node26BeaconNodecolonObjectType_EEE3refIN12bearssl_rand15HmacDrbgContextEEN4conf14BeaconNodeConfEN16network_metadata19Eth2NetworkMetadataE
[2024-07-14 23:07:50] [ 25.493074] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [ 25.493132] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(896) _ZN4init4initE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-14 23:07:50] [ 25.493211] nimbus_beacon_node[1115]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-14 23:07:50] [ 25.493267] nimbus_beacon_node[1115]: ]]
[2024-07-14 23:07:50] [ 25.493323] nimbus_beacon_node[1115]: Error: unhandled exception: working database (disk broken/full?): database disk image is malformed [ResultDefect]
[2024-07-14 23:07:50] [ 25.494075] systemd[1]: nimbus_beacon_node.service: Main process exited, code=exited, status=1/FAILURE
[2024-07-14 23:07:50] [ 25.494211] systemd[1]: nimbus_beacon_node.service: Failed with result 'exit-code'.
[2024-07-14 23:07:50] [ 25.494368] systemd[1]: nimbus_beacon_node.service: Consumed 5.659s CPU time.
[2024-07-14 23:07:50] [ 25.844828] systemd[1]: nimbus_beacon_node.service: Scheduled restart job, restart counter is at 1.
[2024-07-14 23:07:50] [ 25.845029] systemd[1]: Stopped nimbus_beacon_node.service - Nimbus Beacon Node (Ethereum consensus client).
```


Additional context

I'm not really sure if those two incidents are related, but since Nimbus was running flawlessly before, and it happened at a similar time on two separate hosts (even in separate physical locations), I suspect they might be. A few other hosts running nimbus 24.6.0 and 24.5.1 are not affected.

cheatfate commented 1 month ago

Sorry, there is no such information, so I will ask: do you have enough free space on the disk where the database is stored? Could you please also check if the disk used by the nimbus-eth2 database is OK? Because as I understand it, the first crash happened while a block was being stored, and the second crash happened when you tried to start beacon_node again.

marmarek commented 1 month ago

> Sorry, there is no such information, so I will ask: do you have enough free space on the disk where the database is stored?

Yes, there is more than enough space in both cases (over 200GB free on both hosts).

> Could you please also check if the disk used by the nimbus-eth2 database is OK?

No disk or filesystem errors as far as I can see.

> Because as I understand it, the first crash happened while a block was being stored, and the second crash happened when you tried to start beacon_node again.

Yes, but note those are on two separate hosts: on one it failed spontaneously, and on the other it no longer started after the update (no issues before the update). I suspect it might be related to something on the network at that time, but I'm not sure...

cheatfate commented 1 month ago

This error message comes from the SQLite3 code that nimbus-eth2 uses. In the first case it occurred during a write operation; in the second case it occurred while the database file was being opened.

marmarek commented 1 month ago

sqlite3 you say, so I did this (on the host that failed during the write operation):

sqlite3 -cmd 'pragma integrity_check' shared_mainnet_0/db/nbc.sqlite3

and got:

*** in database main ***  
On tree page 10662511 cell 1: invalid page number 235267435
Page 2680842 is never used
Page 2680843 is never used                  
Page 2680844 is never used
Page 2680845 is never used
Page 2680846 is never used
Page 2680847 is never used
Page 2680848 is never used
Page 2680849 is never used
Page 2680850 is never used
Page 2680851 is never used
Page 2680852 is never used
Page 2680853 is never used
Page 2680854 is never used
Page 2680855 is never used
Page 2680856 is never used
Page 2680857 is never used
Page 2680858 is never used
Page 2680859 is never used
...

I'm not sure how helpful that is...
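
For anyone who wants to repeat this check from a script, here is a minimal sketch using Python's standard `sqlite3` module. It opens the file read-only and runs the same `PRAGMA integrity_check`, capped at a few rows so the output stays short. The function name `integrity_report` and the `max_rows` cap are assumptions made for illustration and are not part of Nimbus; it is probably safest to run it while the beacon node is stopped, or on a copy of the file.

```python
import sqlite3
from pathlib import Path


def integrity_report(db_path, max_rows=20):
    """Return the rows produced by PRAGMA integrity_check for db_path.

    A healthy database yields ["ok"]. If SQLite cannot read the file at
    all, the error text is returned instead of raising.
    (Hypothetical helper, not part of nimbus-eth2.)
    """
    # mode=ro asks SQLite to open the file read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # integrity_check(N) stops after reporting N problems.
        rows = conn.execute(f"PRAGMA integrity_check({int(max_rows)})").fetchall()
        return [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        return [f"error: {exc}"]
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    for line in integrity_report(sys.argv[1]):
        print(line)
```

Invoked as `python check_db.py shared_mainnet_0/db/nbc.sqlite3` (the script name is arbitrary), it prints one line per reported problem, or `ok`.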

tersec commented 1 month ago

We've never seen this particular error, and it appears to be something happening in the SQLite library itself, given the

On tree page 10662511 cell 1: invalid page number 235267435

Nimbus does not seem to use SQLite3 in a fine-grained enough way to trigger such an issue, unless random memory corruption or something similar is happening.
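
One way to put the reported page number in context: as far as SQLite's integrity checker is understood here, `invalid page number` is emitted when a tree page references a page number that is zero or larger than the number of pages in the database. The sketch below, assuming Python's standard `sqlite3` module, reads `PRAGMA page_size` and `PRAGMA page_count` so the two can be compared. `page_stats` and `page_in_range` are hypothetical helper names, not Nimbus or SQLite APIs.

```python
import sqlite3
from pathlib import Path


def page_stats(db_path):
    """Return (page_size, page_count) for the SQLite file at db_path."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only open
    conn = sqlite3.connect(uri, uri=True)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    finally:
        conn.close()
    return page_size, page_count


def page_in_range(page_no, page_count):
    """SQLite page numbers start at 1 and run up to page_count."""
    return 1 <= page_no <= page_count
```

For example, `page_in_range(235267435, page_stats("shared_mainnet_0/db/nbc.sqlite3")[1])` would show whether the child pointer from the message lies beyond the end of the file, which would be consistent with a corrupted pointer rather than a missing page.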

It's worth checking, perhaps, if the nodes and hosts in question:

Should one be given to understand that

> Platform details (OS, architecture): Linux amd64, Debian 12, but with vanilla kernel 6.6.31

it's otherwise all defaults, bare metal, ext4, default filesystem mount options?

marmarek commented 1 month ago

> Platform details (OS, architecture): Linux amd64, Debian 12, but with vanilla kernel 6.6.31
>
> it's otherwise all defaults, bare metal, ext4, default filesystem mount options?

It is a VM (on Xen), but otherwise plain ext4, and with nosuid,nodev,discard mount options (on this partition).
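
To confirm which filesystem and mount options actually apply to the database directory, `/proc/mounts` can be parsed directly on Linux. A minimal sketch in Python follows; the helper names `parse_mounts` and `mount_for` are made up for illustration, and the sketch does not decode escaped characters (such as `\040` for a space) in mount points.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        entries.append((fields[1], fields[2], fields[3].split(",")))
    return entries


def mount_for(path, entries):
    """Return the entry whose mount point is the longest prefix of path, or None."""
    path = os.path.abspath(path)
    best = None
    for mount_point, fstype, options in entries:
        # Compare with a trailing slash so /home does not match /homex.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype, options)
    return best


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    print(mount_for("/home/nimbus/shared_mainnet_0/db", mounts))
```

The path in the `__main__` section is the database path that appears in the logs above; on another host it would need to be adjusted.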

I don't see anything unusual in monitoring at that time (temperature, I/O rates, RAID state, etc. are all normal).

BTW, yesterday two more hosts behaved in an unusual but different way: the OOM killer killed the nimbus process after it quickly grew to over 16GB (it normally sits at around 4GB). This never happened before.

marmarek commented 1 month ago

In the meantime, the database crash happened two more times (on yet other hosts), but interestingly, after the automatic service restart (via systemd) the node continued normally.

Here is one of the crashes:


```
[2024-07-24 00:32:06] [258380.167614] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-24 00:32:06] [258380.167916] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-24 00:32:06] [258380.167998] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-24 00:32:06] [258380.168094] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2254) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-24 00:32:06] [258380.168168] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2016) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168239] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1963) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168309] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-24 00:32:06] [258380.168369] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168436] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/sync/sync_protocol.nim(344) _ZN30blobSidecarsByRangeUserHandler30blobSidecarsByRangeUserHandlerE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.168497] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/beacon_chain_db.nim(1068) _ZN15beacon_chain_db16getBlobSidecarSZE3refIN15beacon_chain_db29BeaconChainDBcolonObjectType_EE7MDigestI6staticI3intEE6uInt643varI3seqI5uInt8EE
[2024-07-24 00:32:06] [258380.168564] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(872) _ZN6expect6expectE6ResultI4bool6stringE6string.constprop.0
[2024-07-24 00:32:06] [258380.168630] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-results/results.nim(376) _ZN17raiseResultDefect17raiseResultDefectE6string6string
[2024-07-24 00:32:06] [258380.168700] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(329) _ZN6system18rawWriteStackTraceE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.168766] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.168835] nimbus_beacon_node[836]: [[reraised from:
[2024-07-24 00:32:06] [258380.168910] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-24 00:32:06] [258380.168972] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-24 00:32:06] [258380.169125] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-24 00:32:06] [258380.169192] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2254) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgContextEE
[2024-07-24 00:32:06] [258380.169254] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2016) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169316] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1963) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169378] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-24 00:32:06] [258380.169437] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169496] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/sync/sync_protocol.nim(369) _ZN30blobSidecarsByRangeUserHandler30blobSidecarsByRangeUserHandlerE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.169572] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.169632] nimbus_beacon_node[836]: ]]
[2024-07-24 00:32:06] [258380.169696] nimbus_beacon_node[836]: [[reraised from:
[2024-07-24 00:32:06] [258380.169755] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
[2024-07-24 00:32:06] [258380.169815] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2430) _ZN18nimbus_beacon_node4mainE
[2024-07-24 00:32:06] [258380.169874] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2353) _ZN18nimbus_beacon_node16handleStartUpCmdE3varIN4conf14BeaconNodeConfEE
[2024-07-24 00:32:06] [258380.169933] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2254) _ZN18nimbus_beacon_node15doRunBeaconNodeE3varIN4conf14BeaconNodeConfEE3refIN12bearssl_rand15HmacDrbgontextEE
[2024-07-24 00:32:06] [258380.169992] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2016) _ZN18nimbus_beacon_node5startE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.170089] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1963) _ZN18nimbus_beacon_node3runE3refIN11beacon_node26BeaconNodecolonObjectType_EE
[2024-07-24 00:32:06] [258380.367968] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncengine.nim(150) _ZN11asyncengine4pollE
[2024-07-24 00:32:06] [258380.368090] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nim-chronos/chronos/internal/asyncfutures.nim(371) _ZN12asyncfutures14futureContinueE3refIN7futures26FutureBasecolonObjectType_EE
[2024-07-24 00:32:06] [258380.368152] nimbus_beacon_node[836]: /home/user/nimbus-eth2/beacon_chain/sync/sync_protocol.nim(369) _ZN30blobSidecarsByRangeUserHandler30blobSidecarsByRangeUserHandlerE3refIN7futures26FutureBasecolonObjectTpe_EE
[2024-07-24 00:32:06] [258380.368216] nimbus_beacon_node[836]: /home/user/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/stacktraces.nim(62) _ZN11stacktraces30auxWriteStackTraceWithOverrideE3varI3seqIN6system15StackTraceEntryEEE
[2024-07-24 00:32:06] [258380.368319] nimbus_beacon_node[836]: ]]
[2024-07-24 00:32:06] [258380.368380] nimbus_beacon_node[836]: Error: unhandled exception: working database (disk broken/full?): database disk image is malformed [ResultDefect]
```

and the startup:

```
[2024-07-24 00:32:07] [258381.230402] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:08.211+00:00 Launching beacon node topics="beacnde" version=v24.6.0-7d0078-stateofus ... (redacted)
[2024-07-24 00:32:07] [258381.350230] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:08.211+00:00 Starting metrics HTTP server topics="beacnde" url=http://127.0.0.1:8008/metrics
[2024-07-24 00:32:07] [258381.350454] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:08.285+00:00 Threadpool started topics="beacnde" numThreads=16
[2024-07-24 00:32:17] [258390.890452] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:17.871+00:00 Loading block DAG from database topics="beacnde" path=/home/nimbus/shared_mainnet_0/db
[2024-07-24 00:32:19] [258393.053200] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:20.034+00:00 Block DAG initialized head=87fb0013:9579758 finalizedHead=b1b64ef8:9579680 tail=8584f5a5:8522751 backfill="(0, \"00000000\")" loadDur=7ms244us7ns summariesDur=1s116ms129us938ns finalizedDur=1s39ms290us768ns frontfillDur=30ns keysDur=52us50ns
[2024-07-24 00:32:20] [258394.028505] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:21.009+00:00 Starting REST HTTP server topics="beacnde" url=http://127.0.0.1:5052
[2024-07-24 00:32:20] [258394.028625] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.010+00:00 Generating new networking key topics="networking" network_public_key=...
[2024-07-24 00:32:20] [258394.030049] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.011+00:00 Discovery ENR initialized topics="eth p2p discv5" enrAutoUpdate=true seqNum=1 ...
[2024-07-24 00:32:20] [258394.030180] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.011+00:00 Loading slashing protection database (v2) topics="beacnde" path=/home/nimbus/shared_mainnet_0/validators
[2024-07-24 00:32:20] [258394.052039] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.033+00:00 Using external payload builder topics="beacnde" payloadBuilderUrl=http://localhost:18550
[2024-07-24 00:32:21] [258394.528259] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:21.509+00:00 Initializing fork choice topics="beacnde" unfinalized_blocks=78
[2024-07-24 00:32:23] [258396.741980] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:23.722+00:00 State replayed topics="chaindag" blocks=32 slots=32 current=481ae9a5:9579679@9579680 ancestor=481ae9a5:9579679@9579680 target=1c8cf120:9579711@9579712 ancestorStateRoot=a0bc9ae4 targetStateRoot=6e2f3a04 found=true assignDur=125us121ns replayDur=1s432ms389us425ns
[2024-07-24 00:32:24] [258397.821519] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:24.802+00:00 Fork choice initialized topics="beacnde" justified=299366:654331c4 finalized=299365:b1b64ef8
[2024-07-24 00:32:24] [258397.829446] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:24.811+00:00 Loading validators topics="beacval" validatorsDir=/home/nimbus/shared_mainnet_0/validators keystore_cache_available=true
[2024-07-24 00:32:25] [258398.795235] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.776+00:00 State replayed topics="chaindag" blocks=1 slots=0 current=87fb0013:9579758 ancestor=7e28565d:9576447@9576448 target=26d03af3:9576448 ancestorStateRoot=2e072c04 targetStateRoot=73fa4983 found=false assignDur=318ms217us100ns replayDur=556ms908us272ns
[2024-07-24 00:32:25] [258398.795761] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.777+00:00 Starting beacon node topics="beacnde" version=v24.6.0-7d0078-stateofus nimVersion=1.6.20 enr=...
[2024-07-24 00:32:25] [258398.798795] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.780+00:00 Listening to incoming network requests topics="beacnde" [2024-07-24 00:32:25] [258398.798883] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.780+00:00 Starting discovery node topics="eth p2p discv5" ... [2024-07-24 00:32:25] [258398.799648] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.781+00:00 Starting execution layer deposit syncing topics="elman" contract=0x00000000219ab540356cbb839cbe05303d7705fa [2024-07-24 00:32:25] [258398.799838] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:25.781+00:00 Connection attempt started topics="elman" [2024-07-24 00:32:25] [258398.801022] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.782+00:00 REST service started address=127.0.0.1:5052 [2024-07-24 00:32:25] [258398.801114] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.782+00:00 Starting light client topics="lightcl" trusted_block_root=none(Eth2Digest) [2024-07-24 00:32:25] [258398.801190] nimbus_beacon_node[382823]: NTC 2024-07-24 00:32:25.782+00:00 Setting up doppelganger detection topics="gossip_eth2" epoch=299367 broadcast_epoch=299368 [2024-07-24 00:32:25] [258399.055127] nimbus_beacon_node[382823]: INF 2024-07-24 00:32:26.034+00:00 Scheduling first slot action topics="beacnde" startTime=190w12h32m2s782ms634us512ns nextSlot=9579761 timeToNextSlot=9s217ms365us488ns ```

The database itself is about 158 GB, which I assume is the expected size, right?
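
"database disk image is malformed" is the message SQLite returns for its `SQLITE_CORRUPT` result code, so one way to see how far the damage goes is to run SQLite's own consistency check on a copy of the file while the node is stopped. Below is a minimal sketch using Python's standard `sqlite3` module. The helper name `check_sqlite` is made up for illustration, and the path to the database file inside the `db` directory is an assumption that has to be supplied by hand. On a database of this size even `quick_check` may take a long time.

```python
import sqlite3
import sys
from pathlib import Path


def check_sqlite(path, quick=True):
    """Return the rows reported by SQLite's built-in consistency check.

    A healthy file yields ["ok"]; any other result describes the damage.
    `check_sqlite` is a hypothetical helper, not part of Nimbus.
    """
    pragma = "quick_check" if quick else "integrity_check"
    # Open read-only through a file: URI so the check cannot modify the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            return [row[0] for row in conn.execute(f"PRAGMA {pragma}")]
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # Severe corruption can make the PRAGMA itself fail; report that too.
        return [f"error: {exc}"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the database file path as the first argument (assumed, not verified).
    for line in check_sqlite(sys.argv[1]):
        print(line)
```

`quick_check` skips some of the index verification that `integrity_check` performs, so passing `quick=False` is more thorough but slower.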