status-im / nimbus-eth2

Nim implementation of the Ethereum Beacon Chain
https://nimbus.guide
Other
543 stars 233 forks source link

Nimbus crashing on slot end #1783

Closed acolytec3 closed 4 years ago

acolytec3 commented 4 years ago

Describe the bug Running make spadina on the current master branch, I am seeing the node run for a while and then terminate with the below errors produced by Nimbus and then the node crashes:

Error: unhandled exception: /home/ubuntu/nim-beacon-chain/beacon_chain/spec/network.nim(147, 12)nextEpochSubnets.len > 0 and nextEpochSubnets.len <= attachedValidators.len[AssertionError]

The two times I have seen it so far have both occurred after a Slot end message appears in the logs.

To Reproduce Steps to reproduce the behavior:

  1. Platform details (OS, architecture): Raspberry Pi 4 - Ubuntu 20.04 Server (64-bit)
  2. Branch/commit used: Current master - commit 78ceeed8042d6b70d9c1aae09688d43ff79db6e2
  3. Commands being executed: make spadina or make spadina-vc
  4. Relevant log lines:
    DBG 2020-09-30 14:27:33.002+00:00 No head block update                       topics="chaindag" tid=17526 file=chain_dag.nim:710 newHead=4857c47b:7936
    DBG 2020-09-30 14:27:33.002+00:00 UpdateStateData miss                       topics="chaindag" tid=17526 file=chain_dag.nim:629 bs=4d6043d9:7935 stateBlock=4d6043d9:7935 stateSlot=7936
    DBG 2020-09-30 14:27:33.315+00:00 State reloaded from database               topics="chaindag" tid=17526 file=chain_dag.nim:679 blocks=22 slots=31 stateRoot=a6476f89 stateSlot=7935 startRoot=beec27c1 startSlot=7904 blck=4d6043d9:7935
    INF 2020-09-30 14:27:33.316+00:00 Slot end                                   topics="beacnde" tid=17526 file=beacon_node.nim:548 slot=7936 nextSlot=7937 head=4857c47b:7936 headEpoch=248 finalizedHead=310de05a:7872 finalizedEpoch=246
    peers: 74 ❯ finalized: 310de05a:246 ❯ head: 4857c47b:248:0 ❯ time: 248:0 (7936) ❯ sync: synced                                                                                                                                                                             :  libbacktrace error: no debugging symbols available. Compile with '--debugger:native'.
    Traceback (most recent call last, using override)
    Error: unhandled exception: /home/ubuntu/nim-beacon-chain/beacon_chain/spec/network.nim(147, 12) `nextEpochSubnets.len > 0 and nextEpochSubnets.len <= attachedValidators.len`  [AssertionError]
acolytec3 commented 4 years ago

An update here, the error hasn't reoccurred in the past few hours and it seems to be stable. Will report if I see it again.

arnetheduck commented 4 years ago

@kdeme did you have a log file maybe of the same crash? preferably with the attestation received debug log?

kdeme commented 4 years ago

@arnetheduck Yes:

DBG 2020-09-28 10:12:15.957+02:00 Attestation received                       tid=3591 file=eth2_processor.nim:304 delay=7s957ms992us781ns wallSlot=394560 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000, data: (slot: 394560, index: 14, beacon_block_root: \"2f2f1559\", source: (epoch: 12328, root: \"491b3bb8\"), target: (epoch: 12330, root: \"2f2f1559\")), signature: \"af56065e\")" committeeIndex=14
INF 2020-09-28 10:12:15.961+02:00 Attestation resolved                       topics="attpool" tid=3591 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000, data: (slot: 394560, index: 14, beacon_block_root: \"2f2f1559\", source: (epoch: 12328, root: \"491b3bb8\"), target: (epoch: 12330, root: \"2f2f1559\")), signature: \"af56065e\")" validations=10
DBG 2020-09-28 10:12:16.004+02:00 Fork choice requested                      topics="fork_choice" tid=3591 file=fork_choice.nim:346 justified_epoch=12328 justified_root=491b3bb8 finalized_epoch=12327 fork_choice_head=2f2f1559
DBG 2020-09-28 10:12:16.004+02:00 No head block update                       topics="chaindag" tid=3591 file=chain_dag.nim:710 newHead=2f2f1559:394557
DBG 2020-09-28 10:12:16.005+02:00 UpdateStateData miss                       topics="chaindag" tid=3591 file=chain_dag.nim:629 bs=2f2f1559:394557@394559 stateBlock=04f29100:394551 stateSlot=394560
DBG 2020-09-28 10:12:16.179+02:00 State reloaded from database               topics="chaindag" tid=3591 file=chain_dag.nim:679 blocks=22 slots=31 stateRoot=eadcf6a2 stateSlot=394559 startRoot=dad5806e startSlot=394528 blck=2f2f1559:394557@394559
INF 2020-09-28 10:12:16.180+02:00 Slot end                                   topics="beacnde" tid=3591 file=beacon_node.nim:553 slot=394560 nextSlot=394561 head=2f2f1559:394557 headEpoch=12329 finalizedHead=5bf8d980:394464 finalizedEpoch=12327
 peers: 79 ❯ finalized: 5bf8d980:12327 ❯ head: 2f2f1559:12329:29 ❯ time: 12330:0 (394560) ❯ sync: UUUUUUUUUUUUUUUUUUUU:20:0.0000 (394552)                                                                                                                                          :  libbacktrace error: no debugging symbols available. Compile with '--debugger:native'.
Traceback (most recent call last, using override)
Error: unhandled exception: /home/deme/repos/nim-beacon-chain/beacon_chain/spec/network.nim(147, 12) `nextEpochSubnets.len > 0 and nextEpochSubnets.len <= attachedValidators.len`  [AssertionError]

Here are the last 100 lines: crash_100_lines.log

Can zip you the full log if needed.

arnetheduck commented 4 years ago

oops, was thinking about the other crash, this one is unrelated to attestations

yorickdowne commented 4 years ago

I am seeing this as well, here's the log. Debug wasn't on, I can enable that and see how it behaves.

beacon_1            | INF 2020-10-01 12:47:27.987+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000, data: (slot: 417536, index: 9, beacon_block_root: \"c67e281f\", source: (epoch: 13047, root: \"5717afee\"), target: (epoch: 13048, root: \"c67e281f\")), signature: \"8dafe378\")" validations=5
beacon_1            | libbacktrace error: no debugging symbols available. Compile with '--debugger:native'.
beacon_1            | Traceback (most recent call last, using override)
beacon_1            | Error: unhandled exception: /usr/src/nim-beacon-chain/beacon_chain/spec/network.nim(147, 12) `nextEpochSubnets.len > 0 and nextEpochSubnets.len <= attachedValidators.len`  [AssertionError]
beacon_1            | INF 2020-10-01 12:47:27.988+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000, data: (slot: 417536, index: 10, beacon_block_root: \"c67e281f\", source: (epoch: 13047, root: \"5717afee\"), target: (epoch: 13048, root: \"c67e281f\")), signature: \"850614be\")" validations=2
yorickdowne commented 4 years ago

Here it is with debug, I don't really see additional information. Good news is I can reproduce this.

beacon_1            | INF 2020-10-01 13:06:39.992+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, data: (slot: 417632, index: 0, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"96b866c2\")" validations=56
beacon_1            | INF 2020-10-01 13:06:39.998+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000, data: (slot: 417632, index: 8, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"b7dd03cd\")" validations=69
beacon_1            | INF 2020-10-01 13:06:39.999+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000, data: (slot: 417632, index: 2, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"8fab40c1\")" validations=70
beacon_1            | INF 2020-10-01 13:06:39.999+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, data: (slot: 417632, index: 6, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"87618bad\")" validations=59
beacon_1            | INF 2020-10-01 13:06:40.008+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000, data: (slot: 417632, index: 3, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"b1a08510\")" validations=72
beacon_1            | INF 2020-10-01 13:06:40.008+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000, data: (slot: 417632, index: 7, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"b731117d\")" validations=69
beacon_1            | INF 2020-10-01 13:06:40.009+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000, data: (slot: 417632, index: 11, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"8f118810\")" validations=69
beacon_1            | INF 2020-10-01 13:06:40.009+00:00 Attestation resolved                       topics="attpool" tid=1 file=attestation_pool.nim:187 attestation="(aggregation_bits: 0b0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, data: (slot: 417632, index: 6, beacon_block_root: \"f6ae8636\", source: (epoch: 13049, root: \"9ef71bc7\"), target: (epoch: 13051, root: \"f6ae8636\")), signature: \"b5b580da\")" validations=60
beacon_1            | INF 2020-10-01 13:06:40.494+00:00 Slot end                                   topics="beacnde" tid=1 file=beacon_node.nim:548 slot=417632 nextSlot=417633 head=f6ae8636:417630 headEpoch=13050 finalizedHead=c67e281f:417536 finalizedEpoch=13048
beacon_1            | libbacktrace error: no debugging symbols available. Compile with '--debugger:native'.
beacon_1            | Traceback (most recent call last, using override)
beacon_1            | Error: unhandled exception: /usr/src/nim-beacon-chain/beacon_chain/spec/network.nim(147, 12) `nextEpochSubnets.len > 0 and nextEpochSubnets.len <= attachedValidators.len`  [AssertionError]
eth2-nimbus_beacon_1 exited with code 1
tersec commented 4 years ago

This should be fixed in devel. Re-open if it's not.