status-im / nimbus-eth2

Nim implementation of the Ethereum Beacon Chain
https://nimbus.guide
Other
516 stars 222 forks source link

keymanager API very slow while Nimbus is syncing #5289

Open yorickdowne opened 1 year ago

yorickdowne commented 1 year ago

Describe the bug

When Nimbus is syncing, the keymanager remote key function - POST to eth/v1/remotekeys - is extremely slow, and interferes with syncing.

The regular keys function was not tested but might behave the same.

To Reproduce

Additional context

Sync may show missed heartbeat messages while keymanager API is in use

eth-docker-devel-consensus-1  | INF 2023-08-13 09:22:31.189+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=26s279ms985us157ns hinterval=700ms
eth-docker-devel-consensus-1  | INF 2023-08-13 09:22:31.189+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat="Gossipsub scoring" delay=13s818ms51us270ns hinterval=12s

On the key import side the remote key function takes 10 seconds or more to complete, or may even time out with a 60s timeout set in curl

errantelectron commented 10 months ago

Encountering the same issue - this seems like it interferes with all functions of Nimbus (including preventing effective peering during the sync), and makes Nimbus perform extremely poorly with Rocketpool.

It will eventually manage a sync, but it initially gets itself stuck at just 1 peer for hours with the logs showing a continual loop of:

rocketpool_eth2  | INF 2023-11-09 09:15:53.979+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=13s356ms102us506ns hinterval=700ms
rocketpool_eth2  | INF 2023-11-09 09:15:53.988+00:00 Slot end                                   topics="beacnde" slot=7724776 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=none head=28ab91d3:7724672
rocketpool_eth2  | INF 2023-11-09 09:15:53.991+00:00 Slot start                                 topics="beacnde" slot=7724777 epoch=241399 sync="--h--m (0.00%) 0.0000slots/s (wPwwwwwwwD:7724671)" peers=2 head=28ab91d3:7724672 finalized=241394:17613184 delay=6s991ms869us447ns
rocketpool_eth2  | INF 2023-11-09 09:16:17.531+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=22s151ms558us224ns hinterval=700ms
rocketpool_eth2  | INF 2023-11-09 09:16:17.543+00:00 Slot end                                   topics="beacnde" slot=7724777 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=none head=28ab91d3:7724672
rocketpool_eth2  | NOT 2023-11-09 09:16:17.560+00:00 Missed expected slot start, catching up    topics="beacnde" delay=18s560ms56us427ns curSlot=7724777 nextSlot=7724777
rocketpool_eth2  | INF 2023-11-09 09:16:17.560+00:00 Slot start                                 topics="beacnde" slot=7724779 epoch=241399 sync="--h--m (0.00%) 0.0000slots/s (wPwwwwwwww:7724671)" peers=1 head=28ab91d3:7724672 finalized=241394:17613184 delay=18s560ms56us427ns
rocketpool_eth2  | INF 2023-11-09 09:16:41.799+00:00 Missed multiple heartbeats                 topics="libp2p gossipsub" heartbeat=GossipSub delay=22s868ms211us290ns hinterval=700ms
rocketpool_eth2  | INF 2023-11-09 09:16:41.811+00:00 Slot end                                   topics="beacnde" slot=7724779 nextActionWait=n/a nextAttestationSlot=-1 nextProposalSlot=-1 syncCommitteeDuties=none head=28ab91d3:7724672
rocketpool_eth2  | NOT 2023-11-09 09:16:41.819+00:00 Missed expected slot start, catching up    topics="beacnde" delay=18s819ms463us597ns curSlot=7724779 nextSlot=7724779