prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Batch blocks sync leads to tons of atts invalid sigs error #6593

Closed terencechain closed 4 years ago

terencechain commented 4 years ago

Latest commit: 52e9155df33e180717192cdd8c7b59d173dd4024

When using the initial sync batch verify feature, a common pattern I noticed (8 out of 10 times) is that initial sync exits early with the following error:

[2020-07-13 18:34:38] ERROR initial-sync: Failed to process block, exiting init sync error=head at slot 214816 with weight 17765 is not eligible, FinalizedEpoch 6711 != 6712, JustifiedEpoch 6712 != 6713
could not update head

This eventually leads to many "signature did not verify" errors:

WARN blockchain: Could not receive attestation in chain service aggregationCount=8 beaconBlockRoot=0xd36efe9d934d committeeIndex=0 error=signature did not verify

The "signature did not verify" errors occur on almost all attestations in every slot after initial sync. This lasts around 10 minutes.
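For readers unfamiliar with the "is not eligible" error above, here is a minimal Go sketch of the kind of check that could produce it. It is not Prysm's actual code: the `node` and `checkpoints` types and the `eligibleForHead` function are hypothetical, and it assumes that in each `X != Y` pair the left value belongs to the candidate head and the right value to the fork-choice store.

```go
package main

import "fmt"

// node is a hypothetical, simplified fork-choice node (not Prysm's real type).
type node struct {
	slot           uint64
	weight         uint64
	justifiedEpoch uint64
	finalizedEpoch uint64
}

// checkpoints holds the epochs the store currently treats as justified and finalized.
type checkpoints struct {
	justifiedEpoch uint64
	finalizedEpoch uint64
}

// eligibleForHead returns nil when the candidate's checkpoints match the store's.
// Otherwise it returns an error formatted like the log line in this issue.
// Assumption: the left-hand value is the candidate's, the right-hand the store's.
func eligibleForHead(n node, c checkpoints) error {
	if n.finalizedEpoch == c.finalizedEpoch && n.justifiedEpoch == c.justifiedEpoch {
		return nil
	}
	return fmt.Errorf(
		"head at slot %d with weight %d is not eligible, FinalizedEpoch %d != %d, JustifiedEpoch %d != %d",
		n.slot, n.weight,
		n.finalizedEpoch, c.finalizedEpoch,
		n.justifiedEpoch, c.justifiedEpoch,
	)
}

func main() {
	// Values taken from the log line reported above.
	candidate := node{slot: 214816, weight: 17765, justifiedEpoch: 6712, finalizedEpoch: 6711}
	store := checkpoints{justifiedEpoch: 6713, finalizedEpoch: 6712}
	if err := eligibleForHead(candidate, store); err != nil {
		fmt.Println(err)
	}
}
```

Under that assumption, the log would mean the best candidate head was one epoch behind the store on both checkpoints, which would be consistent with the store's checkpoints advancing during batch processing before the head was updated.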

fabdarice commented 4 years ago

Are you working on this, @terencechain? I saw a draft PR from you. If not, I'm willing to tackle this issue.

terencechain commented 4 years ago

> Are you working on this, @terencechain? I saw a draft PR from you. If not, I'm willing to tackle this issue.

I think we still need more investigation and testing to conclude whether this is a bug and where the bug is. I mainly opened this issue to track the investigation process. Feel free to help us test this to see if it's reproducible on your end. In terms of code fixes, it's still too early to tell what that looks like.

nisdas commented 4 years ago

@terencechain I have tried, but I can't reproduce it. Can you outline how you run it and with what flags?

terencechain commented 4 years ago

> @terencechain I have tried, but I can't reproduce it. Can you outline how you run it and with what flags?

It was just --dev, syncing from genesis to head on Altona. I haven't tried on the latest commit. Maybe it's fixed.

terencechain commented 4 years ago

No longer reproducible. The last few batch block improvement PRs must have fixed this.

sieg-i commented 4 years ago

I have had the same error for two days now with batch sync; my beacon chain is no longer catching up because of this issue.

Prysm/v1.0.0-alpha.20/cbc27e0f2e2259504a77e14727b7465d5a7f7341.

When checking the logs from when it started, I saw messages like "Roughtime reports your clock is off by more than 2 seconds". I don't know whether that's related; in any case, I enforced NTP sync afterwards.

The Grafana graph also looks very strange from the moment the issue started; something seems to have gone wild with the process. I have restarted the process several times since then, but it never caught up.

[2020-08-16 00:04:31] INFO initial-sync: Processing block 0x26b8252b... 73312/81921 - estimated time remaining 1h35m39s blocksPerSecond=1.5 peers=19
[2020-08-16 00:04:32] ERROR initial-sync: Failed to process block, exiting init sync error=head at slot 73279 with weight 62003 is not eligible, finalizedEpoch 2287 != 2289, justifiedEpoch 2288 != 2290
could not update head
github.com/prysmaticlabs/prysm/beacon-chain/blockchain.(Service).ReceiveBlock
    beacon-chain/blockchain/receive_block.go:43
github.com/prysmaticlabs/prysm/beacon-chain/sync/initial-sync.(Service).processBlock
    beacon-chain/sync/initial-sync/round_robin.go:215
github.com/prysmaticlabs/prysm/beacon-chain/sync/initial-sync.(Service).roundRobinSync
    beacon-chain/sync/initial-sync/round_robin.go:129
github.com/prysmaticlabs/prysm/beacon-chain/sync/initial-sync.(Service).Start
    beacon-chain/sync/initial-sync/service.go:157
runtime.goexit
    src/runtime/asm_arm64.s:1148
[2020-08-16 00:04:32] INFO initial-sync: Synced up to slot 73311
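As a sanity check on the progress line in this log, the sketch below recomputes the remaining time from the logged counters. It assumes the estimate is simply remaining blocks divided by the processing rate; `estimateRemaining` is a hypothetical helper, not a Prysm function, and the logged rate of 1.5 may itself be rounded.

```go
package main

import (
	"fmt"
	"time"
)

// estimateRemaining returns how long it would take to process the blocks
// between current and target at a constant rate, truncated to whole seconds.
// It returns 0 when the rate is not positive or the target is already reached.
func estimateRemaining(current, target uint64, blocksPerSecond float64) time.Duration {
	if blocksPerSecond <= 0 || current >= target {
		return 0
	}
	remaining := float64(target - current)
	secs := remaining / blocksPerSecond
	return time.Duration(secs * float64(time.Second)).Truncate(time.Second)
}

func main() {
	// Counters from the log: 73312/81921 at blocksPerSecond=1.5.
	fmt.Println(estimateRemaining(73312, 81921, 1.5))
}
```

With these inputs, 8609 remaining blocks at 1.5 blocks per second is about 5739 seconds, which agrees with the logged "1h35m39s". The estimate itself looks consistent, so the failure to catch up points at the repeated early exit rather than the sync rate.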
