prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0
3.47k stars 1.01k forks

Validator client may have wrong genesis time or bad roughtime offset #6544

Closed prestonvanloon closed 4 years ago

prestonvanloon commented 4 years ago

🐞 Bug Report

Description

In the Onyx testnet, we occasionally see objects produced from "the future", indicating that the client may have the wrong genesis time. In some scenarios only the validator has the incorrect genesis time, and its beacon node starts rejecting all of its requests.

Has this worked before in a previous version?

Not sure when this issue started.

🔬 Minimal Reproduction

Not sure about reproduction steps.

🔥 Error

[2020-07-10 14:54:15] ERROR validator: Could not request attestation to sign at slot error=rpc error: code = InvalidArgument desc = invalid request: attestation slot 191284 not within attestation propagation range of 190052 to 190084 (current slot) pubKey=0xb4a6f246e7a2 slot=191284

🌍 Your Environment

Operating System:

N/A seems to affect all.

What version of Prysm are you running? (Which release)

Alpha.14

Anything else relevant (validator index / public key)?

prestonvanloon commented 4 years ago

User reported this happening in commit 8da02467024b89b5e86026027281d9dc51ebc0f9.

prestonvanloon commented 4 years ago

Note: this could/should be resolved with hard-coded genesis times. Filed #6545 for that feature request. That would be recommended for the mainnet release.

prestonvanloon commented 4 years ago

A user reported this issue and said that it resolved itself after 1 hour. 1 hour is the roughtime update interval.

https://discordapp.com/channels/476244492043812875/476588476393848832/731303268134813716

nisdas commented 4 years ago

Just to give more clarity on this: users have reported seeing the warning "WARN powchain: eth1 client is not syncing" in the beacon node at the same time as their validator has difficulty creating attestations.

https://github.com/prysmaticlabs/prysm/blob/master/beacon-chain/powchain/service.go#L574

    // use a 5 minutes timeout for block time, because the max mining time is 278 sec (block 7208027)
    // (analyzed the time of the block from 2018-09-01 to 2019-02-13)
    fiveMinutesTimeout := roughtime.Now().Add(-5 * time.Minute)
    // check that web3 client is syncing
    if time.Unix(int64(s.latestEth1Data.BlockTime), 0).Before(fiveMinutesTimeout) {
        log.Warn("eth1 client is not syncing")
    }

This would point to there being an issue with roughtime. Roughtime recalibration happens every hour, so there is a chance that we received an incorrect offset at one of those recalibrations and it messed up our beacon node's clock. At that point, attestations start failing due to the incorrect offset.

prestonvanloon commented 4 years ago

I think we have mitigated this issue and the relevant PRs have been released in alpha.15.

Closing this until we see evidence of this issue again.