prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0
3.47k stars 1.01k forks source link

eth1data: Prysm does not vote with the majority if the majority is voting on an older block #7856

Closed prestonvanloon closed 3 years ago

prestonvanloon commented 4 years ago

🐞 Bug Report

Description

At the moment, eth1data consensus is not being reached.

image

Nimbus is voting for a very old block, 5 days old. https://goerli.etherscan.io/block/0x0566a632683a61180a735bf72f7daea59deeb9781d4a43be41a86d484df6287f

Lighthouse and Teku are voting on a block that is 30 minutes older than what Prysm thinks is the start of the eth1voting period. https://goerli.etherscan.io/block/0x3da88bec112d32d5e81741dfd395df14381346ebf48b91b1bfa111fdb7a69b5f (Nov-18-2020 10:18:00 AM +UTC)

Prysm thinks the start of this eth1data voting period is

new Date((1605700807 + 2048*12-2048*14)*1000)
// Wed Nov 18 2020 02:51:51 GMT-0800 (Pacific Standard Time)

Genesis time is 1605700807 Slot is 2048 Eth1FollowDistance is 2048

So the current start slot time minus the eth1followdistance comes out to be Wed Nov 18 2020 02:51:51 GMT-0800 (Pacific Standard Time).

So Prysm considers this block to be the highest block in the previous eth1data voting period. https://goerli.etherscan.io/block/0x227ed776f1e4a705ec8fae4ac1342d316be3f4cb9bf5248a9848bf344776f9c5 (Nov-18-2020 10:51:45 AM +UTC)

The reason I am filing a bug in Prysm is that I think that Prysm's voting with the majority mechanism should consider votes from the previous voting period to be acceptable. Otherwise we could see these types of issues or off by one issues when multiple blocks are produced in quick succession on eth1.

Another thing to consider, if client teams do not have the same eth1 block time of 14s then it will be even more difficult to reach eth1data consensus at mainnet. Some additional context: https://github.com/ethereum/eth2.0-specs/issues/2132 Prysm MUST conform with other clients in this regard for mainnet configuration. As such, I am marking this as a P0.

Has this worked before in a previous version?

Maybe when Prysm was the majority then Lighthouse and Teku would favor Prysm's block vote because it is still in range of their interpreted eth1data voting period.

🔬 Minimal Reproduction

This is occuring in Pyrmont testnet.

🔥 Error

N/A

🌍 Your Environment

Operating System:

N/A

What version of Prysm are you running? (Which release)

v1.0.0-beta.3

Anything else relevant (validator index / public key)?

protolambda commented 4 years ago

Small (bug related?) issue in your description: The genesis time is not 1605700800, that is the minimum genesis time. The actual genesis time (known as state.genesis_time) is 1605700807 # (Wednesday, November 18, 2020 12:00:07 PM) This is due to it being triggered by the first eth1 block after the minimum genesis time, minus eth1 delay.

prestonvanloon commented 4 years ago

@protolambda Thanks for that check. I'll update description, i was off by a few seconds. It doesn't explain the 30mins discrepancy though!

protolambda commented 4 years ago

Also note that the genesis delay was set to 5 days. This may relate to the nimbus bug!

Edit: exactly 5 days (genesis delay) and ~9 hours (time since genesis now). Definitely a bug. I wonder if the nimbus bug gets resolved without action after first voting period. Could be that the eth1 vote data was prepared too early.

prestonvanloon commented 4 years ago

Also looks like I have the assumption wrong, we want the highest block up to the voting round start time. Editing that...

prestonvanloon commented 4 years ago

Thinking about this more, Prysm should be voting with LH+Teku. Assigning @rkapka to look at that. It seems our voting with the majority is not working properly.

Assigning to @nisdas to look into our deviation of mainnet spec configuration of 14s for eth1 block times. https://github.com/prysmaticlabs/prysm/blob/095c4d5dd51a11c8d63c58e01a4a4774a0b5ad27/shared/params/mainnet_config.go#L85

rkapka commented 4 years ago

The reason I am filing a bug in Prysm is that I think that Prysm's voting with the majority mechanism should consider votes from the previous voting period to be acceptable.

This would be against the specification: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md#get_eth1_data

Specifically take a look at the definition of is_candidate_block and

votes_to_consider = [
    get_eth1_data(block) for block in eth1_chain
    if (
        is_candidate_block(block, period_start)
        # Ensure cannot move back to earlier deposit contract states
        and get_eth1_data(block).deposit_count >= state.eth1_data.deposit_count
    )
]
# Valid votes already cast during this period
valid_votes = [vote for vote in state.eth1_data_votes if vote in votes_to_consider]

Lighthouse and Teku are voting on a block that is 30 minutes older than what Prysm thinks is the start of the eth1voting period.

It seems that either both clients don't follow the spec or Prysm's voting period start time is incorrect.

prestonvanloon commented 4 years ago

Lighthouse and Teku are voting on a block that is 30 minutes older than what Prysm thinks is the start of the eth1voting period.

This actually makes sense. The voting period starts and we are supposed to consider the highest eth1 block before the start of the voting period.

prestonvanloon commented 4 years ago

I restarted all prylabs nodes and they are suddenly voting for the majority block.

I could not reproduce this locally either. I am thinking there is some caching issue of some kind, but overall the majority vote determining algorithm is correct. Lowering to P1, but still with some urgency before v1 mainnet.

prestonvanloon commented 3 years ago

This is fixed for pyrmont by #7861.

Leaving open until we fix the mainnet spec difference.