prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

No block received in the gossip network #5476

Closed terencechain closed 4 years ago

terencechain commented 4 years ago

In my interop setup, the peering node does not receive beacon blocks over the gossip network. It only receives beacon blocks over the RPC network due to a missing attestation block. Other people don't seem to be able to reproduce this issue, so I think it is minor but worth investigating. (See the screenshot for proof.)

I also logged incoming blocks here and verified that nothing arrives: https://github.com/prysmaticlabs/prysm/blob/master/beacon-chain/sync/validate_beacon_blocks.go#L49

Setup instructions: https://gist.github.com/terencechain/7df28247d20638d0ae8e97337bbbcda3

[Screenshot: 2020-04-17, 9:13:08 AM]

terencechain commented 4 years ago

Works on the second try (i.e., after restarting the peering node).

[Screenshot: 2020-04-17, 9:26:08 AM]

terencechain commented 4 years ago

Confirmed the validator is not returning false on account of the node still being in initial sync: https://github.com/prysmaticlabs/prysm/blob/master/beacon-chain/sync/validate_beacon_blocks.go#L29

The issue occurs before the validateBeaconBlockPubSub pipeline.

terencechain commented 4 years ago

Confirmed via metrics that no blocks are coming in: p2p_message_received_total{topic="/eth2/f071c66c/beacon_block/ssz_snappy"} 1
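
For anyone who wants to repeat this check without reading the metrics page by hand, here is a minimal sketch that pulls one topic's counter out of Prometheus text-format output. The helper name `counterForTopic` is hypothetical and not part of Prysm, and it assumes `topic` is the only label on the sample, as in the line quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// counterForTopic scans Prometheus text-format output and returns the value
// of the sample `name{topic="<topic>"}`. It reports false if no such sample
// is present. Hypothetical helper: it assumes `topic` is the only label.
func counterForTopic(metrics, name, topic string) (float64, bool) {
	prefix := fmt.Sprintf("%s{topic=%q}", name, topic)
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comments (# HELP / # TYPE) and samples for other series.
		if strings.HasPrefix(line, "#") || !strings.HasPrefix(line, prefix) {
			continue
		}
		// The first field after the label set is the sample value.
		fields := strings.Fields(strings.TrimPrefix(line, prefix))
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Sample input copied from the metric line quoted in this comment.
	sample := `# TYPE p2p_message_received_total counter
p2p_message_received_total{topic="/eth2/f071c66c/beacon_block/ssz_snappy"} 1
`
	v, ok := counterForTopic(sample, "p2p_message_received_total",
		"/eth2/f071c66c/beacon_block/ssz_snappy")
	fmt.Println(v, ok)
}
```

Sampling the value twice, a few slots apart, and comparing the two readings shows whether any blocks arrived over gossip in between.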

prestonvanloon commented 4 years ago

Sounds like we are not setting chainStarted correctly at startup.

> github.com/prysmaticlabs/prysm/beacon-chain/sync.(*Service).registerSubscribers() beacon-chain/sync/subscriber.go:50 (PC: 0x16cf785)
    45: // Register PubSub subscribers
    46: func (r *Service) registerSubscribers() {
    47:         // Wait until chain start.
    48:         stateChannel := make(chan *feed.Event, 1)
    49:         stateSub := r.stateNotifier.StateFeed().Subscribe(stateChannel)
=>  50:         defer stateSub.Unsubscribe()
    51:         for r.chainStarted == false {
    52:                 select {
    53:                 case event := <-stateChannel:
    54:                         if event.Type == statefeed.Initialized {
    55:                                 data, ok := event.Data.(*statefeed.InitializedData)
(dlv) p r.chainStarted
false
(dlv)

Shouldn't this be true for a node that was recently synced and restarted?
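
To make the concern concrete, here is a simplified, self-contained model of the wait loop in the trace above. It is a sketch, not Prysm's code: `event`, `eventType`, and `waitForChainStart` are hypothetical stand-ins for the feed types. It only shows that if the flag is false and no initialized event is ever delivered, the loop never gets past the wait, so the subscribers are never registered.

```go
package main

import "fmt"

// eventType and event are hypothetical stand-ins for the feed types
// that appear in the debugger trace.
type eventType int

const (
	other eventType = iota
	initialized
)

type event struct{ Type eventType }

// waitForChainStart mirrors the shape of the loop in the trace: it returns
// immediately when chainStarted is already true, and otherwise consumes
// events until an initialized event arrives or the channel is closed.
// It returns the final flag and the number of events consumed.
func waitForChainStart(chainStarted bool, events <-chan event) (bool, int) {
	consumed := 0
	for !chainStarted {
		ev, ok := <-events
		if !ok {
			break // feed closed without an initialized event
		}
		consumed++
		if ev.Type == initialized {
			chainStarted = true
		}
	}
	return chainStarted, consumed
}

func main() {
	// Case 1: flag already set, so no waiting happens.
	idle := make(chan event)
	fmt.Println(waitForChainStart(true, idle))

	// Case 2: flag unset and the feed ends without an initialized event.
	// The caller never learns that the chain started.
	ended := make(chan event, 1)
	ended <- event{Type: other}
	close(ended)
	fmt.Println(waitForChainStart(false, ended))
}
```

In the real service the feed is not closed, so the second case would block indefinitely rather than return, which would match the symptom of no gossip subscriptions being registered.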

prestonvanloon commented 4 years ago

It could be a network partition. We are seeing that we have peers for that topic, but no messages coming in. Looking at the code, we don't see any reason that a block coming in would fail validation, yet we simply do not see blocks arrive. Perhaps we have peers that are not sending blocks at all?

nisdas commented 4 years ago

Closing this as it's resolved by #5500.