prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

When libp2p fails, node never re-attempts to get connection #2674

Closed: SjonHortensius closed this issue 4 years ago

SjonHortensius commented 5 years ago

Not sure what caused this - but it might be a temporary network glitch:

[2019-05-22 10:57:10]  INFO initial-sync: Synced!
[2019-05-22 10:57:10]  INFO regular-sync: Listening for regular sync messages from peers
[2019-05-22 15:30:39] ERROR p2p: Failed to reconnect to peer failed to dial : all dials failed
  * [/ip4/23.202.xxx/tcp/4000] dial tcp4 23.202.xxx:4000: connect: connection refused
  * [/ip4/23.195.xxx/tcp/4000] dial tcp4 23.195.xxx:4000: connect: connection refused
  * [/ip4/23.217.xxx/tcp/4000] dial tcp4 23.217.xxx:4000: connect: connection refused
  * [/ip4/127.0.0.1/tcp/4000] failed to negotiate security protocol: message did not have trailing newline
  * [/ip4/23.217.xxx/tcp/4000] dial tcp4 23.217.xxx:4000: connect: connection refused
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 10.52.xxx:4000: connect: no route to host
  * [/ip4/35.224.xxx/tcp/30001] failed to negotiate security protocol: EOF
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 10.52.xxx:4000: connect: no route to host
  * [/ip4/10.55.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->10.55.xxx:4000: i/o timeout
  * [/ip4/92.242.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->92.242.xxx:4000: i/o timeout
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->10.52.xxx:4000: i/o timeout
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->10.52.xxx:4000: i/o timeout
  * [/ip4/18.211.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->18.211.xxx:4000: i/o timeout
[2019-05-22 15:30:52] ERROR p2p: Failed to reconnect to peer failed to dial : all dials failed
  * [/ip4/23.202.xxx/tcp/4000] dial tcp4 23.202.xxx:4000: connect: connection refused
  * [/ip4/23.195.xxx/tcp/4000] dial tcp4 23.195.xxx:4000: connect: connection refused
  * [/ip4/127.0.0.1/tcp/4000] failed to negotiate security protocol: message did not have trailing newline
  * [/ip4/23.217.xxx/tcp/4000] dial tcp4 23.217.xxx:4000: connect: connection refused
  * [/ip4/23.217.xxx/tcp/4000] dial tcp4 23.217.xxx:4000: connect: connection refused
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 10.52.2.193:4000: connect: no route to host
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 10.52.4.162:4000: connect: no route to host
  * [/ip4/92.242.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->92.242.xxx:4000: i/o timeout
  * [/ip4/10.55.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->10.55.xxx:4000: i/o timeout
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->10.52.xxx:4000: i/o timeout
  * [/ip4/10.52.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->10.52.xxx:4000: i/o timeout
  * [/ip4/18.211.xxx/tcp/4000] dial tcp4 0.0.0.0:12000->18.211.xxx:4000: i/o timeout
  * [/ip4/35.224.xxx/tcp/30001] failed to negotiate security protocol: read tcp4 192.168.xxx:12000->35.224.xxx/30001: read: connection reset by peer
[2019-05-22 15:30:58] ERROR p2p: Failed to reconnect to peer dial backoff

I see a few things wrong with this:

prestonvanloon commented 5 years ago

Blocked by

dimchansky commented 5 years ago

Not sure about this case, but this pattern helped me solve a reconnection issue in my app:

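// Forget the cached addresses for the peer so stale ones are not redialed.
host.Peerstore().ClearAddrs(targetPeerID)

// Clear the swarm's dial backoff so the next dial is not rejected with "dial backoff".
if sw, ok := host.Network().(*swarm.Swarm); ok {
    sw.Backoff().Clear(targetPeerID)
}

// Dial the peer again using its known address info.
if err := host.Connect(ctx, *targetPeerAddr); err != nil {
    return nil, err
}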

rauljordan commented 4 years ago

No longer relevant in latest master