scroll-tech / go-ethereum

Scroll's fork of the official Go implementation of the Ethereum protocol
GNU Lesser General Public License v3.0
494 stars 276 forks source link

Node stuck at "looking for peers" until a service restart #868

Open ncavedale-xlabs opened 4 months ago

ncavedale-xlabs commented 4 months ago

System information

One of our testnet nodes has seemingly lost connection to all peers and cannot reconnect. As a result, the node was not getting new blocks. It remained in this state for several hours, until we restarted the service and then it was able to catch up normally. This is the second time it happens to us, just don't remember if the first one was also on testnet or mainnet.

Geth
Version: 5.5.0-mainnet
Architecture: amd64
Go Version: go1.22.4
Operating System: linux

Expected behaviour

Node should be able to find/reconnect to peers and keep syncing

Actual behaviour

Node loses connection to peers and doesn't find/reconnect to any peer until l2geth is restarted

Steps to reproduce the behaviour

N/A

Backtrace

Node was importing new chain segments until it wasn't:

Jun 27 12:35:59 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:35:59.513] Imported new chain segment               blocks=1  txs=1  mgas=0.021 elapsed=2.303ms   mgasps=9.117   >
Jun 27 12:36:01 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:01.668] Imported new chain segment               blocks=1  txs=1  mgas=0.021 elapsed=3.030ms   mgasps=6.930   >
Jun 27 12:36:04 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:04.990] Imported new chain segment               blocks=1  txs=1  mgas=0.021 elapsed=2.535ms   mgasps=8.282   >
Jun 27 12:36:07 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:07.064] Imported new chain segment               blocks=1  txs=1  mgas=0.021 elapsed=2.270ms   mgasps=9.251   >
Jun 27 12:36:07 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:07.811] Deep froze chain segment                 blocks=21 elapsed=13.866ms  number=4,904,702 hash=f83edb..bc5>
Jun 27 12:36:08 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:08.596] Looking for peers                        peercount=0 tried=340 static=0
Jun 27 12:36:18 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:18.901] Looking for peers                        peercount=0 tried=257 static=0
Jun 27 12:36:28 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:28.927] Looking for peers                        peercount=0 tried=411 static=0
Jun 27 12:36:38 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:38.936] Looking for peers                        peercount=0 tried=359 static=0
Jun 27 12:36:48 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:48.981] Looking for peers                        peercount=0 tried=293 static=0
Jun 27 12:36:59 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:36:59.114] Looking for peers                        peercount=0 tried=359 static=0
Jun 27 12:37:09 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:37:09.127] Looking for peers                        peercount=0 tried=353 static=0
Jun 27 12:37:19 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:37:19.403] Looking for peers                        peercount=0 tried=274 static=0
Jun 27 12:37:29 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:37:29.422] Looking for peers                        peercount=0 tried=352 static=0
Jun 27 12:37:39 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:37:39.440] Looking for peers                        peercount=0 tried=306 static=0
Jun 27 12:37:49 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:37:49.441] Looking for peers                        peercount=1 tried=335 static=0
Jun 27 12:37:59 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:37:59.481] Looking for peers                        peercount=0 tried=263 static=0
Jun 27 12:38:09 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:38:09.534] Looking for peers                        peercount=0 tried=318 static=0
Jun 27 12:38:19 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:38:19.556] Looking for peers                        peercount=0 tried=309 static=0
Jun 27 12:38:29 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:38:29.680] Looking for peers                        peercount=0 tried=341 static=0
Jun 27 12:38:39 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:38:39.681] Looking for peers                        peercount=0 tried=262 static=0
Jun 27 12:38:49 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:38:49.681] Looking for peers                        peercount=0 tried=321 static=0
Jun 27 12:38:59 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:38:59.700] Looking for peers                        peercount=0 tried=344 static=0
Jun 27 12:39:09 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:09.701] Looking for peers                        peercount=0 tried=424 static=0
Jun 27 12:39:10 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:10.401] Sync rollup events progress update       latestProcessedBlock=6,197,837
Jun 27 12:39:19 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:19.705] Looking for peers                        peercount=0 tried=216 static=0
Jun 27 12:39:29 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:29.938] Looking for peers                        peercount=0 tried=322 static=0
Jun 27 12:39:39 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:39.938] Looking for peers                        peercount=0 tried=226 static=0
Jun 27 12:39:49 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:49.975] Looking for peers                        peercount=0 tried=238 static=0
Jun 27 12:39:59 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:39:59.976] Looking for peers                        peercount=0 tried=262 static=0
Jun 27 12:40:09 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:40:09.989] Looking for peers                        peercount=0 tried=313 static=0
Jun 27 12:40:20 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:40:20.147] Looking for peers                        peercount=0 tried=360 static=0
Jun 27 12:40:30 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:40:30.151] Looking for peers                        peercount=0 tried=315 static=0
Jun 27 12:40:40 t-scroll-sepolia-01 l2geth[3256645]: INFO [06-27|12:40:40.168] Looking for peers                        peercount=0 tried=192 static=0
0xmountaintop commented 3 months ago

Hi @ncavedale-xlabs can you try set debug.vmodule("eth/*=6,p2p=6") via geth console and provide me the logs when this happens again?

ScepticMatt commented 2 months ago

I got the same error after upgrading to version 5.6 - seems like the chain id is wrong for some reason.

trace output:

Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.419] Dial error                               id=69340d7cd19d5c58 addr=52.74.58.167:32213    conn=dyndial err="i/o timeout"
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.419] Dial error                               id=66591c6913b192e7 addr=13.43.97.33:32370     conn=dyndial err="i/o timeout"
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.546] Failed RLPx handshake                    addr=3.234.173.24:30303    conn=dyndial err=EOF
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.549] Accepted connection                      addr=89.116.24.44:47946
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.580] Rejected peer                            id=555ee9df9c6e75de addr=89.116.24.44:47946    conn=inbound err="useless peer"
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.732] Failed RLPx handshake                    addr=138.199.25.14:30303   conn=dyndial err=EOF
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.750] Failed RLPx handshake                    addr=138.199.25.14:30303   conn=dyndial err=EOF
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.803] Failed RLPx handshake                    addr=107.21.220.181:30303  conn=dyndial err=EOF
Aug 16 07:28:58 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:58.991] Accepted connection                      addr=18.139.114.143:46858
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.251] Rejected peer                            id=548938a66ad90bbf addr=18.139.114.143:46858  conn=inbound err="useless peer"
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.282] Accepted connection                      addr=18.171.67.70:15819
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: DEBUG[08-16|07:28:59.319] Adding p2p peer                          peercount=2 id=5e3d40fba57c443f conn=inbound addr=18.171.67.70:15819    name=parchain/v1.11.3-uns...
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.319] Starting protocol eth/68                 id=5e3d40fba57c443f conn=inbound
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: DEBUG[08-16|07:28:59.319] Ethereum handshake failed                id=5e3d40fba57c443f conn=inbound err="network ID mismatch: 600002 (!= 534352)"
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.319] Protocol eth/68 failed                   id=5e3d40fba57c443f conn=inbound err="network ID mismatch: 600002 (!= 534352)"
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: DEBUG[08-16|07:28:59.319] Removing p2p peer                        peercount=1 id=5e3d40fba57c443f duration="183.352µs" req=false err="network ID mismatch: 600002 (!= 534352)"
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.401] Accepted connection                      addr=64.176.56.219:41124
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: DEBUG[08-16|07:28:59.704] Adding p2p peer                          peercount=2 id=72f64269f4e2bf47 conn=inbound addr=64.176.56.219:41124   name=Geth/v1.3.4-stable-a...
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.704] Starting protocol eth/68                 id=72f64269f4e2bf47 conn=inbound
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: DEBUG[08-16|07:28:59.704] Ethereum handshake failed                id=72f64269f4e2bf47 conn=inbound err="network ID mismatch: 4058 (!= 534352)"
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: TRACE[08-16|07:28:59.704] Protocol eth/68 failed                   id=72f64269f4e2bf47 conn=inbound err="network ID mismatch: 4058 (!= 534352)"
Aug 16 07:28:59 axelar-archive nccc_geth[3393286]: DEBUG[08-16|07:28:59.704] Removing p2p peer                        peercount=1 id=72f64269f4e2bf47 duration="271.677µs" req=false err="network ID mismatch: 4058 (!= 534352)"
naviat commented 2 months ago

Same error with 5.6