prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com/
GNU General Public License v3.0

Failed to start beacon: Start request repeated too quickly #10468

Open mohamedmansour opened 2 years ago

mohamedmansour commented 2 years ago

🐞 Bug Report

Description

A power outage lasted about two hours, and my server shut itself down safely once the UPS battery was depleted. After power returned, Prysm should not have crashed; it should have kept checking for an internet connection. Geth recovered quickly and had peers by 01:09, whereas Prysm could not find an internet connection at 01:10 and crashed.
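The "keep checking for an internet connection" behavior described above could be approximated outside Prysm with a small retry wrapper. This is only a minimal sketch and is not part of prysm.sh; the function name `retry_until_ok` and its arguments are hypothetical.

```shell
# Hypothetical helper (not part of prysm.sh): re-run a command until it
# succeeds, pausing between attempts.
# Usage: retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_ok() {
  local max_attempts delay attempt
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    # Run the command; stop retrying as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

One possible use, assuming `curl` is installed and that reaching github.com is an acceptable stand-in for "the internet is back", is to gate the service start on connectivity: `retry_until_ok 120 30 curl -fsS -o /dev/null https://github.com && exec ./prysm.sh beacon-chain`. Spacing the attempts out this way would also keep the restarts from tripping systemd's start rate limit, which is what produced "Start request repeated too quickly" in the logs.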

🔬 Minimal Reproduction

- 10:12 PM: Power out, internet out, backup battery on
- 12:47: Power on, internet out, beacon attempting to retry
- 12:49: Power out again
- 01:09: Power on, internet still out
- 01:10: Beacon exited, "Failed to start eth2"

🔥 Error

uptime


 08:14:10 up  7:05,  1 user,  load average: 0.44, 1.19, 1.41

reboot logs


shutdown system down  5.4.0-99-generic Wed Mar 30 22:12 - 00:46  (02:33)
reboot   system boot  5.4.0-105-generi Thu Mar 31 00:46 - 00:51  (00:04)
runlevel (to lvl 5)   5.4.0-105-generi Thu Mar 31 00:47 - 00:51  (00:03)
shutdown system down  5.4.0-105-generi Thu Mar 31 00:51 - 01:09  (00:17)
reboot   system boot  5.4.0-105-generi Thu Mar 31 01:09   still running
runlevel (to lvl 5)   5.4.0-105-generi Thu Mar 31 01:09   still running

prysm beacon logs

Mar 30 22:12:37 server systemd[1]: Stopping eth2 beacon chain service...
Mar 30 22:12:37 server prysm.sh[891]: time="2022-03-30 22:12:37" level=info msg="Got interrupt, shutting down..." prefix=node
Mar 30 22:12:37 server prysm.sh[891]: time="2022-03-30 22:12:37" level=info msg="Stopping beacon node" prefix=node
Mar 30 22:12:38 server systemd[1]: beacon-chain.service: Succeeded.
Mar 30 22:12:38 server systemd[1]: Stopped eth2 beacon chain service.
-- Reboot --
Mar 31 00:47:29 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:47:47 server prysm.sh[1544]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 00:47:47 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 00:47:47 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 00:47:47 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 1.
Mar 31 00:47:47 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 00:47:47 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:48:07 server prysm.sh[1617]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 00:48:07 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 00:48:07 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 00:48:08 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 2.
Mar 31 00:48:08 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 00:48:08 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:48:28 server prysm.sh[1665]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 00:48:28 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 00:48:28 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 00:48:28 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 3.
Mar 31 00:48:28 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 00:48:28 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:48:48 server prysm.sh[1726]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 00:48:48 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 00:48:48 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 00:48:48 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 4.
Mar 31 00:48:48 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 00:48:48 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:49:08 server prysm.sh[1771]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 00:49:08 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 00:49:08 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 00:49:08 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 5.
Mar 31 00:49:08 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 00:49:08 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:49:28 server prysm.sh[1819]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 00:49:28 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 00:49:28 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 00:49:29 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 6.
Mar 31 00:49:29 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 00:49:29 server systemd[1]: Started eth2 beacon chain service.
Mar 31 00:49:47 server systemd[1]: Stopping eth2 beacon chain service...
Mar 31 00:49:47 server systemd[1]: beacon-chain.service: Succeeded.
Mar 31 00:49:47 server systemd[1]: Stopped eth2 beacon chain service.
-- Reboot --
Mar 31 01:09:41 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:09:59 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:09:59 server prysm.sh[1539]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:09:59 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:09:59 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 1.
Mar 31 01:09:59 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:09:59 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:19 server prysm.sh[1618]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:19 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:19 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:19 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 2.
Mar 31 01:10:19 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:19 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:31 server prysm.sh[1663]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:31 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:31 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:31 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 3.
Mar 31 01:10:31 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:31 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:32 server prysm.sh[1708]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 4.
Mar 31 01:10:32 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:32 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:32 server prysm.sh[1735]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 5.
Mar 31 01:10:32 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:32 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:32 server prysm.sh[1765]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:32 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 6.
Mar 31 01:10:32 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:32 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:33 server prysm.sh[1793]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 7.
Mar 31 01:10:33 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:33 server systemd[1]: Started eth2 beacon chain service.
Mar 31 01:10:33 server prysm.sh[1822]: Starting prysm requires an internet connection. If you are being blocked by your antivirus, you can download the beacon chain and validator executables from our releases page on Github here https://github.com/prysmaticlabs/prysm/relea>
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Main process exited, code=exited, status=1/FAILURE
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Scheduled restart job, restart counter is at 8.
Mar 31 01:10:33 server systemd[1]: Stopped eth2 beacon chain service.
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Start request repeated too quickly.
Mar 31 01:10:33 server systemd[1]: beacon-chain.service: Failed with result 'exit-code'.
Mar 31 01:10:33 server systemd[1]: Failed to start eth2 beacon chain service.

By comparison, geth did not crash and found peers at 01:09:54:

Mar 30 22:12:56 server geth[10801]: INFO [03-30|22:12:56.273] Blockchain stopped
Mar 30 22:12:57 server systemd[1]: eth1.service: Succeeded.
Mar 30 22:12:57 server systemd[1]: Stopped geth eth1 service.
-- Reboot --
Mar 31 00:47:29 server systemd[1]: Started geth eth1 service.
Mar 31 00:49:38 server geth[932]: INFO [03-31|00:49:38.725] Looking for peers                        peercount=0 tried=1 static=0
Mar 31 00:49:47 server geth[932]: INFO [03-31|00:49:47.858] Got interrupt, shutting down...
Mar 31 00:49:47 server geth[932]: INFO [03-31|00:49:47.860] HTTP server stopped                      endpoint=127.0.0.1:8545
Mar 31 00:49:47 server geth[932]: INFO [03-31|00:49:47.862] HTTP server stopped                      endpoint=127.0.0.1:8546
Mar 31 00:49:47 server geth[932]: INFO [03-31|00:49:47.862] IPC endpoint closed                      geth.ipc
Mar 31 00:49:47 server geth[932]: INFO [03-31|00:49:47.862] Ethereum protocol stopped
Mar 31 00:49:47 server geth[932]: INFO [03-31|00:49:47.863] Transaction pool stopped
Mar 31 00:49:47 server systemd[1]: Stopping geth eth1 service...
Mar 31 00:49:48 server geth[932]: INFO [03-31|00:49:48.079] Blockchain stopped
Mar 31 00:49:48 server systemd[1]: eth1.service: Succeeded.
Mar 31 00:49:48 server systemd[1]: Stopped geth eth1 service.
-- Reboot --
Mar 31 01:09:41 server systemd[1]: Started geth eth1 service.
Mar 31 01:09:54 server geth[937]: INFO [03-31|01:09:54.752] Looking for peers                        peercount=11 tried=2 static=0

🌍 Your Environment

**Operating System:**
  
Linux server 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  
**What version of Prysm are you running? (Which release)**
  
2.0.6
  
prestonvanloon commented 2 years ago

This seems to be an issue with the prysm.sh script. The first step of the prysm.sh script is to check if there is a new update available. That step requires an internet connection.

Workaround

There are two things I think you can do as a workaround (either would work):

1) Manually specify the Prysm version with the environment variable USE_PRYSM_VERSION=v2.0.6.
2) Use the binary files directly rather than the prysm.sh script. These files are downloaded to the dist/ directory.
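As a rough illustration of the second option, the sketch below looks up an already-downloaded binary for a pinned version. The helper name `find_pinned_binary` is hypothetical, and the file naming pattern `beacon-chain-<version>-<os>-<arch>` (with `.sha256` and `.sig` side files next to it) is an assumption about what prysm.sh leaves in dist/, so check your own dist/ directory before relying on it.

```shell
# Hypothetical helper: print the path of an already-downloaded beacon-chain
# binary for a pinned version, or fail if none is present.
# Assumes files are named beacon-chain-<version>-<os>-<arch> inside dist/.
# Usage: find_pinned_binary VERSION [DIST_DIR]
find_pinned_binary() {
  local version dist_dir candidate
  version=$1
  dist_dir=${2:-dist}
  for candidate in "$dist_dir"/beacon-chain-"$version"-*; do
    # Skip checksum and signature side files.
    case "$candidate" in
      *.sha256 | *.sig) continue ;;
    esac
    # Only accept a regular, executable file (this also rejects an
    # unmatched glob pattern, which the shell leaves as a literal string).
    if [ -f "$candidate" ] && [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no downloaded beacon-chain binary for $version in $dist_dir" >&2
  return 1
}
```

With this in place, the first option is simply `USE_PRYSM_VERSION=v2.0.6 ./prysm.sh beacon-chain`, while the second could look like `exec "$(find_pinned_binary v2.0.6)"` followed by your usual beacon node flags.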

Fix

I think the fix here is for prysm.sh to start the highest previously downloaded binary when no specific version was requested and the latest release cannot be determined. Local internet issues are not the only reason prysm.sh could fail to retrieve the latest version tag, so this could become a bigger issue if it is not resolved.
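A fallback along those lines might look like the sketch below. It is not the actual prysm.sh change; the function name `latest_local_binary` is hypothetical, the `beacon-chain-v<version>-<os>-<arch>` naming in dist/ is an assumption, and it relies on `sort -V` (version sort, available in GNU coreutils). It also assumes dist/ holds binaries for a single platform and does not treat release candidates specially.

```shell
# Hypothetical fallback: choose the highest-versioned beacon-chain binary
# already present in dist/ when the latest release cannot be looked up.
# Usage: latest_local_binary [DIST_DIR]
latest_local_binary() {
  local dist_dir candidate
  dist_dir=${1:-dist}
  for candidate in "$dist_dir"/beacon-chain-v*; do
    # Skip checksum and signature side files.
    case "$candidate" in
      *.sha256 | *.sig) continue ;;
    esac
    # Emit only regular, executable files.
    [ -f "$candidate" ] && [ -x "$candidate" ] && printf '%s\n' "$candidate"
  done | sort -V | tail -n 1 | grep .
  # `sort -V` orders v2.0.6 before v2.0.10; `grep .` makes the function
  # return non-zero when no candidate was found.
}
```

A launcher could then try the network lookup first and fall back with something like `binary=$(latest_local_binary dist) || exit 1` before starting the node, so that a failed version check no longer stops an already-installed client from running.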