prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Beacon node restart due to "Could not get rough time result: lookup caesium.tannerryan.ca: too many open files" #7262

Closed xuyenvuong closed 4 years ago

xuyenvuong commented 4 years ago

🐞 Bug Report

Description

Occasionally I get Grafana notifications ([OK] WARN NODE/VALIDATOR: The process just restarted) telling me that my beacon node is restarting. Checking the log, I found the errors below, whose timestamps correspond to the restart times.

During the beacon node's downtime, the number of attestation and aggregation failures more than doubled:

1. Attestation failures: from 33 (8 hrs prior) to 87
2. Aggregation failures: from 3 (8 hrs prior) to 7

I would like to see whether we can implement a failover solution for the validator, to avoid attestation and aggregation failures caused by depending on a single beacon node.

Has this worked before in a previous version?

No, this issue happened very often on the previous version as well. I will start collecting more error logs each time the node auto-restarts.

🔬 Minimal Reproduction

No particular reproducible steps.

🔥 Error




time="2020-09-17 18:28:34" level=error msg="Could not get rough time result: lookup caesium.tannerryan.ca: too many open files" prefix=roughtime
time="2020-09-17 18:28:34" level=error msg="Could not get rough time result: lookup roughtime.chainpoint.org: too many open files" prefix=roughtime
time="2020-09-17 18:28:34" level=error msg="Could not get rough time result: lookup roughtime.cloudflare.com: too many open files" prefix=roughtime
time="2020-09-17 18:28:34" level=error msg="Could not get rough time result: lookup roughtime.sandbox.google.com: too many open files" prefix=roughtime
time="2020-09-17 18:28:34" level=error msg="Could not get rough time result: lookup roughtime.int08h.com: too many open files" prefix=roughtime
time="2020-09-17 18:28:34" level=error msg="Could not get rough time result: lookup ticktock.mixmin.net: too many open files" prefix=roughtime
time="2020-09-17 18:28:34" level=error msg="Failed to calculate roughtime offset" error="no valid responses" prefix=roughtime
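"too many open files" is the operating system's EMFILE error, which generally means the process has reached its file descriptor limit. As a rough diagnostic sketch (not part of Prysm; the helper names `openFileLimit` and `countOpenFDs` are made up for illustration, and the `/proc/self/fd` lookup assumes Linux), a small Go program can report the limit and the current descriptor count for its own process:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// openFileLimit returns the soft and hard RLIMIT_NOFILE values
// for the current process.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

// countOpenFDs counts the entries in /proc/self/fd (Linux only).
// The count includes the descriptor opened to read the directory itself.
func countOpenFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Printf("open-file limit: soft=%d hard=%d\n", soft, hard)

	if n, err := countOpenFDs(); err == nil {
		fmt.Printf("descriptors currently open: %d\n", n)
	}
}
```

For an already running beacon node, the same information is available from outside the process by listing `/proc/<pid>/fd` and reading `/proc/<pid>/limits`.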

🌍 Your Environment

Operating System: Ubuntu (latest) on Raspberry Pi 4, 8 GB

What version of Prysm are you running? (Which release) alpha.25

Anything else relevant (validator index / public key)? https://medalla.beaconcha.in/dashboard?validators=12425,12433,12437,12442,12446,12456,12457,12461,12465,12469,12473,12474,12477,12480,12487,12490,12493,12499,12504,12509,12511,12516,12521,12525,12527,12532,12542,12544,12552,12567,12568,12569,12574

terencechain commented 4 years ago

Fixed in #7221

Will be in alpha.26

xuyenvuong commented 4 years ago

Thanks @terencechain