Closed stefa2k closed 3 years ago
Thanks for the report! gRPC retries were not configured correctly for gRPC streams, which is what the validator uses on initial contact for the beacon node.
I've added PR #6670 which includes the following fixes and improvements:
--grpc-retries
--grpc-retry-delay
, default of 1s.--verbosity=debug
or --verbosity=trace
So now the new behavior will be: "On initial contact of beacon node, open a stream to wait on the beacon node to report fully synced. If the connection is UNAVAILABLE or RESOURCE_EXHAUSTED, retry up to --grpc-retries
with --grpc-retry-delay
between each attempt. If all attempts have been exhausted, log a fatal condition and stop the process".
If you find that satisfactory for your use case, then please close this issue. If not, please advise a recommended solution to resolve your use case.
Thanks @prestonvanloon
Just a quick question, is -1
also supported?
I run docker tag HEAD-62ae22
now and it looks like setting --grpc-retries
to 18446744073709551615
in my configs doesn't have any effect, the logs still look the same:
time="2020-07-22 17:21:40" level=info msg="Waiting for beacon chain start log from the ETH 1.0 deposit contract" prefix=validator
validator | time="2020-07-22 17:21:44" level=fatal msg="Could not determine if beacon chain started: could not receive ChainStart from stream: rpc error: code = Unavailable desc = Service Unavailable: HTTP status code 503; transport: received the unexpected content-type "text/html"" prefix=validator
validator | time="2020-07-22T17:21:47Z" level=info msg="New calculated roughtime offset is -38419284 ns" prefix=roughtime
validator | time="2020-07-22 17:21:47" level=warning msg="Enabled validator slashing protection." prefix=flags
validator | time="2020-07-22 17:21:47" level=warning msg="Enabled validator attestation and block slashing protection using an external slasher." prefix=flags
Can you try a different value other than max uint? Can you please attach logs at debug level?
When I tried it with --grpc-retries=10000 and --verbosity=debug, i could see the following logs to indicate that it was retrying.
[2020-07-22 14:20:13] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:14] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:15] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:16] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:17] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:18] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:19] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:20] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:21] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:22] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:23] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:24] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:25] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
[2020-07-22 14:20:26] DEBUG gRPC stream started. backend=[] method=/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart
There may be an issue with reading from a config file, but that would be separate from this issue.
Tried it with the params you mentioned (10000 and debug) and got the following:
validator | time="2020-07-23 10:43:48" level=info msg="Checking DB" databasePath="/home/.eth2" prefix=node
validator | time="2020-07-23 10:43:48" level=info msg="Starting validator node" prefix=node version="Prysm/v1.0.0-alpha.16/62ae22e3f953a1800b0018cd6eb499021f0af917. Built at: 2020-07-22 16:24:11+00:00"
validator | time="2020-07-23 10:43:48" level=debug msg="Starting 3 services: [*prometheus.Service *slashingprotection.Service *client.ValidatorService]" prefix=registry
validator | time="2020-07-23 10:43:48" level=debug msg="Starting service type *prometheus.Service" prefix=registry
validator | time="2020-07-23 10:43:48" level=debug msg="Starting service type *slashingprotection.Service" prefix=registry
validator | time="2020-07-23 10:43:48" level=debug msg="Starting service type *client.ValidatorService" prefix=registry
validator | time="2020-07-23 10:43:48" level=warning msg="You are using an insecure slasher gRPC connection! Please provide a certificate and key to use a secure connection."
validator | time="2020-07-23 10:43:48" level=debug msg="Successfully started slasher gRPC connection"
validator | time="2020-07-23 10:43:48" level=debug msg="Starting prometheus service" address="0.0.0.0:8081" prefix=prometheus
validator | time="2020-07-23 10:43:48" level=warning msg="You are using an insecure gRPC connection. If you are running your beacon node and validator on the same machines, you can ignore this message. If you want to know how to enable secure connections, see: https://docs.prylabs.network/docs/prysm-usage/secure-grpc" prefix=validator
validator | time="2020-07-23 10:43:48" level=debug msg="gRPC stream started." backend=[] method="/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart"
validator | time="2020-07-23 10:43:48" level=info msg="Waiting for beacon chain start log from the ETH 1.0 deposit contract" prefix=validator
validator | time="2020-07-23 10:43:49" level=debug msg="gRPC stream started." backend=[] method="/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart"
validator | time="2020-07-23 10:43:50" level=debug msg="gRPC stream started." backend=[] method="/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart"
validator | time="2020-07-23 10:43:51" level=debug msg="gRPC stream started." backend=[] method="/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart"
validator | time="2020-07-23 10:43:52" level=debug msg="gRPC stream started." backend=[] method="/ethereum.eth.v1alpha1.BeaconNodeValidator/WaitForChainStart"
validator | time="2020-07-23 10:43:52" level=fatal msg="Could not determine if beacon chain started: could not receive ChainStart from stream: rpc error: code = Unavailable desc = Service Unavailable: HTTP status code 503; transport: received the unexpected content-type "text/html"" prefix=validator
It looks like its not read off of the config file. But every other setting (e. g. verbosity: debug
) gets read correctly.
@prestonvanloon please triage
@stefa2k this actually seems related to #7339
This is since resolved in v1.2.0
π Bug Report
Description
I'm using haproxy to have failovers for all services including beacon-chain. All my beacon-chain nodes were unhealthy and therefore not able to serve grpc requsts, so haproxy returns http code 503 to validator's requests (expected behaviour). This causes the validator service to die (unexpected behaviour). The validator's restart can cause heavy cpu loads due to decrypting keys.
HTTP code 503 indicates the service is currently not able to serve any requests.
Solution with
--grpc-retries
-1
but it didn't help.4294967295
didn't help either.Validator service dies within 1 second of querying beacon service.
Has this worked before in a previous version?
π¬ Minimal Reproduction
Connect validator to an endpoint which returns http code 503 on requests.
π₯ Error
Validator log:
curl to haproxy:
π Your Environment
Operating System:
What version of Prysm are you running? (Which release)
haproxy config