prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0
3.46k stars 995 forks source link

Deposit acceptance by beacon chain not always noticed by Prysm's validator. #9216

Open ianmacd opened 3 years ago

ianmacd commented 3 years ago

🐞 Bug Report

Description

After a deposit is accepted by the beacon chain, the Prysm validator sometimes fails to acknowledge this event. In such cases, the new validator retains the status STATUS_UNKNOWN.

Has this worked before in a previous version?

Unknown.

🔬 Minimal Reproduction

I started with a working validator with no issues.

I stopped the validator (in the gap between two attestations) and imported a new keystore into the same wallet as the first one:

I then restarted the validator process:

$ sudo systemctl stop prysmvalidator
$ sudo /usr/local/bin/validator accounts import --keys-dir=/home/ianmacd/eth2deposit-cli/validator_keys
$ sudo systemctl start prysmvalidator

The following was logged:

Jul 16 07:15:31 serenity validator[53443]: time="2021-07-16 07:15:31" level=info msg="Validating for public key" prefix=validator publicKey=0x9961a780f5d5
Jul 16 07:15:31 serenity validator[53443]: time="2021-07-16 07:15:31" level=info msg="Waiting for deposit to be observed by beacon node" prefix=validator pubKey=0x9961a780f5d5 status="UNKNOWN_STATUS"

I then made my contract deposit via the eth2 launchpad. It went into block 12836095. And then I waited.

Just under 17 hours later, the deposit was accepted by the beacon chain in epoch 51,169, slot 1,637,422.

I continued to wait, but the Prysm validator logged nothing to suggest it had noticed the clearance of the deposit.

I waited for more than 7 hours and finally, after becoming somewhat nervous, decided to restart the Prysm validator, which then immediately logged the following:

Jul 17 11:35:43 serenity validator[59632]: time="2021-07-17 11:35:43" level=info msg="Validating for public key" prefix=validator publicKey=0x9961a780f5d5
Jul 17 11:35:43 serenity validator[59632]: time="2021-07-17 11:35:43" level=info msg="Deposit processed, entering activation queue after finalization" index=196685 positionInActivationQueue=1622 prefix=validator pubKey=0x9961a780f5d5 status=DEPOSITED

To my great relief, the restart forced the validator deposit's change in status to be picked up by Prysm, All is now well.

So, either the deposit's change in status is not always noticed by Prysm, or Prysm merely fails to log that it noticed said change. Either way, this would appear to be a bug.

I reported my finding to Nishant on the Prysmatic Labs Discord, and he requested that I open an issue on GitHub.

🌍 Your Environment

Debian 10 (Buster)

  
Linux serenity 5.4.0-77-generic #86-Ubuntu SMP Thu Jun 17 02:35:03 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  

What version of Prysm are you running? (Which release)

  
Prysm/v1.3.11/9fd5575f65ae6f238462ead00ce4f0b16cd0370c
  

Anything else relevant (validator index / public key)?

$ sudo /usr/local/bin/validator accounts list --wallet-dir=/var/lib/prysm/validator                                                                               [sudo] password for ianmacd:                                 
[2021-07-17 18:44:43]  WARN flags: Running on ETH2 Mainnet
Wallet password: 
(keymanager kind) imported wallet

Showing 2 validator accounts
View the eth1 deposit transaction data for your accounts by running `validator accounts list --show-deposit-data`

Account 0 | ultimately-eternal-oryx
[validating public key] 0x83285ca2bfc64d53bc11d0998e1fe663ed094b9e05eb76576c6a3157ff777ee1fa5ce43367cdcc9107690675242df33e

Account 1 | vertically-darling-gobbler
[validating public key] 0x9961a780f5d58e21cac2848caec45286292e5f8965a895c9c09a79a31dc2faf4875ad80cd600c2a6d6ddf893edeb8985

Account 0 is the original validator, whose own change in status from UNKNOWN_STATUS to DEPOSITED occurred flawlessly a couple of weeks ago.

Account 1 is the new validator whose deposit Prysm today failed to notice had been accepted by the beacon chain.

Propulsions commented 3 years ago

Confirmed same issue except when I imported the new account originally the validator service was stopped.

Upon restarting after being deposited by the beacon chain, the validator status changed to status=DEPOSITED as expected.

ianmacd commented 3 years ago

Confirmed same issue except when I imported the new account originally the validator service was stopped.

Note that the validator process was also stopped in my case at the time of the import.

nisdas commented 3 years ago

So i did a bit of digging on this, whenever the validator client starts up it retrieves the current status of all attached keys. It will then log out the status of all keys at the start. However, once it starts attesting and proposing it will only log out the status of 'active' keys. So even if your validator had its deposit accepted and is pending , it will not log out the status of that validator until it becomes 'active' . To summarise this is a cosmetic issue, the validator client is aware of all keys at all times. Some solutions to this:

1) Do nothing and leave the current validator client as it is. 2) Log out the current status of all keys during the start of an epoch. ( Even non deposited ones)

From what I remember we did have the latter flow but it might have been removed. cc @rauljordan

ianmacd commented 3 years ago

So i did a bit of digging on this, whenever the validator client starts up it retrieves the current status of all attached keys. It will then log out the status of all keys at the start. However, once it starts attesting and proposing it will only log out the status of 'active' keys. So even if your validator had its deposit accepted and is pending , it will not log out the status of that validator until it becomes 'active' .

Ah, so when I suggested to you on Discord that this was perhaps only a failure to log, rather than a failure to update state, my suspicion was actually correct.

Good to hear that it's only cosmetic and not something that could cause missed attestations once the validator is activated.

Logging the validator status at the start of an epoch sounds useful, but may annoy people with a large set of validators on a single host.

nisdas commented 3 years ago

Ah, so when I suggested to you on Discord that this was perhaps only a failure to log, rather than a failure to update state, my suspicion was actually correct.

Correct :-)

Logging the validator status at the start of an epoch sounds useful, but may annoy people with a large set of validators on a single host.

That is a possibility, it also might have been a reason why it was removed previously to reduce noisy logging.

ianmacd commented 3 years ago

Logging the validator status at the start of an epoch sounds useful, but may annoy people with a large set of validators on a single host.

That is a possibility, it also might have been a reason why it was removed previously to reduce noisy logging.

Then I suggest --log-key-status as a toggle for the per epoch logging you describe. The extra verbosity would certainly be tolerable, desirable even, for people like me with just a few keys.

nisdas commented 3 years ago

Then I suggest --log-key-status as a toggle for the per epoch logging you describe. The extra verbosity would certainly be tolerable, desirable even, for people like me with just a few keys.

Would wait for raul's view on this, but if we do add it in, I am inclined to not adding a flag for this. We already do have a lot of flags for minor stuff, and the increase in verbosity here would not be worth adding in a flag for imo.

Propulsions commented 3 years ago

Perhaps some additional logic such that if an account is not active, monitor the status between the current and last epoch and if it changes for example from UNKNOWN to DEPOSIT, reflect that. This results in no additional flags and would result in the status changes being reflected only during a status change in real-time.

Pseudo Code:

while v.status != "active":
    if v.status != last_epoch_status:
        print(f"Status changed to {new_status}")
nisdas commented 3 years ago

Perhaps some additional logic such that if an account is not active, monitor the status between the current and last epoch and if it changes for example from UNKNOWN to DEPOSIT, reflect that. This results in no additional flags and would result in the status changes being reflected only during a status change in real-time.

That could be another way, but we would now need to persist validator status states across epochs in that case. It might be better to simply log it all out ( for inactive keys).

mrabino1 commented 3 years ago

Personally speaking, i have imported waaay more keys than i am actually using. This was done such that the 24 words are used basically once and keys are deployed with the expectation of future deposits. Additional logging would be appreciated. That said, this only cosmetic right? So unless you are actively tailing / watching the log, the deposit would still activate. Big picture, this is a minor issue.