prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Some validators "EXITED" after roughtime incident. #7035

Closed DigiDr closed 4 years ago

DigiDr commented 4 years ago

šŸž Bug Report

Description

It looks like approximately 7 of my 57 validators were EXITED during the "Black Friday" Medalla incident. I was running a conventional beacon-chain node on x86/Ubuntu/Docker, kept up to date with the :latest tag via Docker Watchtower. A slasher node was also running, and the validator runs on the same machine as the beacon-chain node. All of the EXITED validators now report a balance of 0.

[2020-08-17 16:55:25] INFO validator: Validator exited index=5399 pubKey=0x97ae4e42458e status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator exited index=5424 pubKey=0xa9975acf6960 status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator exited index=5415 pubKey=0xae95084a831a status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator exited index=5385 pubKey=0x8c4e28ef8321 status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator exited index=5379 pubKey=0xaf7a1e412309 status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator exited index=5382 pubKey=0x82e0c39fbcb0 status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator exited index=5422 pubKey=0xa4491d3657e3 status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator activated index=5410 publicKey=0xb8134a52100e
[2020-08-17 16:55:25] INFO validator: Validator activated index=5421 publicKey=0x99989317430f
[2020-08-17 16:55:25] INFO validator: Validator activated index=5430 publicKey=0x8442f4eaa7e2
[2020-08-17 16:55:25] INFO validator: Validator activated index=5388 publicKey=0xb005cf5e4a87
[2020-08-17 16:55:25] INFO validator: Validator activated index=5398 publicKey=0xa21ffbfc239a
[2020-08-17 16:55:25] INFO validator: Validator activated index=5413 publicKey=0x8472a88ea912
[2020-08-17 16:55:25] INFO validator: Validator activated index=5420 publicKey=0x82bc0d808641
[2020-08-17 16:55:25] INFO validator: Validator activated index=5423 publicKey=0xaceec75bbcf7
[2020-08-17 16:55:25] INFO validator: Validator activated index=5384 publicKey=0xb98786a916e9
[2020-08-17 16:55:25] INFO validator: Validator activated index=5394 publicKey=0xaf61506f39fa
[2020-08-17 16:55:25] INFO validator: Validator activated index=5397 publicKey=0xa4e489a4ec32
[2020-08-17 16:55:25] INFO validator: Validator activated index=5403 publicKey=0xb0e00da187a5
[2020-08-17 16:55:25] INFO validator: Validator activated index=5416 publicKey=0x8dd46c817fd9
[2020-08-17 16:55:25] INFO validator: Validator activated index=5433 publicKey=0xab0015047af1
[2020-08-17 16:55:25] INFO validator: Validator activated index=5418 publicKey=0x8feb51979b57
[2020-08-17 16:55:25] INFO validator: Validator activated index=5411 publicKey=0xa1d0c38736e2
[2020-08-17 16:55:25] INFO validator: Validator activated index=5390 publicKey=0xb183563f80c1
[2020-08-17 16:55:25] INFO validator: Validator activated index=5396 publicKey=0x8c486f8bc816
[2020-08-17 16:55:25] INFO validator: Validator activated index=5401 publicKey=0x81d0f2843866
[2020-08-17 16:55:25] INFO validator: Validator activated index=5404 publicKey=0xa1d832964ae3
[2020-08-17 16:55:25] INFO validator: Validator activated index=5402 publicKey=0xb84ff391b44c
[2020-08-17 16:55:25] INFO validator: Validator activated index=5407 publicKey=0xb5aeea83205b
[2020-08-17 16:55:25] INFO validator: Validator activated index=5414 publicKey=0xa225a84aec50
[2020-08-17 16:55:25] INFO validator: Validator activated index=5419 publicKey=0x8289559a5c74
[2020-08-17 16:55:25] INFO validator: Validator activated index=5393 publicKey=0xb06b081e9d46
[2020-08-17 16:55:25] INFO validator: Validator activated index=5409 publicKey=0xb739197176f9
[2020-08-17 16:55:25] INFO validator: Validator activated index=5431 publicKey=0x8b1a7d9b62f3
[2020-08-17 16:55:25] INFO validator: Validator activated index=5405 publicKey=0x8c49428fab0a
[2020-08-17 16:55:25] INFO validator: Validator activated index=5426 publicKey=0x8b54dfee5a97
[2020-08-17 16:55:25] INFO validator: Validator activated index=5387 publicKey=0x9211f08fad46
[2020-08-17 16:55:25] INFO validator: Validator activated index=5380 publicKey=0x8ff1135098f3
[2020-08-17 16:55:25] INFO validator: Validator activated index=5427 publicKey=0xae2bdaccbe0b
[2020-08-17 16:55:25] INFO validator: Validator activated index=5428 publicKey=0xad4afe5ae830
[2020-08-17 16:55:25] INFO validator: Validator activated index=5429 publicKey=0x8ceb42312cd3
[2020-08-17 16:55:25] INFO validator: Validator activated index=5432 publicKey=0x8123b8e02786
[2020-08-17 16:55:25] INFO validator: Validator activated index=5406 publicKey=0x84321cf6471b
[2020-08-17 16:55:25] INFO validator: Validator activated index=5425 publicKey=0x87bcaa550dfa
[2020-08-17 16:55:25] INFO validator: Validator activated index=5386 publicKey=0xb16471f012e7
[2020-08-17 16:55:25] INFO validator: Validator activated index=5389 publicKey=0x97f2cc687739
[2020-08-17 16:55:25] INFO validator: Validator activated index=5395 publicKey=0xb23aa1134bea
[2020-08-17 16:55:25] INFO validator: Validator activated index=5391 publicKey=0xac7841f80052
[2020-08-17 16:55:25] INFO validator: Validator activated index=5392 publicKey=0xb2c1eeafbe8a
[2020-08-17 16:55:25] INFO validator: Validator activated index=5408 publicKey=0x82693c0705b8
[2020-08-17 16:55:25] INFO validator: Validator activated index=5412 publicKey=0x8ca55674c7c6
[2020-08-17 16:55:25] INFO validator: Validator activated index=5378 publicKey=0x8f9929560720
[2020-08-17 16:55:25] INFO validator: Validator activated index=5400 publicKey=0xa41252436e41
[2020-08-17 16:55:25] INFO validator: Validator activated index=5381 publicKey=0x93fbc96cfe86
[2020-08-17 16:55:25] INFO validator: Validator activated index=5383 publicKey=0x9372410d24e1
[2020-08-17 16:55:25] INFO validator: Validator activated index=5417 publicKey=0x928de7409e36
[2020-08-17 16:55:25] INFO validator: Validator activated index=5434 publicKey=0x90337401a698
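When comparing many validators, a rough tally of exited vs. activated entries can help. The Go sketch below is a hypothetical helper (not part of Prysm). It assumes every entry contains `Validator exited` or `Validator activated` followed by `index=<n>`. The optional quote in the pattern also lets it match logfmt-style lines such as `msg="Validator exited" index=27479`, like the one quoted later in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// entryRe matches "Validator exited index=N" and "Validator activated index=N".
// The optional `"` also accepts logfmt lines such as
// msg="Validator exited" index=27479. Assumption: index follows the message.
var entryRe = regexp.MustCompile(`Validator (exited|activated)"?\s+index=(\d+)`)

// summarize returns the sorted indices of exited validators and the number
// of "activated" entries found in text.
func summarize(text string) (exited []int, activated int) {
	for _, m := range entryRe.FindAllStringSubmatch(text, -1) {
		switch m[1] {
		case "exited":
			if idx, err := strconv.Atoi(m[2]); err == nil {
				exited = append(exited, idx)
			}
		case "activated":
			activated++
		}
	}
	sort.Ints(exited)
	return exited, activated
}

func main() {
	sample := `[2020-08-17 16:55:25] INFO validator: Validator exited index=5399 pubKey=0x97ae4e42458e status=EXITED
[2020-08-17 16:55:25] INFO validator: Validator activated index=5410 publicKey=0xb8134a52100e
time="2020-08-16 23:07:16" level=info msg="Validator exited" index=27479 prefix=validator pubKey=0xb95e040cda30 status=EXITED`
	exited, activated := summarize(sample)
	fmt.Printf("exited=%v activated=%d\n", exited, activated) // exited=[5399 27479] activated=1
}
```

Run against the full excerpt above, it would report 7 exited and 50 activated entries, which matches the 7/57 figure in the report.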

Setup: docker-compose.yaml

version: '3.5'

services:
  eth1-goerli-node:
    stdin_open: true
    tty: true
    container_name: eth1-goerli-node
    image: ethereum/client-go
    volumes:

networks:
  ethereum:
    name: ethereum
    driver: bridge

Can you all give more details on how you were running the node? Ex: Script, Docker, Bazel, etc. (See above.)

Also, what is the exact command you guys were using to run the node? (See above.)

During the course of the roughtime incident, did you restart your node? No, but it would have restarted with each new release because of Watchtower.

If any of you were running on Docker, were you specifying the volume for the validator db? (See above.)
alexfisher commented 4 years ago

Adding a comment here per @nisdas's request. I'm also seeing that 1 of my 20 validators was exited.

time="2020-08-16 23:07:16" level=info msg="Validator exited" index=27479 prefix=validator pubKey=0xb95e040cda30 status=EXITED

Logs are sent to Papertrail on a rolling 7-day basis, so I can provide any additional info needed over the next few days.

I'm running beacon, validator, slasher, and geth-goerli with docker-compose, pinned to specific Prysm Docker tags. I've switched tags a few times over the last couple of days: HEAD-78ec78, HEAD-0d7ea3, HEAD-d24f99, then v1.0.0-alpha.20, alpha.21, and finally alpha.22.

droconnel22 commented 4 years ago

This affected my validator as well: https://beaconscan.com/medalla/validator/0xad987e4f090ff0c121d6fc24628913cc1c3f2b13385c48267613966b0cfcaa501198c92896755ea91edfef87f75fe012

nisdas commented 4 years ago

Hey guys, thanks for reporting this, @DigiDr @alexfisher @droconnel22.

A few questions:

1) Can you all give more details on how you were running the node? Ex: Script, Docker, Bazel, etc.

2) Also, what is the exact command you guys were using to run the node?

3) During the course of the roughtime incident, did you restart your node?

4) If any of you were running on Docker, were you specifying the volume for the validator db?
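Question 4 comes down to whether the directory holding the validator database sits on a Docker mount. A directory that is not on a mount lives only in the container's writable layer, which is discarded when the container is removed and recreated. The Go sketch below is a hypothetical helper (not part of Prysm or Docker) that checks an in-container path against `-v host:container[:options]` specs. It simplifies by assuming no `:` inside paths, and the db paths in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// containerTarget returns the in-container path of a docker -v spec
// (host:container[:options]). Simplified: assumes no ':' inside paths.
func containerTarget(spec string) string {
	parts := strings.Split(spec, ":")
	if len(parts) == 1 {
		return path.Clean(parts[0]) // anonymous volume: container path only
	}
	return path.Clean(parts[1])
}

// persisted reports whether dir, a path inside the container, lies under
// the target of one of the given mount specs.
func persisted(dir string, specs []string) bool {
	dir = path.Clean(dir)
	for _, s := range specs {
		t := containerTarget(s)
		if t == "/" || dir == t || strings.HasPrefix(dir, t+"/") {
			return true
		}
	}
	return false
}

func main() {
	mounts := []string{"./config/validator.yaml:/config/validator.yaml:ro", "./validator:/data"}
	// Both db locations below are hypothetical, for illustration only.
	fmt.Println(persisted("/data/validatordb", mounts)) // true
	fmt.Println(persisted("/tmp/validatordb", mounts))  // false
}
```

The prefix check compares against `t+"/"` so that a directory like `/database` is not mistaken for something under a `/data` mount.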

alexfisher commented 4 years ago

Hey guys, thanks for reporting this, @DigiDr @alexfisher @droconnel22.

A few questions:

  1. Can you all give more details on how you were running the node? Ex: Script, Docker, Bazel, etc.

I'm running the node with Docker using a modified fork of the prysm-docker-compose project.

  2. Also, what is the exact command you guys were using to run the node?

Using docker-compose; the command is generated from a couple of different configuration files.

.env

#PRYSM_DOCKER_TAG=HEAD-78ec78
#PRYSM_DOCKER_TAG=HEAD-0d7ea3
#PRYSM_DOCKER_TAG=HEAD-d24f99
#PRYSM_DOCKER_TAG=v1.0.0-alpha.21
#PRYSM_DOCKER_TAG=v1.0.0-alpha.22
PRYSM_DOCKER_TAG=v1.0.0-alpha.23

docker-compose.yaml

        beacon:
                container_name: beacon-chain
                image: gcr.io/prysmaticlabs/prysm/beacon-chain:${PRYSM_DOCKER_TAG}
                restart: always
                hostname: beacon-chain
                depends_on:
                        - geth-goerli
                command: --config-file=/config/beacon.yaml
                ulimits:
                        nofile:
                                soft: 40000
                                hard: 40000
                ports:
                        - 127.0.0.1:4000:4000
                        - 13100:13100/tcp
                        - 12100:12100/udp
                volumes:
                        - ./config/beacon.yaml:/config/beacon.yaml:ro
                        - ./beacon:/data
                <<: *logging
        validator:
                container_name: validator
                image: gcr.io/prysmaticlabs/prysm/validator:${PRYSM_DOCKER_TAG}
                restart: on-failure
                hostname: validator
                depends_on:
                        - beacon
                command: --config-file=/config/validator.yaml
                volumes:
                        - ./config/validator.yaml:/config/validator.yaml:ro
                        - ./validator:/data
                <<: *logging

config/beacon.yaml

############################################################
##
## Read up on parameters on
## https://docs.prylabs.network/docs/prysm-usage/parameters/
##
############################################################

datadir: /data

#######################
# Connectivity settings
p2p-host-ip: ""
p2p-host-dns: ""

rpc-host: 0.0.0.0
monitoring-host: 0.0.0.0

# disable scan of local network
p2p-denylist: ["10.0.0.0/8","172.16.0.0/12","192.168.0.0/16","100.64.0.0/10","169.254.0.0/16"]

# changing this also needs to be changed in docker-compose.yaml!
p2p-tcp-port: 13100
p2p-udp-port: 12100

# enable db backup endpoint
enable-db-backup-webhook: true

# Tweaks due to Medalla issues
p2p-max-peers: 150
block-batch-limit: 512

##############################
# Connection to geth container
http-web3provider: http://geth-goerli:8645
web3provider: ws://geth-goerli:8646
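The `p2p-denylist` setting above lists private, shared-address, and link-local CIDR ranges so the node does not try to dial peers on the local network. As a rough illustration of what such a denylist expresses (a hypothetical helper, not Prysm's actual peer-filtering code), an address can be tested against those ranges with Go's standard `net` package:

```go
package main

import (
	"fmt"
	"net"
)

// denylist copies the CIDR ranges from the p2p-denylist setting above.
var denylist = []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10", "169.254.0.0/16"}

// isDenied reports whether ip falls inside any of the given CIDR ranges.
// Hypothetical helper for illustration; Prysm's own filtering may differ.
func isDenied(ip string, cidrs []string) (bool, error) {
	addr := net.ParseIP(ip)
	if addr == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	for _, c := range cidrs {
		_, network, err := net.ParseCIDR(c)
		if err != nil {
			return false, err
		}
		if network.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, ip := range []string{"192.168.1.20", "172.20.0.5", "8.8.8.8"} {
		denied, err := isDenied(ip, denylist)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s denied=%v\n", ip, denied)
	}
}
```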

config/validator.yaml (some info redacted)

############################################################
##
## Read up on parameters on
## https://docs.prylabs.network/docs/prysm-usage/parameters/
##
############################################################

##############
# Connectivity
beacon-rpc-provider: beacon:4000
monitoring-host: 0.0.0.0

#####################################
# Validator accounts & key management
wallet-dir: XXXXXXXXXXXXXXXXXXX
wallet-password-file: XXXXXXXXXXXXXXXXXXX

###########
# Fun Stuff
graffiti: "XXXXXXXXXXXXXXXXXXX"

Admittedly, the above files have been modified since the incident.

  3. During the course of the roughtime incident, did you restart your node?

Yes. I updated immediately when the first "fix" was released, and then again for each of the alpha releases over the last 4 or so days, restarting many times to try to stay synced.

  4. If any of you were running on Docker, were you specifying the volume for the validator db?

Yes, I'm specifying a location for the validator data volume.

droconnel22 commented 4 years ago

Hey guys, thanks for reporting this, @DigiDr @alexfisher @droconnel22.

A few questions:

  1. Can you all give more details on how you were running the node? Ex: Script, Docker, Bazel, etc.

Docker on Ubuntu 18.

  2. Also, what is the exact command you guys were using to run the node?

docker run --name beacon-node-medalla -d --restart always -it -v $HOME/.eth2:/data -p 4000:4000 -p 13000:13000 -p 12000:12000/udp \
  gcr.io/prysmaticlabs/prysm/beacon-chain:latest \
  --datadir=/data \
  --rpc-host=0.0.0.0 \
  --monitoring-host=0.0.0.0 \
  --p2p-max-peers=300

  3. During the course of the roughtime incident, did you restart your node?

Yes, for each release: from alpha.20 to alpha.21, the rollback to alpha.20, then from alpha.21 to alpha.23.

  4. If any of you were running on Docker, were you specifying the volume for the validator db?

I ran the validator like this:

docker run --name validator-node-medalla-4  --restart always -it -v $HOME/Eth2Validators/prysm-wallet-v2:/wallet --network="host" \
  -v $HOME/Eth2Validators/prysm-wallet-v2-passwords:/eth2passwords \
  gcr.io/prysmaticlabs/prysm/validator:latest \
  --beacon-rpc-provider=127.0.0.1:4000 \
  --wallet-dir=/wallet \
  --passwords-dir=/eth2passwords \
  --graffiti="poapB7OfT95KOLrOIStUbayHxY3+P9wA"
terencechain commented 4 years ago

Closing this, as we have collected all the information and the issue hasn't been active for 2 weeks.