prysmaticlabs / prysm

Go implementation of Ethereum proof of stake
https://www.offchainlabs.com
GNU General Public License v3.0

Devnet: transactions on 2nd execution client will never be validated. #13927

Closed DoHaiSon closed 2 months ago

DoHaiSon commented 7 months ago

Describe the bug

I set up a devnet following this guide: https://docs.prylabs.network/docs/advanced/proof-of-stake-devnet

Everything is OK: the execution clients (Exe cli) and PoS clients (PoS cli) are connected to bootnodes inside a local LAN.

On the first computer (the node running the validator client), I can send transactions (txs), and they are validated within one slot (<12 s). However, if I send txs to the second computer, they are never validated.

I tried sending a tx from an existing EOA on both computers and confirmed that both execution clients show the correct pending transactions in their lists.

On the second computer, the log is as follows:

INFO [04-26|17:28:21.352] Looking for peers                        peercount=1 tried=25 static=0
INFO [04-26|17:28:29.746] Imported new potential chain segment     number=176 hash=5e7534..373549 blocks=1 txs=0 mgas=0.000 elapsed=3.443ms     mgasps=0.000  snapdiffs=883.00B triediffs=29.79KiB triedirty=0.00B
INFO [04-26|17:28:29.773] Chain head was updated                   number=176 hash=5e7534..373549 root=270707..05d5ba elapsed=3.71914ms
INFO [04-26|17:28:29.788] Starting work on payload                 id=0x0235a2e3a4e62bd8
INFO [04-26|17:28:29.789] Updated payload                          id=0x0235a2e3a4e62bd8 number=177 hash=dae4f7..f55e0a txs=1 withdrawals=0 gas=114,633 fees=0.0001146329992 root=0f52e4..1b46c0 elapsed="400.78µs"
INFO [04-26|17:28:29.802] Stopping work on payload                 id=0x02445c5fcba222bd reason=timeout
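The key=value fields in log lines like these can be pulled apart for easier comparison. The following is a minimal sketch, not part of geth or Prysm; the function name `parse_geth_log_fields` is hypothetical, and it assumes the log format shown above (bare or double-quoted values after the message text).

```python
import re

# Matches key=value pairs; a value is either a double-quoted string
# or a run of non-whitespace characters.
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_geth_log_fields(line: str) -> dict:
    """Return the key=value fields of one geth log line as a dict of strings.

    Hypothetical helper for illustration; it ignores the level, timestamp,
    and message text and keeps only the trailing fields.
    """
    fields = {}
    for key, value in _FIELD.findall(line):
        fields[key] = value.strip('"')
    return fields
```

Applied to the last three lines of the log above, this makes it easy to see that the payload stopped with `reason=timeout` carries a different `id` (`0x02445c5fcba222bd`) than the payload that was just started and updated (`0x0235a2e3a4e62bd8`).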

More information: if I send a tx to the 2nd computer using an EOA that is unknown on the 1st computer, the payload on the 2nd computer gets stuck. On the 1st computer, validation still works fine, but it does not pick up the tx from the 2nd computer's payload.

Update: I found something. After a long time (~30 min), the pending txs on the 2nd computer are mined. How can I check which step makes it so slow?
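One way to narrow this down is to compare the peer count and transaction pool size on both execution clients over JSON-RPC. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the HTTP endpoints from the reproduction (`--http.addr`, `--http.port 8545`) with the `net` and `txpool` namespaces enabled, as in the geth commands in this report. It uses the standard `net_peerCount` and `txpool_status` methods.

```python
import json
import urllib.request


def build_request(method, params=None, request_id=1):
    """Encode a JSON-RPC 2.0 request body as bytes."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    }).encode()


def parse_quantity(value):
    """Convert a hex quantity such as "0x1" to an int."""
    return int(value, 16)


def rpc_call(url, method, params=None, timeout=5.0):
    """POST one JSON-RPC call to url and return its "result" field."""
    req = urllib.request.Request(
        url,
        data=build_request(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(f"{method} failed: {body['error']}")
    return body["result"]


def node_summary(url):
    """Return peer count and pending/queued tx counts for one node."""
    status = rpc_call(url, "txpool_status")
    return {
        "peers": parse_quantity(rpc_call(url, "net_peerCount")),
        "pending": parse_quantity(status["pending"]),
        "queued": parse_quantity(status["queued"]),
    }
```

Calling `node_summary("http://192.168.0.1:8545")` and `node_summary("http://192.168.0.2:8545")` after sending a tx to the 2nd computer would show whether the tx ever reaches the 1st computer's pool. If it stays pending only on the 2nd node, the delay is likely in transaction propagation between the execution clients rather than in block production.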

Has this worked before in a previous version?

Yes, this test worked on v3.2.0.

🔬 Minimal Reproduction

Go-ethereum: latest unstable build from git source (Version: 1.14.0-unstable)
Prysm: 5.0.2

Prysmctl

./prysmctl testnet generate-genesis --fork capella --num-validators 1000 --genesis-time-delay 120 \
  --chain-config-file config.yml --geth-genesis-json-in genesis.json --geth-genesis-json-out genesis.json --output-ssz genesis.ssz

1st Geth

./geth --http --http.api="engine,eth,net,web3,personal,txpool" \
  --ws.api="engine,net,web3,personal,txpool" \
  --bootnodes "" \
  --authrpc.jwtsecret jwt.hex --datadir gethdata \
  --syncmode full --allow-insecure-unlock --unlock 0x123463a4b065722e99115d6c222f267d9cabb524 --password="" \
  --http.addr 192.168.0.1 --http.port 8545 \
  --ws --ws.addr 192.168.0.1 --ws.port 8546 \
  --nat=extip:192.168.0.1 \
  --networkid 1 console

2nd Geth

./geth --http --http.api="engine,eth,net,web3,personal,txpool" \
  --ws.api="engine,net,web3,personal,txpool" \
  --bootnodes "" \
  --authrpc.jwtsecret jwt.hex --datadir gethdata \
  --syncmode full --allow-insecure-unlock --unlock 0x123463a4b065722e99115d6c222f267d9cabb524 --password="" \
  --http.addr 192.168.0.2 --http.port 8545 \
  --ws --ws.addr 192.168.0.2 --ws.port 8546 \
  --nat=extip:192.168.0.2 \
  --networkid 1 console

1st Beacon-chain

./beacon-chain --datadir beacondata --min-sync-peers 0 \
  --genesis-state genesis.ssz --bootstrap-node= --interop-eth1data-votes --chain-config-file config.yml \
  --contract-deployment-block 0 --chain-id 2024 --accept-terms-of-use --jwt-secret jwt.hex \
  --suggested-fee-recipient 0x123463a4B065722E99115D6c222f267d9cABb524 --minimum-peers-per-subnet 0 \
  --enable-debug-rpc-endpoints --execution-endpoint gethdata/geth.ipc \
  --bootstrap-node=""

Validators

./validator --datadir validatordata --accept-terms-of-use --interop-num-validators 1000 --chain-config-file config.yml

2nd Beacon-chain

./beacon-chain --datadir beacondata --min-sync-peers 1 \
  --genesis-state genesis.ssz --bootstrap-node= --interop-eth1data-votes --chain-config-file config.yml \
  --contract-deployment-block 0 --chain-id 2024 --accept-terms-of-use --jwt-secret jwt.hex \
  --suggested-fee-recipient 0x123463a4B065722E99115D6c222f267d9cABb524 --minimum-peers-per-subnet 0 \
  --enable-debug-rpc-endpoints --execution-endpoint gethdata/geth.ipc \
  --bootstrap-node=""

Error

No response

Platform(s)

Linux (x86)

What version of Prysm are you running? (Which release)

v5.0.2

Anything else relevant (validator index / public key)?

No response

prestonvanloon commented 2 months ago

It seems likely that the geth nodes are not peered together. Consider running a bootnode and connecting both geth nodes to it so they can discover each other.
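To confirm whether the two geth nodes are actually peered, the peer list of one node can be checked for the other node's IP address. The sketch below is a rough illustration with hypothetical helper names. It assumes the input is the list returned by geth's `admin_peers` RPC (or `admin.peers` in the geth console), where each entry carries a `network.remoteAddress` string of the form `ip:port`. Note that the `admin` namespace is not listed in `--http.api` in the reproduction above, so the data would have to come from the attached console or IPC.

```python
def remote_ips(admin_peers_result):
    """Collect the remote IP addresses from an admin_peers result list."""
    ips = set()
    for peer in admin_peers_result:
        addr = peer.get("network", {}).get("remoteAddress", "")
        # Split "ip:port" on the last colon so IPv6 hosts stay intact.
        host, sep, _port = addr.rpartition(":")
        if sep:
            ips.add(host.strip("[]"))
    return ips


def is_peered_with(admin_peers_result, expected_ip):
    """Return True if expected_ip appears among the connected peers."""
    return expected_ip in remote_ips(admin_peers_result)
```

On the 1st computer, `is_peered_with(peers, "192.168.0.2")` returning False would support the diagnosis that transactions sent to the 2nd node never reach the node that proposes blocks.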