rzmahmood / ethereum-pos-testnet

💻⛓️ A Quick and Easy Way to Bootstrap your own Local Ethereum PoS Testnet. Great for testing consensus ⛓️💻
MIT License

Persist network state #23

Open sergey-msu opened 5 months ago

sergey-msu commented 5 months ago

Hi @rzmahmood thank you for sharing your knowledge. This was extremely helpful!

But what if I want to deploy a private network whose state persists between sessions (i.e. without clearing the ./network folder)?

rzmahmood commented 5 months ago

To do that you need to separate the script into two parts: generating the genesis and starting the nodes. Generate the genesis only once, at the start of the network.

Then you can start the nodes and they will continue from their last persisted state, building on the genesis. Here's an example of how someone else separated the generation of the genesis https://github.com/QiLOL/ethereum-pos-testnet/blob/main/gen-genesis.sh
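The split described above can be sketched as a small wrapper that skips genesis generation when a genesis state already exists on disk. This is only a sketch: the marker path `node-0/consensus/genesis.ssz` is an assumption about the directory layout, and the function names and the two script names in the usage line are hypothetical placeholders for your own split scripts.

```bash
#!/bin/bash
# Sketch: generate the genesis once, then only start nodes on later runs.

NETWORK_DIR=${NETWORK_DIR:-./network}

# Returns 0 if a genesis state for node 0 already exists on disk.
# The path is an assumption about the directory layout.
network_initialized() {
    local dir=$1
    [[ -f "$dir/node-0/consensus/genesis.ssz" ]]
}

# Runs the init command only when no genesis exists, then the run command.
start_network() {
    local dir=$1 init_cmd=$2 run_cmd=$3
    if network_initialized "$dir"; then
        echo "Existing network found in $dir, skipping genesis generation"
    else
        echo "No genesis in $dir, generating one"
        $init_cmd
    fi
    $run_cmd
}

# Usage (script names are hypothetical):
#   start_network "$NETWORK_DIR" ./gen-genesis.sh ./start-nodes.sh
```

Keeping the check in one place means the same entry point works for both a fresh network and a restart, as long as nothing deletes the network directory in between.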

sergey-msu commented 5 months ago

@rzmahmood yep, it works. You saved me days of doom-searching through GitHub/Medium articles. Thank you for your help!

sergey-msu commented 5 months ago

@rzmahmood one final thing: after I split the script into two (init and run), everything works great the first time, just after the first init from genesis. But when I shut down my nodes and bring them up again (without regenerating the genesis or cleaning the db), the consensus clients fail to find the execution nodes:

execution client logs:

WARN [05-15|22:24:12.904] Post-merge network, but no beacon client seen. Please launch one to follow the chain! 

beacon client logs:

time="2024-05-15 22:26:44" level=error msg="Beacon node is not respecting the follow distance. EL client is syncing." lastBlockNumber=282 prefix=execution
time="2024-05-15 22:26:48" level=info msg="Waiting for enough suitable peers before syncing" prefix=initial-sync required=1 suitable=0

Do you know what the reason could be?
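For anyone seeing the same symptom, one way to narrow it down is to ask both clients where they think the chain head is. The sketch below assumes geth's HTTP RPC is on port 8000 and the beacon node's gateway is on port 4100 (the defaults for node 0 in the run script), and that the gateway serves the standard Beacon API `/eth/v1/node/syncing` endpoint. The function names are hypothetical helpers, not part of either client.

```bash
#!/bin/bash
# Sketch: compare the execution and consensus clients' view of the head.

# Convert a 0x-prefixed hex quantity (as returned by eth_blockNumber) to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Latest block number known to geth, via JSON-RPC (port is an assumption).
el_block_number() {
    local port=${1:-8000} hex
    hex=$(curl -s -X POST -H 'Content-Type: application/json' \
        --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
        "http://localhost:$port" | jq -r '.result')
    hex_to_dec "$hex"
}

# Sync status reported by the beacon node (port is an assumption).
cl_sync_status() {
    local port=${1:-4100}
    curl -s "http://localhost:$port/eth/v1/node/syncing" | jq '.data'
}

# Usage, with the nodes running:
#   el_block_number 8000
#   cl_sync_status 4100
```

If the execution client's block number stays fixed while the beacon node keeps reporting that it is syncing, the two clients have not reconnected after the restart.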

my run script is

#!/bin/bash

set -exu
set -o pipefail

# Check if jq is installed
if ! command -v jq &> /dev/null; then
    echo "Error: jq is not installed. Please install jq first."
    exit 1
fi
# Check if curl is installed
if ! command -v curl &> /dev/null; then
    echo "Error: curl is not installed. Please install curl first."
    exit 1
fi

# NETWORK_DIR is where all files for the testnet will be stored,
# including logs and storage
NETWORK_DIR=./network

# Change this number for your desired number of nodes
NUM_NODES=2

# Port information. All ports will be incremented upon
# with more validators to prevent port conflicts on a single machine
GETH_BOOTNODE_PORT=30301

GETH_HTTP_PORT=8000
GETH_WS_PORT=8100
GETH_AUTH_RPC_PORT=8200
GETH_METRICS_PORT=8300
GETH_NETWORK_PORT=8400

PRYSM_BEACON_RPC_PORT=4000
PRYSM_BEACON_GRPC_GATEWAY_PORT=4100
PRYSM_BEACON_P2P_TCP_PORT=4200
PRYSM_BEACON_P2P_UDP_PORT=4300
PRYSM_BEACON_MONITORING_PORT=4400

PRYSM_VALIDATOR_RPC_PORT=7000
PRYSM_VALIDATOR_GRPC_GATEWAY_PORT=7100
PRYSM_VALIDATOR_MONITORING_PORT=7200

trap 'echo "Error on line $LINENO"; exit 1' ERR
# Function to handle the cleanup
cleanup() {
    echo "Caught Ctrl+C. Killing active background processes and exiting."
    kill $(jobs -p)  # Kills all background processes started in this script
    exit
}
# Trap the SIGINT signal and call the cleanup function when it's caught
trap 'cleanup' SIGINT

# Kill any hanging runtimes
pkill geth || echo "No existing geth processes"
pkill beacon-chain || echo "No existing beacon-chain processes"
pkill validator || echo "No existing validator processes"
pkill bootnode || echo "No existing bootnode processes"

# Set Paths for your binaries. Configure as you wish, particularly
# if you're developing on a local fork of geth/prysm
GETH_BINARY=./bin/geth
PRYSM_BEACON_BINARY=./bin/beacon-chain
PRYSM_VALIDATOR_BINARY=./bin/validator

sleep 2
# Get the ENODE from the first line of the logs for the bootnode
bootnode_enode=$(head -n 1 $NETWORK_DIR/bootnode/bootnode.log)
# Check if the line begins with "enode"
if [[ "$bootnode_enode" == enode* ]]; then
    echo "bootnode enode is: $bootnode_enode"
else
    echo "The bootnode enode was not found. Exiting."
    exit 1
fi

# The prysm bootstrap node is set after the first loop, as the first
# node is the bootstrap node. This is used for consensus client discovery
PRYSM_BOOTSTRAP_NODE=

# Calculate how many nodes to wait for to be in sync with. Not a hard rule
MIN_SYNC_PEERS=$((NUM_NODES/2))
echo $MIN_SYNC_PEERS is minimum number of synced peers required

# Start all nodes in a loop
for (( i=0; i<$NUM_NODES; i++ )); do
    NODE_DIR=$NETWORK_DIR/node-$i

    # We use an empty password. Do not do this in production
    geth_pw_file="$NODE_DIR/geth_password.txt"
    echo "" > "$geth_pw_file"

    # Start geth execution client for this node
    $GETH_BINARY \
      --networkid=${CHAIN_ID:-32382} \
      --http \
      --http.api=eth,net,web3 \
      --http.addr=127.0.0.1 \
      --http.corsdomain="*" \
      --http.port=$((GETH_HTTP_PORT + i)) \
      --port=$((GETH_NETWORK_PORT + i)) \
      --metrics.port=$((GETH_METRICS_PORT + i)) \
      --ws \
      --ws.api=eth,net,web3 \
      --ws.addr=127.0.0.1 \
      --ws.origins="*" \
      --ws.port=$((GETH_WS_PORT + i)) \
      --authrpc.vhosts="*" \
      --authrpc.addr=127.0.0.1 \
      --authrpc.jwtsecret=$NODE_DIR/execution/jwtsecret \
      --authrpc.port=$((GETH_AUTH_RPC_PORT + i)) \
      --datadir=$NODE_DIR/execution \
      --password=$geth_pw_file \
      --bootnodes=$bootnode_enode \
      --identity=node-$i \
      --maxpendpeers=$NUM_NODES \
      --verbosity=3 \
      --syncmode=full \
      --allow-insecure-unlock \
      --unlock 0xD96610a917c650AF7EE229F8AFe892dcF2f06c57 > "$NODE_DIR/logs/geth.log" 2>&1 &

    sleep 5

    # Start prysm consensus client for this node
    $PRYSM_BEACON_BINARY \
      --datadir=$NODE_DIR/consensus/beacondata \
      --min-sync-peers=$MIN_SYNC_PEERS \
      --genesis-state=$NODE_DIR/consensus/genesis.ssz \
      --bootstrap-node=$PRYSM_BOOTSTRAP_NODE \
      --interop-eth1data-votes \
      --chain-config-file=$NODE_DIR/consensus/config.yml \
      --contract-deployment-block=0 \
      --chain-id=${CHAIN_ID:-32382} \
      --rpc-host=127.0.0.1 \
      --rpc-port=$((PRYSM_BEACON_RPC_PORT + i)) \
      --grpc-gateway-host=127.0.0.1 \
      --grpc-gateway-port=$((PRYSM_BEACON_GRPC_GATEWAY_PORT + i)) \
      --execution-endpoint=http://localhost:$((GETH_AUTH_RPC_PORT + i)) \
      --accept-terms-of-use \
      --jwt-secret=$NODE_DIR/execution/jwtsecret \
      --suggested-fee-recipient=0x123463a4b065722e99115d6c222f267d9cabb524 \
      --minimum-peers-per-subnet=0 \
      --p2p-tcp-port=$((PRYSM_BEACON_P2P_TCP_PORT + i)) \
      --p2p-udp-port=$((PRYSM_BEACON_P2P_UDP_PORT + i)) \
      --monitoring-port=$((PRYSM_BEACON_MONITORING_PORT + i)) \
      --verbosity=info \
      --slasher \
      --enable-debug-rpc-endpoints > "$NODE_DIR/logs/beacon.log" 2>&1 &

    # Start prysm validator for this node. Each validator node will
    # manage 1 validator
    $PRYSM_VALIDATOR_BINARY \
      --beacon-rpc-provider=localhost:$((PRYSM_BEACON_RPC_PORT + i)) \
      --datadir=$NODE_DIR/consensus/validatordata \
      --accept-terms-of-use \
      --interop-num-validators=1 \
      --interop-start-index=$i \
      --rpc-port=$((PRYSM_VALIDATOR_RPC_PORT + i)) \
      --grpc-gateway-port=$((PRYSM_VALIDATOR_GRPC_GATEWAY_PORT + i)) \
      --monitoring-port=$((PRYSM_VALIDATOR_MONITORING_PORT + i)) \
      --graffiti="node-$i" \
      --chain-config-file=$NODE_DIR/consensus/config.yml > "$NODE_DIR/logs/validator.log" 2>&1 &

    # Check if the PRYSM_BOOTSTRAP_NODE variable is already set
    if [[ -z "${PRYSM_BOOTSTRAP_NODE}" ]]; then
        sleep 5 # sleep to let the prysm node set up
        # If PRYSM_BOOTSTRAP_NODE is not set, execute the command and capture the result into the variable
        # This allows subsequent nodes to discover the first node, treating it as the bootnode
        PRYSM_BOOTSTRAP_NODE=$(curl -s localhost:$PRYSM_BEACON_GRPC_GATEWAY_PORT/eth/v1/node/identity | jq -r '.data.enr')
        # Check if the result starts with enr
        if [[ $PRYSM_BOOTSTRAP_NODE == enr* ]]; then
            echo "PRYSM_BOOTSTRAP_NODE is valid: $PRYSM_BOOTSTRAP_NODE"
        else
            echo "PRYSM_BOOTSTRAP_NODE does NOT start with enr"
            exit 1
        fi
    fi
    echo "PRYSM_BOOTSTRAP_NODE: $PRYSM_BOOTSTRAP_NODE"
done

# You might want to change this if you want to tail logs for other nodes
# Logs for all nodes can be found in `./network/node-*/logs`
tail -f "$NETWORK_DIR/node-0/logs/geth.log"

rzmahmood commented 5 months ago

I'll have some time to try to replicate this next week. Let me know if you resolve it.

sergey-msu commented 5 months ago

@rzmahmood thank you very much. I would really appreciate it.

My fork of your repo is https://github.com/sergey-msu/ethereum-pos-testnet

In it, net-init.sh initializes a clean private testnet and net-run.sh runs the network.

Intended scenario:

  1. execute the net-init.sh script to build a new network
  2. execute the net-run.sh script to bring the blockchain up for the first time
  3. make a transaction and see it in the console / MetaMask wallet / etc.
  4. shut down the net-run.sh script, execute it again, make another transaction, and check that it is executed and validated successfully

sergey-msu commented 5 months ago

Well, it seems this can't be resolved because of an internal bug in Prysm: https://github.com/OffchainLabs/eth-pos-devnet/issues/19

very strange issue

lzmrd commented 4 months ago

Hi there! Did you fix this issue? I'm struggling to build a private network, and I'm having trouble syncing the genesis state between geth and Prysm/Lighthouse.

CoNETProject commented 2 months ago

Quoting @sergey-msu: "Well, it seems this is unresolvable via some inner bug of Prysm OffchainLabs/eth-pos-devnet#19 very strange issue"

It looks like a PoS chain can't be restarted once it has been shut down. The beacon chain appears to produce each new block based on the current timestamp, so the restart seems to break when the new block is far away from the previous one.

https://github.com/prysmaticlabs/prysm/issues/14042
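To illustrate the timestamp point: the beacon chain's current slot is derived from wall-clock time as `(now - genesis_time) / SECONDS_PER_SLOT`, so slots keep advancing while the nodes are down. The sketch below only does that arithmetic. The 12-second default is the mainnet value and is an assumption here (a devnet's config.yml may set something else), and the function names are hypothetical. It does not confirm that this gap is the cause of the failure.

```bash
#!/bin/bash
# Sketch: how far wall-clock slots run ahead of the chain after downtime.

# Slot the beacon chain expects at time `now` (Unix seconds).
expected_slot() {
    local genesis_time=$1 now=$2 seconds_per_slot=${3:-12}
    echo $(( (now - genesis_time) / seconds_per_slot ))
}

# Number of slots that elapsed while the nodes were stopped.
missed_slots() {
    local stopped_at=$1 restarted_at=$2 seconds_per_slot=${3:-12}
    echo $(( (restarted_at - stopped_at) / seconds_per_slot ))
}

# Usage: one hour of downtime at 12 s per slot
#   missed_slots 0 3600 12    # prints 300
# With a running beacon node, genesis_time can be read from the standard
# Beacon API (gateway port 4100 is an assumption):
#   curl -s localhost:4100/eth/v1/beacon/genesis | jq -r '.data.genesis_time'
```

An hour of downtime therefore leaves 300 empty slots between the last persisted block and the first block after the restart.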

CoNETProject commented 2 months ago

Quoting @rzmahmood: "To do that you need to separate the script into two parts, the generation of the genesis and the starting of the nodes. You do the generating of the genesis once as that should only be done once at the start of the network. Then you can start the nodes and they will continue from their last persisted state, building on the genesis. Here's an example of how someone else separated the generation of the genesis https://github.com/QiLOL/ethereum-pos-testnet/blob/main/gen-genesis.sh"

Hello @rzmahmood,

Thank you for your awesome project. Your project helps me a lot.

Our POS chain was started based on your project. After several failures, we now have a healthy and secure POS chain.

CoNETProject commented 2 months ago

Quoting @lzmrd: "Hi there! did you fix this issue? I'm struggling trying to build a private network, but i'm having troubles to sync genesis state of geth and Prysm/Lighthouse"

I think you can pass peers to Prysm/Lighthouse explicitly to help them sync.

$PRYSM_BEACON_BINARY \
    --datadir=$NODE_DIR/consensus/beacondata \
    --peer="/ip4/xxx.xxx.xxx.xxx/tcp/4200/p2p/xxxxxxxxxxxxxxxxxxxxxxxxxx" \
    --peer="/ip4/xxx.xxx.xxx.xxx/tcp/4200/p2p/xxxxxxxxxxxxxxxxxxxxxxxxxx" \
    .....
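The `--peer` values above are libp2p multiaddrs. A minimal sketch of assembling one follows, assuming the other beacon node exposes the standard Beacon API `/eth/v1/node/identity` endpoint on its gateway port (4100 for node 0 in the run script). The function names and the example IP are hypothetical.

```bash
#!/bin/bash
# Sketch: build a --peer multiaddr for a beacon node.

# Assemble a multiaddr of the form /ip4/<ip>/tcp/<port>/p2p/<peer id>.
peer_multiaddr() {
    local ip=$1 tcp_port=$2 peer_id=$3
    echo "/ip4/$ip/tcp/$tcp_port/p2p/$peer_id"
}

# Ask a running beacon node for its libp2p peer ID (port is an assumption).
# Run this on the machine that hosts the peer you want to connect to.
beacon_peer_id() {
    local gateway_port=${1:-4100}
    curl -s "http://localhost:$gateway_port/eth/v1/node/identity" | jq -r '.data.peer_id'
}

# Usage (IP address is a placeholder):
#   --peer="$(peer_multiaddr 192.0.2.10 4200 "$(beacon_peer_id 4100)")"
```

The TCP port in the multiaddr is the peer's p2p TCP port (4200 plus the node index in the run script), not its gateway port.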

lzmrd commented 2 months ago

Quoting @CoNETProject, who was replying to @rzmahmood:

"To do that you need to separate the script into two parts, the generation of the genesis and the starting of the nodes. You do the generating of the genesis once as that should only be done once at the start of the network. Then you can start the nodes and they will continue from their last persisted state, building on the genesis. Here's an example of how someone else separated the generation of the genesis https://github.com/QiLOL/ethereum-pos-testnet/blob/main/gen-genesis.sh"

"Hello @rzmahmood, Thank you for your awesome project. Your project helps me a lot. Our POS chain was started based on your project. After several failures, we now have a healthy and secure POS chain."

Hi @CoNETProject! Can you share your PoS private network setup? Are you running it on virtual or physical machines?

aniketpr01 commented 1 month ago

Hi @rzmahmood @sergey-msu, has this issue been resolved? I am also getting an error when restoring the previous state. Please share a solution if one is available.