kurtosis-tech / zzz-deprecated-eth2-merge-kurtosis-module

Deprecated in favor of https://github.com/kurtosis-tech/eth2-package

[FEAT]Switching to merged genesis #137

Closed parithosh closed 2 years ago

parithosh commented 2 years ago
gbouv commented 2 years ago

Alright, so now the tests are running, and it seems they are failing. I'll take a look and see if I can fix them myself.

gbouv commented 2 years ago

@parithosh It seems the Lighthouse beacon node has trouble starting. At least, we can't validate that it's running at this line. I'm not sure what's going on in there; I'd be curious whether you have any ideas.

According to the container log, there's something wrong with the genesis state:

Oct 03 17:16:59.291 INFO Logging to file                         path: "/consensus-data/beacon/logs/beacon.log"
Oct 03 17:16:59.292 INFO Lighthouse started                      version: Lighthouse/v3.1.2-01e84b7
Oct 03 17:16:59.292 INFO Configured for network                  name: custom (/genesis/output)
Oct 03 17:16:59.292 INFO Data directory initialised              datadir: /consensus-data
Oct 03 17:16:59.292 WARN Discv5 packet filter is disabled
Oct 03 17:16:59.292 INFO Deposit contract                        address: 0x4242424242424242424242424242424242424242, deploy_block: 0
Oct 03 17:16:59.317 INFO Starting from known genesis state       service: beacon
Oct 03 17:16:59.318 CRIT Failed to start beacon node             reason: Unable to parse genesis state SSZ: OffsetSkipsVariableBytes(2736633)
Oct 03 17:16:59.318 INFO Internal shutdown received              reason: Failed to start beacon node
Oct 03 17:16:59.318 INFO Shutting down..                         reason: Failure("Failed to start beacon node")
gbouv commented 2 years ago

Interestingly, a very similar issue happens with other CL clients like Nimbus. I tried running the module with only that client, and here is what I get in its logs:

INF 2022-10-04 10:09:21.815+00:00 Launching beacon node                      topics="beacnde" version=v22.9.1-a84545-stateofus bls_backend=BLST cmdParams="@[\"--non-interactive=true\", \"--log-level=DEBUG\", \"--network=/genesis-data/output\", \"--data-dir=/root/consensus-data\", \"--web3-url=http://31.47.144.6:8551\", \"--nat=extip:31.47.144.8\", \"--enr-auto-update=false\", \"--rest\", \"--rest-address=0.0.0.0\", \"--rest-port=4000\", \"--validators-dir=/root/validator-keys\", \"--secrets-dir=/root/validator-secrets\", \"--doppelganger-detection=false\", \"--subscribe-all-subnets=true\", \"--num-threads=4\", \"--jwt-secret=/genesis-data/output/jwtsecret\", \"--metrics\", \"--metrics-address=0.0.0.0\", \"--metrics-port=8008\", \"--subscribe-all-subnets\"]" config="(configFile: None[InputFile], logLevel: \"DEBUG\", logStdout: auto, logFile: None[OutFile], eth2Network: Some(\"/genesis-data/output\"), dataDir: /root/consensus-data, validatorsDirFlag: Some(/root/validator-keys), secretsDirFlag: Some(/root/validator-secrets), walletsDirFlag: None[InputDir], eraDirFlag: None[InputDir], web3Urls: @[\"http://31.47.144.6:8551\"], web3ForcePolling: false, requireEngineAPI: None[bool], nonInteractive: true, netKeyFile: \"random\", netKeyInsecurePassword: false, agentString: \"nimbus\", subscribeAllSubnets: true, slashingDbKind: v2, numThreads: 4, jwtSecret: Some(\"/genesis-data/output/jwtsecret\"), cmd: noCommand, runAsServiceFlag: false, bootstrapNodes: @[], bootstrapNodesFile: , listenAddress: 0.0.0.0, tcpPort: 9000, udpPort: 9000, maxPeers: 160, hardMaxPeers: None[int], nat: (hasExtIp: true, extIp: 31.47.144.8), enrAutoUpdate: false, weakSubjectivityCheckpoint: None[Checkpoint], syncLightClient: false, trustedBlockRoot: None[Eth2Digest], finalizedCheckpointState: None[InputFile], finalizedCheckpointBlock: None[InputFile], nodeName: \"\", graffiti: None[GraffitiBytes], strictVerification: false, stopAtEpoch: 0, stopAtSyncedEpoch: 0, metricsEnabled: true, metricsAddress: 
0.0.0.0, metricsPort: 8008, statusBarEnabled: true, statusBarContents: \"peers: $connected_peers;finalized: $finalized_root:$finalized_epoch;head: $head_root:$head_epoch:$head_epoch_slot;time: $epoch:$epoch_slot ($slot);sync: $sync_status|ETH: $attached_validators_balance\", rpcEnabled: None[bool], rpcPort: None[Port], rpcAddress: None[ValidIpAddress], restEnabled: true, restPort: 4000, restAddress: 0.0.0.0, restAllowedOrigin: None[TaintedString], restCacheSize: 3, restCacheTtl: 60, restRequestTimeout: 0, restMaxRequestBodySize: 16384, restMaxRequestHeadersSize: 64, keymanagerEnabled: false, keymanagerPort: 5052, keymanagerAddress: 127.0.0.1, keymanagerAllowedOrigin: None[TaintedString], keymanagerTokenFile: None[InputFile], lightClientDataServe: true, lightClientDataImportMode: only-new, lightClientDataMaxPeriods: None[uint64], inProcessValidators: true, debugForkChoice: false, discv5Enabled: true, dumpEnabled: false, directPeers: @[], doppelgangerDetection: false, syncHorizon: 50, terminalTotalDifficultyOverride: None[TaintedString], validatorMonitorAuto: false, validatorMonitorPubkeys: @[], validatorMonitorTotals: false, safeSlotsToImportOptimistically: None[uint16], suggestedFeeRecipient: None[Address], payloadBuilderEnable: false, payloadBuilderUrl: \"\")"
NOT 2022-10-04 10:09:21.824+00:00 Starting metrics HTTP server               topics="beacnde" url=http://0.0.0.0:8008/metrics
INF 2022-10-04 10:09:21.852+00:00 Threadpool started                         topics="beacnde" numThreads=4
/root/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2163) main
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2031) handleStartUpCmd
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1848) doRunBeaconNode
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(587) init
/root/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/assertions.nim(22) raiseAssert
/root/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: Invalid baked-in state: SSZ BeaconState: object dynamic portion starts at invalid offset [AssertionError]

So there's definitely something wrong with the genesis.ssz file. It seems we're using a third-party lib to generate it (this one, I believe?)
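
Both errors above (Lighthouse's OffsetSkipsVariableBytes and Nimbus's "dynamic portion starts at invalid offset") look like the same SSZ rule: the first offset in a container has to point at the byte right after the fixed-size portion. Here is a toy sketch of how a decoder that expects a different schema than the encoder used would trip that check. The container layout and the function names are made up for illustration; this is not the real BeaconState schema and not either client's decoder.

```python
import struct

OFFSET_SIZE = 4  # SSZ offsets are 4-byte little-endian unsigned integers


def encode(fixed_fields, variable_fields):
    """Toy SSZ-style container: fixed fields, one offset per variable field, then payloads."""
    fixed_len = sum(len(f) for f in fixed_fields) + OFFSET_SIZE * len(variable_fields)
    out = b"".join(fixed_fields)
    position = fixed_len  # the first payload starts right after the fixed portion
    for payload in variable_fields:
        out += struct.pack("<I", position)
        position += len(payload)
    return out + b"".join(variable_fields)


def check_first_offset(data, fixed_sizes, num_variable):
    """Raise ValueError if the first offset disagrees with the schema the decoder expects."""
    expected = sum(fixed_sizes) + OFFSET_SIZE * num_variable
    (first,) = struct.unpack_from("<I", data, sum(fixed_sizes))
    if first < expected:
        raise ValueError(f"offset {first} points into the fixed portion")
    if first > expected:
        raise ValueError(f"offset {first} skips variable bytes")


# Encoded with a hypothetical "newer" schema: one 8-byte field and two variable fields.
blob = encode([b"\x00" * 8], [b"ab", b"cd"])

check_first_offset(blob, [8], 2)  # decoder with the matching schema: passes

try:
    check_first_offset(blob, [8], 1)  # decoder expecting one variable field fewer
except ValueError as err:
    print(err)
```

If that is what is happening here, the file itself could be fine and the client may simply be decoding it with a state type for a different fork than the one the genesis was generated for.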

parithosh commented 2 years ago

hmm, that's weird :/ I just tested it on my local system again and it worked perfectly fine. Could you grab the genesis data from the container and share it with me?

Yeah, we use an external dependency to generate the genesis data. But that's the same lib I'm using, so that shouldn't be an issue.

parithosh commented 2 years ago

So the clients that support the merged genesis are: lighthouse, lodestar, teku. Nimbus and Prysm will error out for now, but will support it in the future. So our testing should be limited to these 3.

gbouv commented 2 years ago

Oh okay, that explains why Nimbus is failing. Lighthouse should be working, though, and it doesn't seem to be right now. I sent you on Discord a genesis.ssz file I downloaded from a failing Docker container. Can you test it with your local setup and check whether it works? If it does, something might be wrong with the versions of the clients we're using here. If it doesn't, something is wrong with how we generate it.

parithosh commented 2 years ago

Hmm, this is super weird. I can't reproduce it at all :/

My kurtosis version: 0.49.9

My yaml file:

participants:
  - elType: geth
    elImage: ethereum/client-go:v1.10.25
    clType: lighthouse
    clImage: sigp/lighthouse:v3.1.2
network:
  networkId: '3151908'
  depositContractAddress: '0x4242424242424242424242424242424242424242'
  secondsPerSlot: 12
  slotsPerEpoch: 32
  altairForkEpoch: 0
  mergeForkEpoch: 0
  totalTerminalDifficulty: 0
  numValidatorKeysPerNode: 64
  preregisteredValidatorKeysMnemonic: giant issue aisle success illegal bike spike
    question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy
    very lucky have athlete
waitForMining: false
waitForFinalization: true
waitForVerifications: true
verificationsEpochLimit: 5
logLevel: info
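
The YAML above sets altairForkEpoch, mergeForkEpoch and totalTerminalDifficulty to 0, and earlier in this thread only lighthouse, lodestar and teku are listed as supporting a merged genesis. A minimal sketch of a pre-flight check for those two conditions, assuming the params have already been loaded into a plain dict; the function name and the check itself are hypothetical and not part of the module:

```python
# CL clients listed in this thread as supporting a merged genesis (at the time of writing).
MERGED_GENESIS_CL_CLIENTS = {"lighthouse", "lodestar", "teku"}

# Network params that are set to zero in the working config above.
ZERO_AT_GENESIS = ("altairForkEpoch", "mergeForkEpoch", "totalTerminalDifficulty")


def merged_genesis_problems(params):
    """Return a list of human-readable problems; an empty list means the params look fine."""
    problems = []
    network = params.get("network", {})
    for key in ZERO_AT_GENESIS:
        value = network.get(key)
        if value != 0:
            problems.append(f"network.{key} should be 0, got {value!r}")
    for index, participant in enumerate(params.get("participants", [])):
        cl_type = participant.get("clType")
        if cl_type not in MERGED_GENESIS_CL_CLIENTS:
            problems.append(f"participants[{index}].clType {cl_type!r} is not in the supported list")
    return problems


params = {
    "participants": [{"elType": "geth", "clType": "lighthouse"}],
    "network": {"altairForkEpoch": 0, "mergeForkEpoch": 0, "totalTerminalDifficulty": 0},
}
print(merged_genesis_problems(params))
```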

I've pushed my build as a Docker image; there are no diffs from this branch.

kurtosis module exec --enclave-id eth2 parithoshj/kurtosis:merged-genesis-x86 --execute-params "$(cat ./merge.yaml)"

Give this a shot and let me know. I tested this on an M1 Mac (my local dev machine) and a remote machine, and it worked fine on both.

gbouv commented 2 years ago

Yup, this command seems to work better, but the tests now throw some errors:

...
INFO[2022-10-06T08:38:19Z] Running synchronous testnet verification...
INFO[2022-10-06T09:09:03Z] Testnet verification has finished...
ERRO[2022-10-06T09:09:03Z] Some verifications were not successful
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Beacon Blocks Produced" client=Lighthouse clientID=0 pass=false extra="0 < 1"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Justified Epochs" client=Lighthouse clientID=0 pass=false extra="0 < 1"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Finalized Epochs" client=Lighthouse clientID=0 pass=false extra="0 < 2"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Epoch Attestation Performance" client=Lighthouse clientID=0 pass=false extra="0 < 85"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Epoch Target Attestation Performance" client=Lighthouse clientID=0 pass=false extra="0 < 85"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Sync Participation Percentage" client=Lighthouse clientID=0 pass=false extra="0 < 85"
INFO[2022-10-06T11:09:03+02:00] --------------------- END MODULE LOGS --------------------

I'll try to understand what those tests do
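
To get a quick overview of which checks failed, the verification lines above can be pulled apart with a small regex. This is only a sketch based on the line format visible in this log; the function name is made up, and reading `extra` as "observed < required" is my assumption, not something the module documents here.

```python
import re

# Matches the key=value tail of the verification lines shown above.
CHECK_RE = re.compile(
    r'msg="(?P<msg>[^"]+)" client=(?P<client>\S+) clientID=(?P<client_id>\d+) '
    r'pass=(?P<passed>true|false) extra="(?P<extra>[^"]*)"'
)


def failed_checks(log_text):
    """Return (client, check name, extra) tuples for every verification with pass=false."""
    failures = []
    for match in CHECK_RE.finditer(log_text):
        if match["passed"] == "false":
            failures.append((match["client"], match["msg"], match["extra"]))
    return failures


sample = (
    'ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit '
    'msg="Post-Merge Beacon Blocks Produced" client=Lighthouse clientID=0 pass=false extra="0 < 1"\n'
    'ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit '
    'msg="Post-Merge Finalized Epochs" client=Lighthouse clientID=0 pass=false extra="0 < 2"\n'
)
for client, check, extra in failed_checks(sample):
    print(f"{client}: {check} ({extra})")
```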

gbouv commented 2 years ago

Okay, so I think I understand why the CI (and my previous manual run) were failing. The default params still set altairForkEpoch, mergeForkEpoch, and totalTerminalDifficulty to non-zero values. For now I'll just push a small commit here to set those values to zero by default. Maybe we can chat about what to do here, specifically because Besu still needs those, right? On the same note, it's also worth chatting about the waiter you removed. Does Besu still need this as well, then? Basically I'm inclined to either:

gbouv commented 2 years ago

Update from our chat:

parithosh commented 2 years ago

I just tested it again on my laptop, looks fine on latest head as well. So looks good to merge.