Closed parithosh closed 2 years ago
Alright, so now the tests are running, and it seems they are failing. I'll take a look and see if I can fix those myself.
@parithosh It seems the beacon node in the Lighthouse client has trouble starting. At least, we can't validate that it's running at this line. I'm not sure what's going on in there; I'd be curious whether you have some ideas.
According to the container log, there's something wrong with the genesis state:
Oct 03 17:16:59.291 INFO Logging to file path: "/consensus-data/beacon/logs/beacon.log"
Oct 03 17:16:59.292 INFO Lighthouse started version: Lighthouse/v3.1.2-01e84b7
Oct 03 17:16:59.292 INFO Configured for network name: custom (/genesis/output)
Oct 03 17:16:59.292 INFO Data directory initialised datadir: /consensus-data
Oct 03 17:16:59.292 WARN Discv5 packet filter is disabled
Oct 03 17:16:59.292 INFO Deposit contract address: 0x4242424242424242424242424242424242424242, deploy_block: 0
Oct 03 17:16:59.317 INFO Starting from known genesis state service: beacon
Oct 03 17:16:59.318 CRIT Failed to start beacon node reason: Unable to parse genesis state SSZ: OffsetSkipsVariableBytes(2736633)
Oct 03 17:16:59.318 INFO Internal shutdown received reason: Failed to start beacon node
Oct 03 17:16:59.318 INFO Shutting down.. reason: Failure("Failed to start beacon node")
Interestingly, a very similar issue happens with other CL clients like Nimbus. I tried running the module with only this client, and here is what I get in its logs:
INF 2022-10-04 10:09:21.815+00:00 Launching beacon node topics="beacnde" version=v22.9.1-a84545-stateofus bls_backend=BLST cmdParams="@[\"--non-interactive=true\", \"--log-level=DEBUG\", \"--network=/genesis-data/output\", \"--data-dir=/root/consensus-data\", \"--web3-url=http://31.47.144.6:8551\", \"--nat=extip:31.47.144.8\", \"--enr-auto-update=false\", \"--rest\", \"--rest-address=0.0.0.0\", \"--rest-port=4000\", \"--validators-dir=/root/validator-keys\", \"--secrets-dir=/root/validator-secrets\", \"--doppelganger-detection=false\", \"--subscribe-all-subnets=true\", \"--num-threads=4\", \"--jwt-secret=/genesis-data/output/jwtsecret\", \"--metrics\", \"--metrics-address=0.0.0.0\", \"--metrics-port=8008\", \"--subscribe-all-subnets\"]" config="(configFile: None[InputFile], logLevel: \"DEBUG\", logStdout: auto, logFile: None[OutFile], eth2Network: Some(\"/genesis-data/output\"), dataDir: /root/consensus-data, validatorsDirFlag: Some(/root/validator-keys), secretsDirFlag: Some(/root/validator-secrets), walletsDirFlag: None[InputDir], eraDirFlag: None[InputDir], web3Urls: @[\"http://31.47.144.6:8551\"], web3ForcePolling: false, requireEngineAPI: None[bool], nonInteractive: true, netKeyFile: \"random\", netKeyInsecurePassword: false, agentString: \"nimbus\", subscribeAllSubnets: true, slashingDbKind: v2, numThreads: 4, jwtSecret: Some(\"/genesis-data/output/jwtsecret\"), cmd: noCommand, runAsServiceFlag: false, bootstrapNodes: @[], bootstrapNodesFile: , listenAddress: 0.0.0.0, tcpPort: 9000, udpPort: 9000, maxPeers: 160, hardMaxPeers: None[int], nat: (hasExtIp: true, extIp: 31.47.144.8), enrAutoUpdate: false, weakSubjectivityCheckpoint: None[Checkpoint], syncLightClient: false, trustedBlockRoot: None[Eth2Digest], finalizedCheckpointState: None[InputFile], finalizedCheckpointBlock: None[InputFile], nodeName: \"\", graffiti: None[GraffitiBytes], strictVerification: false, stopAtEpoch: 0, stopAtSyncedEpoch: 0, metricsEnabled: true, metricsAddress: 0.0.0.0, metricsPort: 
8008, statusBarEnabled: true, statusBarContents: \"peers: $connected_peers;finalized: $finalized_root:$finalized_epoch;head: $head_root:$head_epoch:$head_epoch_slot;time: $epoch:$epoch_slot ($slot);sync: $sync_status|ETH: $attached_validators_balance\", rpcEnabled: None[bool], rpcPort: None[Port], rpcAddress: None[ValidIpAddress], restEnabled: true, restPort: 4000, restAddress: 0.0.0.0, restAllowedOrigin: None[TaintedString], restCacheSize: 3, restCacheTtl: 60, restRequestTimeout: 0, restMaxRequestBodySize: 16384, restMaxRequestHeadersSize: 64, keymanagerEnabled: false, keymanagerPort: 5052, keymanagerAddress: 127.0.0.1, keymanagerAllowedOrigin: None[TaintedString], keymanagerTokenFile: None[InputFile], lightClientDataServe: true, lightClientDataImportMode: only-new, lightClientDataMaxPeriods: None[uint64], inProcessValidators: true, debugForkChoice: false, discv5Enabled: true, dumpEnabled: false, directPeers: @[], doppelgangerDetection: false, syncHorizon: 50, terminalTotalDifficultyOverride: None[TaintedString], validatorMonitorAuto: false, validatorMonitorPubkeys: @[], validatorMonitorTotals: false, safeSlotsToImportOptimistically: None[uint16], suggestedFeeRecipient: None[Address], payloadBuilderEnable: false, payloadBuilderUrl: \"\")"
NOT 2022-10-04 10:09:21.824+00:00 Starting metrics HTTP server topics="beacnde" url=http://0.0.0.0:8008/metrics
INF 2022-10-04 10:09:21.852+00:00 Threadpool started topics="beacnde" numThreads=4
/root/nimbus-eth2/vendor/nim-testutils/testutils/moduletests.nim(21) moduletests
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2163) main
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(2031) handleStartUpCmd
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(1848) doRunBeaconNode
/root/nimbus-eth2/beacon_chain/nimbus_beacon_node.nim(587) init
/root/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/assertions.nim(22) raiseAssert
/root/nimbus-eth2/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: Invalid baked-in state: SSZ BeaconState: object dynamic portion starts at invalid offset [AssertionError]
So there's definitely something wrong in the genesis.ssz file. It seems we're using a third-party lib to generate it (this one, I believe?)
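For context on what both clients are complaining about: in SSZ, a container's fixed-size portion holds 4-byte little-endian offsets for its variable-size fields, and the first offset has to point at the byte right after the fixed portion. The sketch below shows that rule on a toy container (a uint64 plus one byte list). The names `check_first_offset` and `encode_toy` are made up for illustration, and this is not the real BeaconState layout, which depends on the fork and preset, so it does not diagnose this particular genesis.ssz.

```python
import struct


def check_first_offset(data: bytes, fixed_len: int, first_offset_pos: int) -> None:
    """Raise ValueError unless the first SSZ offset equals the fixed-portion length.

    fixed_len and first_offset_pos describe the container layout and are
    supplied by the caller (hypothetical helper, toy layouts only).
    """
    if len(data) < fixed_len:
        raise ValueError(f"only {len(data)} bytes, fixed portion needs {fixed_len}")
    # SSZ offsets are 4-byte little-endian unsigned integers.
    (offset,) = struct.unpack_from("<I", data, first_offset_pos)
    if offset < fixed_len:
        raise ValueError(f"offset {offset} points into the fixed portion")
    if offset > fixed_len:
        raise ValueError(f"offset {offset} skips variable bytes")


def encode_toy(value: int, payload: bytes) -> bytes:
    """Encode a toy container: uint64 field, then one variable-length byte list."""
    fixed_len = 8 + 4  # uint64 + one offset
    return struct.pack("<QI", value, fixed_len) + payload


if __name__ == "__main__":
    check_first_offset(encode_toy(7, b"abc"), fixed_len=12, first_offset_pos=8)
    print("toy container has a valid first offset")
```

If a decoder assumes a different fixed-portion length than the encoder used, this check fails in one of the two ways above, which is at least consistent with the shape of the Lighthouse and Nimbus errors.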
hmm, that's weird :/ I just tested it on my local system again and it worked perfectly fine. Could you grab the genesis data from the container and share it with me?
Yeah, we use an external dependency to generate the genesis data. But that's the same lib I'm using, so that shouldn't be an issue.
So the clients that support the merged genesis are: lighthouse, lodestar, teku. Nimbus and Prysm will error out for now, but will support it in the future. So our testing should be limited to these 3.
Oh okay, then that explains why Nimbus is failing. Lighthouse should be working, though, and it doesn't seem to be right now. I sent you on Discord a genesis.ssz file that I downloaded from a failing Docker container. Can you test it with your local setup and check whether it works? If it does work, it means something might be wrong with the versions of the clients we're using here. If it doesn't, it means something is wrong with how we generate it.
Hmm, this is super weird. I can't reproduce it at all :/
My kurtosis version: 0.49.9
My yaml file:
participants:
  - elType: geth
    elImage: ethereum/client-go:v1.10.25
    clType: lighthouse
    clImage: sigp/lighthouse:v3.1.2
network:
  networkId: '3151908'
  depositContractAddress: '0x4242424242424242424242424242424242424242'
  secondsPerSlot: 12
  slotsPerEpoch: 32
  altairForkEpoch: 0
  mergeForkEpoch: 0
  totalTerminalDifficulty: 0
  numValidatorKeysPerNode: 64
  preregisteredValidatorKeysMnemonic: giant issue aisle success illegal bike spike
    question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy
    very lucky have athlete
waitForMining: false
waitForFinalization: true
waitForVerifications: true
verificationsEpochLimit: 5
logLevel: info
I've pushed my build as a docker file, there are no diffs to this branch.
kurtosis module exec --enclave-id eth2 parithoshj/kurtosis:merged-genesis-x86 --execute-params "$(cat ./merge.yaml)"
Give this a shot and let me know. I tested this on an M1 Mac (my local dev machine) and a remote machine, and it worked fine on both.
Yup, this command seems to work better, but the tests now throw some errors:
...
INFO[2022-10-06T08:38:19Z] Running synchronous testnet verification...
INFO[2022-10-06T09:09:03Z] Testnet verification has finished...
ERRO[2022-10-06T09:09:03Z] Some verifications were not successful
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Beacon Blocks Produced" client=Lighthouse clientID=0 pass=false extra="0 < 1"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Justified Epochs" client=Lighthouse clientID=0 pass=false extra="0 < 1"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Finalized Epochs" client=Lighthouse clientID=0 pass=false extra="0 < 2"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Epoch Attestation Performance" client=Lighthouse clientID=0 pass=false extra="0 < 85"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Epoch Target Attestation Performance" client=Lighthouse clientID=0 pass=false extra="0 < 85"
ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit msg="Post-Merge Sync Participation Percentage" client=Lighthouse clientID=0 pass=false extra="0 < 85"
INFO[2022-10-06T11:09:03+02:00] --------------------- END MODULE LOGS --------------------
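To make sense of the verification output above, here is a small sketch that pulls the `key=value` fields out of each line and lists the failed checks. The field names (`msg`, `client`, `pass`, `extra`) come straight from the log. The helper name `parse_verification` is made up, and reading `extra="0 < 1"` as "observed value < required threshold" is my inference from the log, not something the module documents here.

```python
import re

# Matches key=value pairs where the value is either a quoted string or a bare token.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_verification(line: str):
    """Return a dict describing one verification result, or None for other log lines."""
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(line)}
    if "msg" not in fields or "pass" not in fields:
        return None
    return {
        "check": fields["msg"],
        "client": fields.get("client"),
        "passed": fields["pass"] == "true",
        "detail": fields.get("extra"),
    }


if __name__ == "__main__":
    sample = (
        'ERRO[2022-10-06T09:09:03Z] t=2022-10-06T09:09:03+0000 lvl=crit '
        'msg="Post-Merge Beacon Blocks Produced" client=Lighthouse clientID=0 '
        'pass=false extra="0 < 1"'
    )
    result = parse_verification(sample)
    if result and not result["passed"]:
        print(f'{result["client"]}: {result["check"]} failed ({result["detail"]})')
```

Every failed check in the run above reports 0 on the left-hand side, which suggests the chain never produced any post-merge blocks at all rather than merely underperforming.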
I'll try to understand what those tests do
Okay, so I think I understand why the CI (and my previous manual run) were failing. The default params still set altairForkEpoch, mergeForkEpoch, and totalTerminalDifficulty to non-zero values. For now I'll just push a small commit here to set those values to zero by default. Maybe we can chat about what to do here, specifically because Besu still needs those, right?
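A quick way to catch this kind of mismatch before launching is to check the three values in the network section of the params. The sketch below is only an illustration: `non_zero_merge_params` is a made-up helper, the example values are hypothetical rather than the module's actual defaults, and it assumes the params have already been loaded into a dict.

```python
# The three params that have to be zero for a merged genesis, per the discussion above.
MERGED_GENESIS_KEYS = ("altairForkEpoch", "mergeForkEpoch", "totalTerminalDifficulty")


def non_zero_merge_params(network: dict) -> dict:
    """Return the merged-genesis params that are missing or not set to zero."""
    return {
        key: network.get(key)  # a missing key shows up as None
        for key in MERGED_GENESIS_KEYS
        if network.get(key) != 0
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only, not the module's real defaults.
    network = {"altairForkEpoch": 1, "mergeForkEpoch": 2, "totalTerminalDifficulty": 0}
    offending = non_zero_merge_params(network)
    if offending:
        print(f"not a merged-genesis config: {offending}")
```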
On the same note, it's also worth chatting about the waiter you removed. Does Besu still need it as well? Basically I'm inclined to either:
Update from our chat:
legacy/pre-merge
for posterity
I just tested it again on my laptop, and it looks fine on the latest head as well. So this looks good to merge.