lukso-network / tools-lukso-cli

This repository serves as a CLI to download and/or run lukso dependencies
https://docs.lukso.tech/networks/mainnet/running-a-node/#lukso-cli-node-setup
Apache License 2.0

Dencun hardfork post mortem and resulting issues for client diversification #265

Open xtc0r opened 1 day ago

xtc0r commented 1 day ago

After yesterday's issues I want to write up some things I noticed. Some of them should be turned into tasks, IMHO. I was running Erigon / Lighthouse (LH), so what I can share here is my personal experience. Both clients were updated to the correct versions for the hard fork (HF), and the network configs were updated as well.

1. Erigon did not recognize the hard fork and the /mainnet-data/execution folder had to be deleted

After the HF block but before deleting the execution folder, the Erigon log shows that Erigon is not aware of Cancun:

[INFO] [11-20|16:47:53.986] Initialised chain configuration config="{ChainID: 42, Homestead: 0, DAO: <nil>, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 0, Muir Glacier: <nil>, Berlin: 0, London: 0, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: 0, Merge Netsplit: <nil>, Shanghai: 1687969198, Cancun: <nil>, Prague: <nil>, Osaka: <nil>, Engine: unknown, NoPruneContracts: map[]}" genesis=0x5df88817dfb9b00d8ef142370671e8a9bc00c548ab78fbaf205df53db2b24a26
[INFO] [11-20|16:47:53.993] Initialising Ethereum protocol network=42

After HF block AND after deleting execution folder: Erigon is aware of Cancun:

[INFO] [11-20|17:22:07.759] Initialised chain configuration config="{ChainID: 42, Homestead: 0, DAO: <nil>, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 0, Muir Glacier: <nil>, Berlin: 0, London: 0, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: 0, Merge Netsplit: <nil>, Shanghai: 1687969198, Cancun: 1732119595, Prague: <nil>, Osaka: <nil>, Engine: ethash, NoPruneContracts: map[]}" genesis=0x5df88817dfb9b00d8ef142370671e8a9bc00c548ab78fbaf205df53db2b24a26

Task/Question: Why was this not seen on testnet? Is this an Erigon bug (maybe fixed in later Erigon versions that we decided not to use on LUKSO)?
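To make this easier to spot next time, here is a minimal sketch of a check that reads an "Initialised chain configuration" line like the two above and reports whether Cancun is scheduled. The function names `parse_chain_config` and `cancun_activation` are my own, and the parsing only assumes the field layout visible in the logs quoted here; other Erigon versions may print the line differently.

```python
import re
from datetime import datetime, timezone


def parse_chain_config(log_line):
    """Return the fields of an 'Initialised chain configuration' line as a dict.

    Assumes the 'config="{Name: value, ...}"' layout seen in the logs above.
    """
    match = re.search(r'config="\{(.*?)\}"', log_line)
    if match is None:
        raise ValueError("no chain configuration found in log line")
    fields = {}
    for field in match.group(1).split(", "):
        name, _, value = field.partition(": ")
        # Forks that are not scheduled are printed as <nil>.
        fields[name] = None if value == "<nil>" else value
    return fields


def cancun_activation(log_line):
    """Return the Cancun activation time in UTC, or None if it is not scheduled."""
    value = parse_chain_config(log_line).get("Cancun")
    if value is None:
        return None
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

Run against the first log line this returns `None`; against the second it returns the activation time for timestamp 1732119595, which is 2024-11-20 16:19:55 UTC.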

2. Erigon downloaded and processed blocks well until about block 1.6M. Then it dropped from processing 4000 blk/s to 7 blk/s. That was not only the case for me; it seems a lot of people had this issue. So most of us changed to another client, because syncing would have taken days.

[INFO] [11-20|18:55:54.130] [4/12 Execution] Executed blocks number=1658134 blk/s=6.9 tx/s=16.3 Mgas/s=150.0 gasState=0.28 batch=82.5MB alloc=571.6MB sys=8.2GB

Result: Client diversity weakened (in favor of geth; why geth? Read on.)

Task/Question: How can we speed up Erigon sync to make it a valid option again? I guess Erigon syncs faster on Ethereum; maybe they use the torrent download feature?
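To put the slowdown into numbers, a rough back-of-the-envelope sketch follows. The rates are the ones reported above (4000 blk/s and 6.9 blk/s); the number of remaining blocks is a placeholder I made up for illustration, since I did not record the chain head at the time.

```python
def sync_eta_hours(remaining_blocks, blocks_per_second):
    """Estimate the hours needed to execute remaining_blocks at a constant rate."""
    if blocks_per_second <= 0:
        raise ValueError("blocks_per_second must be positive")
    return remaining_blocks / blocks_per_second / 3600


# Hypothetical number of blocks still to execute, NOT a measured value.
REMAINING_BLOCKS = 1_000_000

for rate in (4000, 6.9):
    hours = sync_eta_hours(REMAINING_BLOCKS, rate)
    print(f"{rate} blk/s: {hours:.1f} h per {REMAINING_BLOCKS} blocks")
```

Per million blocks that is a few minutes at the fast rate versus roughly 40 hours at the slow rate, which is why waiting for Erigon was not realistic for most of us.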

3. The CLI download link for Nethermind on ARM is not working. See the other issue I created.

Result: Nethermind is not an option --> impact on client diversification

Tasks:

  1. Fix Lukso-cli install for nethermind arm
  2. Improve testing to catch these things
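One way the testing task could look is a small check that probes every download URL per client and architecture before a release. This is only a sketch under assumptions: the mapping of `(client, arch)` to URL is hypothetical (I do not know how lukso-cli stores its download links internally), and `http_head_ok` and `find_broken_downloads` are names I made up.

```python
import urllib.request


def http_head_ok(url, timeout=10):
    """Return True if a HEAD request to url ends with a 2xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx) and timeouts;
        # ValueError covers malformed URLs.
        return False


def find_broken_downloads(download_urls, probe=http_head_ok):
    """Return the sorted (client, arch) keys whose URL fails the probe.

    download_urls is a hypothetical mapping {(client, arch): url}.
    """
    return sorted(key for key, url in download_urls.items() if not probe(url))
```

The `probe` parameter is there so the check can be unit tested without network access, and swapped for a full download plus checksum verification if a HEAD request turns out to be too weak.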

4. Teku / Besu issues (stakingverse)

Stakingverse is responsible for 33%? of the network validators. It runs most/all of its nodes with Besu/Teku, relying on one execution client and one consensus client. That improved client diversification but also bears a high risk for the whole network. This risk just materialised when Teku failed for Stakingverse. Quick fix: they switched to Lighthouse (LH).

Result:

  1. No finality as a huge part of the network did not follow the planned hard fork
  2. Huge impact on client diversification as they switched to LH

Tasks:

  1. Investigate why Teku failed
  2. Figure out why it was not detected on testnet
  3. Plan to improve client diversification again
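To illustrate why a single client with roughly a third of the validators is so risky: finality needs at least 2/3 of the stake to attest, so any client running more than 1/3 of the validators can stall finality on its own if it fails. The sketch below flags such clients. The validator counts in the usage example are invented for illustration (not real LUKSO numbers), and it assumes every validator has the same stake.

```python
from fractions import Fraction

# Finality needs at least 2/3 of the active stake to attest.
FINALITY_THRESHOLD = Fraction(2, 3)


def clients_that_can_stall_finality(validators_by_client):
    """Return the clients whose failure alone would push participation below 2/3.

    validators_by_client maps a client name to its validator count;
    equal stake per validator is assumed.
    """
    total = sum(validators_by_client.values())
    if total <= 0:
        raise ValueError("need at least one validator")
    risky = []
    for client, count in validators_by_client.items():
        remaining = Fraction(total - count, total)
        if remaining < FINALITY_THRESHOLD:
            risky.append(client)
    return sorted(risky)


# Invented distribution for illustration only.
example = {"teku": 40, "lighthouse": 35, "prysm": 25}
print(clients_that_can_stall_finality(example))
```

With these made-up numbers both `teku` and `lighthouse` are flagged, which is the point: moving everyone from one big client to another big client just moves the risk.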

Summary:

I think client diversification took a huge hit in favor of geth / LH. I explained some of the reasons above. It will be hard to turn this around again. For consensus clients it depends on Stakingverse, and for execution clients on the community switching away from geth. But here geth proved to just work. We should also improve testing and maybe consider staying closer to the latest client releases.

blazejkrzak commented 1 day ago

I really appreciate this post mortem. Great work, we need more contributions like yours!