status-im / nimbus-eth2

Nim implementation of the Ethereum Beacon Chain
https://nimbus.guide

Beacon node sync issues on Holesky testnet #6433

Closed jakubgs closed 1 month ago

jakubgs commented 1 month ago

Describe the bug

A problem with beacon node syncing was identified, starting on 2024/07/04 around 04:30:00:

image

Based on metrics this coincides with a version upgrade from 515bd4 to 85c285:

image

Unfortunately, these changes include an upgrade to Nim 2.0.8: https://github.com/status-im/nimbus-eth2/compare/515bd4...85c285

jakubgs commented 1 month ago

This issue appears to affect only the unstable branch:

 > ansible nimbus.holesky -o -a 'curl -s 0:9303/eth/v1/node/syncing | jq -c ".data | { sync_distance }"' | sort 
erigon-01.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-02.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-03.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-04.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-05.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-06.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101599"}
erigon-07.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101597"}
erigon-08.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-09.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101593"}
erigon-10.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101587"}
geth-01.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-02.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-03.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-04.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-05.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-06.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101599"}
geth-07.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101600"}
geth-08.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101597"}
geth-09.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101595"}
geth-10.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-01.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-02.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-03.ih-eu-mda1.nimbus.holesky | {"sync_distance":"1"}
neth-04.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-05.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-06.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101597"}
neth-07.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101594"}
neth-08.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101595"}
neth-09.ih-eu-mda1.nimbus.holesky | {"sync_distance":"101588"}
neth-10.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}

Mostly nodes with high counts of validators.
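
To pick the lagging hosts out of a listing like the one above, a small filter can help. This is a minimal sketch, assuming the input has the same `<host> | {"sync_distance":"<n>"}` shape as the ansible output; the function name `lagging_nodes` and the default threshold of 32 slots are hypothetical choices, not part of any existing tooling.

```shell
#!/usr/bin/env bash
# Print "<host> <sync_distance>" for every host above a threshold.
# Assumed input format (one line per host), as in the ansible -o output:
#   <host> | {"sync_distance":"<n>"}
# lagging_nodes and its default threshold are illustrative assumptions.
lagging_nodes() {
  local threshold="${1:-32}"
  awk -v max="$threshold" -F'"' '
    /sync_distance/ {
      # With a double quote as field separator, field 4 is the numeric value.
      split($0, parts, " ")        # parts[1] is the hostname
      if ($4 + 0 > max) print parts[1], $4
    }'
}
```

For instance, appending `| lagging_nodes 32` to the ansible pipeline above should list only the hosts that are more than 32 slots behind.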

jakubgs commented 1 month ago

Here are the stable branch nodes for comparison:

 > ansible nimbus.holesky -o -a 'curl -s 0:9301/eth/v1/node/syncing | jq -c ".data | { sync_distance }"' | sort
erigon-01.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-02.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-03.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-04.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-05.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-06.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-07.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-08.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-09.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
erigon-10.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-01.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-02.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-03.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-04.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-05.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-06.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-07.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-08.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-09.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
geth-10.ih-eu-mda1.nimbus.holesky | {"sync_distance":"1"}
neth-01.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-02.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-03.ih-eu-mda1.nimbus.holesky | {"sync_distance":"1"}
neth-04.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-05.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-06.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-07.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-08.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-09.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}
neth-10.ih-eu-mda1.nimbus.holesky | {"sync_distance":"0"}

No issues whatsoever.

tersec commented 1 month ago

$ for i in $(seq -w 1 10); do echo -n $i ": "; ssh neth-$i.ih-eu-mda1.nimbus.holesky '/data/beacon-node-holesky-unstable/bin/nimbus_beacon_node --version | grep Compiler'; done
01 : Nim Compiler Version 2.0.8 [Linux: amd64]
02 : Nim Compiler Version 2.0.8 [Linux: amd64]
03 : Nim Compiler Version 2.0.8 [Linux: amd64]
04 : Nim Compiler Version 2.0.8 [Linux: amd64]
05 : Nim Compiler Version 2.0.8 [Linux: amd64]
06 : Nim Compiler Version 2.0.3 [Linux: amd64]
07 : Nim Compiler Version 2.0.3 [Linux: amd64]
08 : Nim Compiler Version 2.0.3 [Linux: amd64]
09 : Nim Compiler Version 2.0.3 [Linux: amd64]
10 : Nim Compiler Version 2.0.8 [Linux: amd64]
$ for i in $(seq -w 1 10); do echo -n $i ": "; ssh erigon-$i.ih-eu-mda1.nimbus.holesky '/data/beacon-node-holesky-unstable/bin/nimbus_beacon_node --version | grep Compiler'; done
01 : Nim Compiler Version 2.0.8 [Linux: amd64]
02 : Nim Compiler Version 2.0.8 [Linux: amd64]
03 : Nim Compiler Version 2.0.8 [Linux: amd64]
04 : Nim Compiler Version 2.0.8 [Linux: amd64]
05 : Nim Compiler Version 2.0.8 [Linux: amd64]
06 : Nim Compiler Version 2.0.3 [Linux: amd64]
07 : Nim Compiler Version 2.0.3 [Linux: amd64]
08 : Nim Compiler Version 2.0.8 [Linux: amd64]
09 : Nim Compiler Version 2.0.3 [Linux: amd64]
10 : Nim Compiler Version 2.0.3 [Linux: amd64]
$ for i in $(seq -w 1 10); do echo -n $i ": "; ssh geth-$i.ih-eu-mda1.nimbus.holesky '/data/beacon-node-holesky-unstable/bin/nimbus_beacon_node --version | grep Compiler'; done
01 : Nim Compiler Version 2.0.8 [Linux: amd64]
02 : Nim Compiler Version 2.0.8 [Linux: amd64]
03 : Nim Compiler Version 2.0.8 [Linux: amd64]
04 : Nim Compiler Version 2.0.8 [Linux: amd64]
05 : Nim Compiler Version 2.0.8 [Linux: amd64]
06 : Nim Compiler Version 2.0.3 [Linux: amd64]
07 : Nim Compiler Version 2.0.3 [Linux: amd64]
08 : Nim Compiler Version 2.0.3 [Linux: amd64]
09 : Nim Compiler Version 2.0.3 [Linux: amd64]
10 : Nim Compiler Version 2.0.8 [Linux: amd64]

Nim 2.0.3 is not expected to work. The fix for this was in 2.0.8.
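
The check implied here (compiler at least 2.0.8) can be sketched in plain bash. This is an illustrative sketch only: `version_at_least` and `compiler_version` are hypothetical helper names, and the parsing assumes the `Nim Compiler Version X.Y.Z [...]` line format shown in the output above.

```shell
#!/usr/bin/env bash
# Return success (0) when dotted version $1 is >= dotted version $2.
# Compares up to three numeric components; missing components count as 0.
version_at_least() {
  local IFS=.
  local -a have=($1) want=($2)   # split "2.0.8" into (2 0 8)
  local i h w
  for i in 0 1 2; do
    h="${have[i]:-0}"
    w="${want[i]:-0}"
    # Force base 10 so a component like "08" is not read as octal.
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

# Extract the version number from a line such as:
#   Nim Compiler Version 2.0.8 [Linux: amd64]
compiler_version() {
  sed -n 's/^.*Nim Compiler Version \([0-9][0-9.]*\).*$/\1/p'
}
```

A possible use is to pipe the `--version` output of a node through `compiler_version` and pass the result to `version_at_least "$v" 2.0.8`, flagging the host when the check fails.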

jakubgs commented 1 month ago

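Indeed, well spotted. The cause was the use of NIM_COMMIT=version-2-0 for the affected nodes, which was not removed after the upgrade to the Nim 2.0 compiler took place. Here's the fix:

* [`infra-nimbus#b55f23af`](https://github.com/status-im/infra-nimbus/commit/b55f23af) - holesky: drop override of nim_commit to version-2-0
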
Changes will be rolled out in the next hour or so.

etan-status commented 1 month ago

Btw, I recommend NIM_COMMIT=upstream/version-2-0 for these. Sometimes folks push an outdated commit to origin/version-2-0 (the status fork), and if upstream/ is not specified explicitly, it never bumps to the latest in tests.

tersec commented 1 month ago

> Indeed, well spotted, the cause was use of NIM_COMMIT=version-2-0 for the affected nodes which was not removed after the upgrade to Nim 2.0 compiler has taken place. Here's the fix:
>
> * [`infra-nimbus#b55f23af`](https://github.com/status-im/infra-nimbus/commit/b55f23af) - holesky: drop override of nim_commit to version-2-0
>
> Changes will be rolled out in the next hour or so.

Have these changes been rolled out? Just checked and still:

$ for i in $(seq -w 1 10); do echo -n $i ": "; ssh geth-$i.ih-eu-mda1.nimbus.holesky '/data/beacon-node-holesky-unstable/bin/nimbus_beacon_node --version | grep Compiler'; done
01 : Nim Compiler Version 2.0.8 [Linux: amd64]
02 : Nim Compiler Version 2.0.8 [Linux: amd64]
03 : Nim Compiler Version 2.0.8 [Linux: amd64]
04 : Nim Compiler Version 2.0.8 [Linux: amd64]
05 : Nim Compiler Version 2.0.8 [Linux: amd64]
06 : Nim Compiler Version 2.0.3 [Linux: amd64]
07 : Nim Compiler Version 2.0.3 [Linux: amd64]
08 : Nim Compiler Version 2.0.3 [Linux: amd64]
09 : Nim Compiler Version 2.0.3 [Linux: amd64]
10 : Nim Compiler Version 2.0.8 [Linux: amd64]
$ for i in $(seq -w 1 10); do echo -n $i ": "; ssh neth-$i.ih-eu-mda1.nimbus.holesky '/data/beacon-node-holesky-unstable/bin/nimbus_beacon_node --version | grep Compiler'; done
01 : Nim Compiler Version 2.0.8 [Linux: amd64]
02 : Nim Compiler Version 2.0.8 [Linux: amd64]
03 : Nim Compiler Version 2.0.8 [Linux: amd64]
04 : Nim Compiler Version 2.0.8 [Linux: amd64]
05 : Nim Compiler Version 2.0.8 [Linux: amd64]
06 : Nim Compiler Version 2.0.8 [Linux: amd64]
07 : Nim Compiler Version 2.0.3 [Linux: amd64]
08 : Nim Compiler Version 2.0.3 [Linux: amd64]
09 : Nim Compiler Version 2.0.3 [Linux: amd64]
10 : Nim Compiler Version 2.0.8 [Linux: amd64]
$ for i in $(seq -w 1 10); do echo -n $i ": "; ssh erigon-$i.ih-eu-mda1.nimbus.holesky '/data/beacon-node-holesky-unstable/bin/nimbus_beacon_node --version | grep Compiler'; done
01 : Nim Compiler Version 2.0.8 [Linux: amd64]
02 : Nim Compiler Version 2.0.8 [Linux: amd64]
03 : Nim Compiler Version 2.0.8 [Linux: amd64]
04 : Nim Compiler Version 2.0.8 [Linux: amd64]
05 : Nim Compiler Version 2.0.8 [Linux: amd64]
06 : Nim Compiler Version 2.0.3 [Linux: amd64]
07 : Nim Compiler Version 2.0.3 [Linux: amd64]
08 : Nim Compiler Version 2.0.8 [Linux: amd64]
09 : Nim Compiler Version 2.0.3 [Linux: amd64]
10 : Nim Compiler Version 2.0.3 [Linux: amd64]
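
To see at a glance whether a rollout has completed, the per-host lines can be tallied by compiler version. A rough sketch, assuming input lines of the form `NN : Nim Compiler Version X.Y.Z [...]` as printed by the loops above; `tally_compiler_versions` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count hosts per compiler version from lines such as:
#   06 : Nim Compiler Version 2.0.3 [Linux: amd64]
# Prints one "<version> <count>" line per distinct version, sorted.
tally_compiler_versions() {
  awk '/Nim Compiler Version/ {
         # The version is the whitespace-separated field right after "Version".
         for (i = 1; i < NF; i++)
           if ($i == "Version") count[$(i + 1)]++
       }
       END { for (v in count) print v, count[v] }' | sort
}
```

Piping the output of one of the `for` loops into `tally_compiler_versions` would print one line per version; the rollout is done once only the expected version remains.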

jakubgs commented 1 month ago

I believe this is now resolved.