qdrvm / kagome

Kagome - C++20 implementation of Polkadot Host
https://kagome.readthedocs.io
Apache License 2.0
160 stars 34 forks source link

[Bug]: Validator crashes on failure to obtain list of disabled validators on Kusama block #25830930 #2280

Open Lederstrumpf opened 4 days ago

Lederstrumpf commented 4 days ago

Bug Summary

Validator crashes instantly when processing block #25830930 (0x145a…36bc) on Kusama

Bug Description

Validator crashes instantly when processing block #25830930 (0x145a…36bc) on Kusama, presumably due to inability to obtain validator list for said block:

Nov 19 10:49:13 v1-oslo kagome[937259]: 24.11.19 10:49:13.373496  kagome           Critical  Babe  Can't obtain disabled validators list for block #25830930 (0x145a…36bc)

Steps to Reproduce

Mode: Validator number of nodes: 1 Command: kagome --chain kusama -d [...] --validator --listen-addr [...] --public-addr [...] --name [...] --rpc-port [...] --telemetry-url [...] --telemetry-url [...] --node-key-file [...]

Effects of the Bug

Instantly crashes, service restarts, hits same error again.

Example error log running with --validator:

Nov 19 10:49:11 v1-oslo systemd[1]: Started Start kusama node.
Nov 19 10:49:11 v1-oslo kagome[937259]: 24.11.19 10:49:11.692894  kagome           Info      Configuration  Parachain multi process: true
Nov 19 10:49:11 v1-oslo kagome[937259]: 24.11.19 10:49:11.692921  kagome           Info      Configuration  Listen addresses are set. The p2p port value would be ignored then.
Nov 19 10:49:11 v1-oslo kagome[937259]: 24.11.19 10:49:11.717656  kagome           Warning   chain_spec  Field 'chainType' was not specified in the chain spec. 'Live' by default.
Nov 19 10:49:11 v1-oslo kagome[937259]: 24.11.19 10:49:11.717668  kagome           Warning   chain_spec  A non-empty set of 'badBlocks' encountered! They might not be taken into account!
Nov 19 10:49:11 v1-oslo kagome[937259]: 24.11.19 10:49:11.726213  kagome           Info      Main  Kagome started. Version: 0.9.5-34-feature/nix-support-13-658dbf0d-dirty
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.102263  kagome           Info      Injector  Will use LibP2P keypair from config or 'node-key-file' CLI arg
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.103523  kagome           Info      BlockTree  Highest block: #25830930 (0x145a…36bc), Last finalized: #25830927 (0xd88f…cd2b)
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.103798  kagome           Info      PVF Executor  pvf runtime engine kWasmEdgeInterpreted
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.200893  kagome           Info      Application  Start as node version '0.9.5-34-feature/nix-support-13-658dbf0d-dirty' named as '
   Zugian Duck 🦆 Too' with PID 937259
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.200902  kagome           Info      Application  Chain path is /home/kusama/.local/share/kagome/chains/ksmcc3, storage backend is RocksDB
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.220974  kagome           Info      Timeline  Fast sync would be faster than Full sync that was selected
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.221005  kagome           Info      OpenMetrics  Listening for new connections on 0.0.0.0:9615
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.221247  statement-distr  Info      StatementDistribution  StatementDistribution subsystem started.
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.221259  main_runner      Info      RouterLibp2p  Started with peer id: 12D3KooWNfH4nKwB34P832idGQHVwXaGpAGtUptWcYMGCUkH2YDJ
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.221261  main_runner      Info      RouterLibp2p  Started listening on address: /ip4/0.0.0.0/tcp/30363/p2p/12D3KooWNfH4nKwB34P832idGQHVwXaGpAGtUptWcYMGCUkH2YDJ
Nov 19 10:49:12 v1-oslo kagome[937259]: 24.11.19 10:49:12.241417  beefy            Info      Beefy  last finalized 25830922
Nov 19 10:49:13 v1-oslo kagome[937259]: 24.11.19 10:49:13.134351  precompile.13    Warning   ModulePrecompiler  No validation code found for parachain 3355 with 'included' occupied assumption
Nov 19 10:49:13 v1-oslo kagome[937259]: 24.11.19 10:49:13.373496  kagome           Critical  Babe  Can't obtain disabled validators list for block #25830930 (0x145a…36bc)
Nov 19 10:49:13 v1-oslo kagome[937259]: terminate called after throwing an instance of 'boost::wrapexcept<std::system_error>'
Nov 19 10:49:13 v1-oslo kagome[937259]:   what():  Runtime module compilation failed
Nov 19 10:49:13 v1-oslo kagome[937259]: Stack trace (most recent call last):
Nov 19 10:49:13 v1-oslo kagome[937259]: #17   Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #16   Object "/home/shared/kagome", at 0x797df4, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #15   Object "/nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6", at 0x7f973b63d1c8, in __libc_start_main
Nov 19 10:49:13 v1-oslo kagome[937259]: #14   Object "/nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6", at 0x7f973b63d10d, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #13   Object "/home/shared/kagome", at 0x5ee268, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #12   Object "/home/shared/kagome", at 0x7a7cee, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #11   Object "/home/shared/kagome", at 0x56e6484, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #10   Object "/home/shared/kagome", at 0x56e6243, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #9    Object "/home/shared/kagome", at 0x5339b0a, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #8    Object "/home/shared/kagome", at 0x53652ba, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #7    Object "/home/shared/kagome", at 0x79b2ca, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #6    Object "/nix/store/p44qan69linp3ii0xrviypsw2j4qdcp2-gcc-13.2.0-lib/lib/libstdc++.so.6", at 0x7f973b8bc4d6, in __cxa_throw
Nov 19 10:49:13 v1-oslo kagome[937259]: #5    Object "/nix/store/p44qan69linp3ii0xrviypsw2j4qdcp2-gcc-13.2.0-lib/lib/libstdc++.so.6", at 0x7f973b8bc284, in std::terminate()
Nov 19 10:49:13 v1-oslo kagome[937259]: #4    Object "/nix/store/p44qan69linp3ii0xrviypsw2j4qdcp2-gcc-13.2.0-lib/lib/libstdc++.so.6", at 0x7f973b8bc219, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #3    Object "/nix/store/p44qan69linp3ii0xrviypsw2j4qdcp2-gcc-13.2.0-lib/lib/libstdc++.so.6", at 0x7f973b8acc0a, in
Nov 19 10:49:13 v1-oslo kagome[937259]: #2    Object "/nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6", at 0x7f973b63b934, in abort
Nov 19 10:49:13 v1-oslo kagome[937259]: #1    Object "/nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6", at 0x7f973b652e85, in gsignal
Nov 19 10:49:13 v1-oslo kagome[937259]: #0    Object "/nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6", at 0x7f973b6a2efc, in
Nov 19 10:49:13 v1-oslo kagome[937259]: Aborted (Signal sent by tkill() 937259 1001)
Nov 19 10:49:23 v1-oslo systemd[1]: kagosama.service: Main process exited, code=dumped, status=6/ABRT

If relaunching as non-validator, get following failure:

Nov 19 10:58:46 v1-oslo kagome[939623]: 24.11.19 10:58:46.045203  main_runner      Warning   AuthorityDiscoveryQuery  update: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:58:52 v1-oslo kagome[939623]: 24.11.19 10:58:52.028992  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:58:54 v1-oslo kagome[939623]: 24.11.19 10:58:54.435167  main_runner      Warning   AuthorityDiscoveryQuery  update: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:58:59 v1-oslo kagome[939623]: 24.11.19 10:58:59.302799  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:02 v1-oslo kagome[939623]: 24.11.19 10:59:02.654045  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:10 v1-oslo kagome[939623]: 24.11.19 10:59:10.793996  main_runner      Warning   AuthorityDiscoveryQuery  update: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:17 v1-oslo kagome[939623]: 24.11.19 10:59:17.420723  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:26 v1-oslo kagome[939623]: 24.11.19 10:59:26.440330  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:34 v1-oslo kagome[939623]: 24.11.19 10:59:34.160641  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:42 v1-oslo kagome[939623]: 24.11.19 10:59:42.009457  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 10:59:43 v1-oslo kagome[939623]: 24.11.19 10:59:43.255635  main_runner      Warning   AuthorityDiscoveryQuery  update: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:02 v1-oslo kagome[939623]: 24.11.19 11:00:02.015180  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:04 v1-oslo kagome[939623]: 24.11.19 11:00:04.202035  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:18 v1-oslo kagome[939623]: 24.11.19 11:00:18.202416  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:32 v1-oslo kagome[939623]: 24.11.19 11:00:32.202310  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:37 v1-oslo kagome[939623]: 24.11.19 11:00:37.438088  main_runner      Warning   SyncProtocolObserver  cannot find a requested block with id 22092206
Nov 19 11:00:37 v1-oslo kagome[939623]: 24.11.19 11:00:37.991485  main_runner      Warning   SyncProtocolObserver  cannot find a requested block with id 22092206
Nov 19 11:00:42 v1-oslo kagome[939623]: 24.11.19 11:00:42.326012  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:47 v1-oslo kagome[939623]: 24.11.19 11:00:47.757893  main_runner      Warning   AuthorityDiscoveryQuery  update: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:49 v1-oslo kagome[939623]: 24.11.19 11:00:49.666253  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:00:57 v1-oslo kagome[939623]: 24.11.19 11:00:57.268267  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:04 v1-oslo kagome[939623]: 24.11.19 11:01:04.891949  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:12 v1-oslo kagome[939623]: 24.11.19 11:01:12.084679  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:22 v1-oslo kagome[939623]: 24.11.19 11:01:22.918255  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:33 v1-oslo kagome[939623]: 24.11.19 11:01:33.145494  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:43 v1-oslo kagome[939623]: 24.11.19 11:01:43.587138  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:50 v1-oslo kagome[939623]: 24.11.19 11:01:50.772251  main_runner      Warning   Synchronizer  Block #25830931 (0x39c2…53b4) and 52 others have been discarded: N6kagome7runtime5ErrorE(1) Runtime module compilation failed
Nov 19 11:01:53 v1-oslo kagome[939623]: 24.11.19 11:01:53.737957  main_runner      Warning   SyncProtocolObserver  cannot find a requested block with id 18529500
Nov 19 11:01:54 v1-oslo kagome[939623]: 24.11.19 11:01:54.168132  main_runner      Warning   SyncProtocolObserver  cannot find a requested block with id 18529500

Expected Behavior

Doesn't crash on said block. Non-validator doesn't crash but stalls on block #25830930's child (see logs below).

System Information

NixOS 24.5 with kernel 6.11.5 Compiler: gcc 13.2.0 CMake: cmake version 3.25.3

Additional Context

No response

kamilsa commented 4 days ago

@Lederstrumpf

Thanks for reporting! Seems like the issue is related to KAGOME not being able to find runtime for some parachain. Until we figured out why is that the case, can you check if https://github.com/qdrvm/kagome/pull/2281 helps?

Lederstrumpf commented 4 days ago

@Lederstrumpf

Thanks for reporting! Seems like the issue is related to KAGOME not being able to find runtime for some parachain. Until we figured out why is that the case, can you check if #2281 helps?

Thanks for the swift attention to the issue!

Until we figured out why is that the case, can you check if #2281 helps?

Unfortunately, get the same error on #2281 after rebuilding db (from #2271 - #2278 wasn't built on it).

Attached are logs both for validator & non-validator modes. kagomevalidator#2281.log kagome_nonvalidator#2281.log

kamilsa commented 3 days ago

@Lederstrumpf we are investigating the issue. Meanwhile, you may try to sync the node with Warp or Fast sync. For some of our devs problem is not reproducing after they synced to the end of the chain. If it does not help, you may try to Warp sync from genesis

Lederstrumpf commented 2 days ago

@Lederstrumpf we are investigating the issue. Meanwhile, you may try to sync the node with Warp or Fast sync. For some of our devs problem is not reproducing after they synced to the end of the chain. If it does not help, you may try to Warp sync from genesis

Thanks, fwiw, the Full sync from my backup db from 14 days ago (mentioned here: https://github.com/qdrvm/kagome/issues/2276#issuecomment-2485838632) failed today on the same block/error.

Just restarted with a Warp sync from genesis - will update on its result once completed.

Lederstrumpf commented 2 days ago

Just restarted with a Warp sync from genesis - will update on its result once completed.

Unfortunately also fails: strictly on the successor of the warp sync target block, both for validator and non-validator modes:

Nov 21 10:23:10 v1-oslo kagome[993246]: 24.11.21 10:23:10.547760  main_runner      Warning   Synchronizer  Block #25874041 (0x521d…627b) and 104 others have been discarded: N6kagome10blockchain14BlockTreeErrorE(9) the requested block header is not found in block storage
Nov 21 10:23:35 v1-oslo kagome[993246]: 24.11.21 10:23:35.084349  main_runner      Warning   Synchronizer  Block #25874041 (0x521d…627b) and 108 others have been discarded: N6kagome10blockchain14BlockTreeErrorE(9) the requested block header is not found in block storage
Nov 21 10:23:38 v1-oslo kagome[993246]: 24.11.21 10:23:38.752177  main_runner      Warning   Synchronizer  Block #25874041 (0x521d…627b) and 109 others have been discarded: N6kagome10blockchain14BlockTreeErrorE(9) the requested block header is not found in block storage

I've attached some logs with the initial sync and restarts of the service: on every node restart (with --sync Warp still specified), the node manages to sync up until the current sync target and then stalls the same. The last log (kagome_warp_sync_restart_4_validator_#2280_trace.log) is trace level.

kamilsa commented 2 days ago

Sorry, I couldn't find kagome_warp_sync_restart_4_validator_#2280_trace.log log.

Also, btw the issue from kagomevalidator#2281.log kagome_nonvalidator#2281.log

could have been resolved in the master by #2285 . You may try to use master branch to check if it moves forward. (Please note we recently updated the database format, you might require --enable-db-migration flag if you run with existing database for the first time)

Lederstrumpf commented 2 days ago

Sorry, I couldn't find kagome_warp_sync_restart_4_validator_#2280_trace.log log.

Also, btw the issue from kagomevalidator#2281.log kagome_nonvalidator#2281.log

could have been resolved in the master by #2285 . You may try to use master branch to check if it moves forward. (Please note we recently updated the database format, you might require --enable-db-migration flag if you run with existing database for the first time)

Ah, you're right I forgot to attach - sorry about that, here they are: kagome_warp_sync_restartnonvalidator#2280.log kagome_warp_sync_restart_4validator#2280_trace.log kagome_warp_sync_restart_3validator#2280.log kagome_warp_sync_restart_2nonvalidator#2280.log kagome_warp_syncnonvalidator#2280.log

You may try to use master branch to check if it moves forward. (Please note we recently updated the database format, you might require --enable-db-migration flag if you run with existing database for the first time)

Will give it a shot - I didn't have #2285 in yet. DB's already migrated though.

Lederstrumpf commented 2 days ago

You may try to use master branch to check if it moves forward. (Please note we recently updated the database format, you might require --enable-db-migration flag if you run with existing database for the first time)

Will give it a shot - I didn't have #2285 in yet. DB's already migrated though.

Unfortunately still fails for me with #2285 included; even on a fresh Warp sync, see attached logs. It might be that Amforc's "The Spammening" happening atm is a factor.

kagome_warp_sync_fresh_6validator#2280.log kagome_warp_sync_restart_7validator#2280.log

kamilsa commented 1 day ago

I couldn't reproduce the issue exactly. I suspect there might be some undefined behaviour either with 1) wabt instrumentation or 2) wasmedge . To check 1) please try to use tmp/disable-wabt-instrumentation branch.

To check 2) please try to rebuild KAGOME with -DWASM_COMPILER=WAVM flag during cmake configuration and pass --wasm-execution Compiled flag to use Wavm instead of WasmEdge