Trisfald opened this issue 4 months ago
@jancionear I believe you assigned me because I was working on hot-swapping the validator key (pytest/tests/sanity/validator_switch_key_quick.py), but this issue involves entirely different logic.
@Trisfald are you still looking into this nayduck test? If not, I can take it.
Hey @staffik, please feel free to have a look. I did try to dig into it a little bit, which brought me to #11751. The latter doesn't solve the root cause, since it just avoids a crash. That is enough to make the test "pass" and to make a small network "work", but I'm afraid there might be more subtle issues hidden in there.
Ah, sorry, I confused it with test_validator_switch_key_quick.
If the failure happens within verify_chunk_endorsement, I suspect it might be because epoch_info isn't properly initialized for the current and next epoch after the key switch. This also explains why we are seeing the error only in the first two epochs and not after.
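To make this hypothesis concrete, here is a toy Rust model. It assumes, as a simplification, that the key recorded for a validator in an epoch's info is fixed two epochs in advance, so a key switch made in epoch `E` only shows up from epoch `E + 2` onward. `EpochKeys`, `with_key_switch` and `verify_own_endorsement` are hypothetical names, not nearcore APIs, and "signatures" are reduced to plain key strings.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for per-epoch validator info:
/// epoch height -> public key registered for our validator in that epoch.
struct EpochKeys {
    registered: HashMap<u64, String>,
}

impl EpochKeys {
    /// Assumption: a key switch made in `switch_epoch` is only reflected in
    /// the epoch info two epochs later, so `switch_epoch` and
    /// `switch_epoch + 1` still carry the old key.
    fn with_key_switch(first: u64, last: u64, switch_epoch: u64, old: &str, new: &str) -> Self {
        let registered = (first..=last)
            .map(|e| {
                let key = if e >= switch_epoch + 2 { new } else { old };
                (e, key.to_string())
            })
            .collect();
        EpochKeys { registered }
    }

    /// Toy validity check: an endorsement produced with `signer_key` is
    /// accepted only if it matches the key registered for `epoch`.
    fn verify_own_endorsement(&self, epoch: u64, signer_key: &str) -> Result<(), String> {
        match self.registered.get(&epoch) {
            Some(k) if k == signer_key => Ok(()),
            Some(k) => Err(format!("epoch {epoch}: expected key {k}, got {signer_key}")),
            None => Err(format!("epoch {epoch}: no epoch info")),
        }
    }
}

fn main() {
    // The node restarts with its new key at the start of epoch 1.
    let keys = EpochKeys::with_key_switch(1, 4, 1, "old_key", "new_key");
    for epoch in 1..=4 {
        // After the restart, the node always endorses with its new local key.
        let outcome = keys.verify_own_endorsement(epoch, "new_key");
        println!("epoch {epoch}: {:?}", outcome);
    }
}
```

Under this model, epochs 1 and 2 report an `Err` and epochs 3 and 4 report `Ok(())`, which mirrors the pattern described in the issue (own endorsements broken for the first two epochs, fine from the third).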
After #11751, the validator will fail to validate its own endorsements for one epoch, but it won't crash.
A further improvement would be to initialize epoch_info (or the relevant struct) properly after node restart, so that own endorsements are always valid.
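The `core::result::unwrap_failed` frame in the stack trace suggests the panic comes from an `.unwrap()` on a failed validity check. The sketch below contrasts that with logging and dropping the endorsement, which is the general shape of "avoid the crash" described for #11751; it is not that PR's actual diff. `ChunkEndorsement`, `check_endorsement` and `process_own_endorsement` are hypothetical, simplified stand-ins, not nearcore APIs.

```rust
/// Hypothetical, simplified endorsement (not the real nearcore type).
struct ChunkEndorsement {
    epoch: u64,
    signer_key: String,
}

/// Hypothetical validity check: compares the signer key against the key
/// expected for the epoch instead of verifying a real signature.
fn check_endorsement(e: &ChunkEndorsement, expected_key: &str) -> Result<(), String> {
    if e.signer_key == expected_key {
        Ok(())
    } else {
        Err(format!(
            "endorsement for epoch {} signed with unexpected key {}",
            e.epoch, e.signer_key
        ))
    }
}

/// Panicking style: an `Err` here aborts the node, as in the stack trace.
#[allow(dead_code)]
fn process_own_endorsement_panicking(e: &ChunkEndorsement, expected_key: &str) {
    check_endorsement(e, expected_key).unwrap();
}

/// Non-panicking style: log the failure, drop the endorsement, keep running.
/// Returns whether the endorsement was accepted.
fn process_own_endorsement(e: &ChunkEndorsement, expected_key: &str) -> bool {
    match check_endorsement(e, expected_key) {
        Ok(()) => true,
        Err(err) => {
            eprintln!("dropping own chunk endorsement: {err}");
            false
        }
    }
}

fn main() {
    let e = ChunkEndorsement { epoch: 1, signer_key: "new_key".to_string() };
    // The endorsement is rejected, but the process does not panic.
    let accepted = process_own_endorsement(&e, "old_key");
    println!("accepted: {accepted}");
}
```

This only hides the symptom: the node still produces endorsements that its own check rejects, which is why initializing the epoch data correctly after a restart remains the more complete fix.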
Describe the bug
First, a recap of the test scenario:
The test fails because a panic fires while processing chunk endorsements. For some reason, the node's own endorsements aren't always valid, contrary to what was previously assumed. The validity check fails here.
I think the node could crash in production if something similar to the above happens. I'm afraid it might keep crashing until the next epoch kicks in.
Now, the interesting part: chunk endorsement is broken only for the first 2 epochs; starting from the 3rd epoch, everything is fine. So I guess it has something to do with the fact that the validator key has been changed, and the change is not 'accounted for' immediately?
To Reproduce
Run the test `validator_switch_key`, either on nayduck or locally. Example: `python3 pytest/tests/sanity/validator_switch_key.py`
Expected behavior
The test passes. The node doesn't crash.
Screenshots
None
Version (please complete the following information):
Additional context
Stacktrace
```bash
0: rust_begin_unwind
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
1: core::panicking::panic_fmt
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
2: core::result::unwrap_failed
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1654:5
3: core::result::Result::run
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/task/mod.rs:400:9
32: tokio::task::local::LocalSet::tick::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:615:63
33: tokio::runtime::coop::with_budget
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/coop.rs:107:5
34: tokio::runtime::coop::budget
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/coop.rs:73:5
35: tokio::task::local::LocalSet::tick
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:615:31
36: as core::future::future::Future>::poll::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:927:16
37: tokio::task::local::LocalSet::with::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:684:13
38: std::thread::local::LocalKey::try_with
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/thread/local.rs:286:12
39: std::thread::local::LocalKey::with
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/thread/local.rs:262:9
40: tokio::task::local::LocalSet::with
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:667:9
41: as core::future::future::Future>::poll
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:913:9
42: tokio::task::local::LocalSet::run_until::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:573:19
43: as core::future::future::Future>::poll
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/future/future.rs:123:9
44: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:651:57
45: tokio::runtime::coop::with_budget
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/coop.rs:107:5
46: tokio::runtime::coop::budget
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/coop.rs:73:5
47: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:651:25
48: tokio::runtime::scheduler::current_thread::Context::enter
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:410:19
49: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:650:36
50: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:729:68
51: tokio::runtime::context::scoped::Scoped::set
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/context/scoped.rs:40:9
52: tokio::runtime::context::set_scheduler::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/context.rs:176:26
53: std::thread::local::LocalKey::try_with
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/thread/local.rs:286:12
54: std::thread::local::LocalKey::with
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/thread/local.rs:262:9
55: tokio::runtime::context::set_scheduler
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/context.rs:176:9
56: tokio::runtime::scheduler::current_thread::CoreGuard::enter
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:729:27
57: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:638:19
58: tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:175:28
59: tokio::runtime::context::runtime::enter_runtime
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/context/runtime.rs:65:16
60: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/scheduler/current_thread.rs:167:9
61: tokio::runtime::runtime::Runtime::block_on
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/runtime/runtime.rs:311:47
62: tokio::task::local::LocalSet::block_on
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.29.1/src/task/local.rs:534:9
63: actix_rt::runtime::Runtime::block_on
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/actix-rt-2.7.0/src/runtime.rs:80:9
64: actix_rt::arbiter::Arbiter::with_tokio_rt::{{closure}}
at /home/aspurio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/actix-rt-2.7.0/src/arbiter.rs:144:21
```