nimiq / core-rs-albatross

Rust implementation of the Albatross protocol
https://nimiq.com
Other
161 stars 63 forks source link

Unknown signers in signature: MultiSignature #64

Closed jgraef closed 3 years ago

jgraef commented 4 years ago

Seems like there is a race condition where we reset the ValidatorPool which is used by Handel as a IdentityRegistry. But the impl looks like this:

impl WeightRegistry for ValidatorRegistry {
    fn weight(&self, id: usize) -> Option<usize> {
        self.validators.read().get_public_key(id).map(|(_, weight)| weight)
    }
}

So in between calls the validator pool might change. Actually ValidatorPool::reset_epoch should do all changes in one go (while holding a write lock), but maybe not all validators got selected for the new epoch, so a validator ID is missing. Also if this wasn't the case, the IDs would still be messed up.

Solution

As a first fix, the impl for the IdentityRegistry can override the signers_weight implementation, so atleast all weights can be fetched with holding the lock only once.

In a second step we need to remove the possibility that an aggregation for epoch X queries the ValidatorPool which was reset to epoch X + 1. In the first implementation of IdentityRegistry it actually wasn't intended to have interior mutability. We can reimplement that again by copying all public keys and weights only when the IdentityRegistry is constructed. All necessary data should be available at this point anyway.

Notes

Log

 DEBUG consensus            | Now at block #32895
 DEBUG validator_network    | pBFT proposal by validator 11: 341b44023fc028f26b6beef4b8f01aadbc3e25b2f77a1a5a9ff59dfd31e9b8b8
 DEBUG consensus            | Now at block #32896
 DEBUG validator            | Setting validator to active: pk_idx=28
 WARN  pool                 | No validator info for: 8 (2 votes)
 ERROR panic                | thread 'tokio-runtime-worker-2' panicked at 'Unknown signers in signature: MultiSignature { signature: Signature(850d7f1bd3cef158bf45c7631db6820c0ad7801f83a9c7d178ccf8f027575225d652d8cb1c3091fb6279232c742d095f), signers: Bit
Set(29)[0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30] }': validator/src/signature_aggregation/voting.rs:171
stack backtrace:
   0: log_panics::init::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:468
   2: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:373
   3: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:328
   4: nimiq_validator::signature_aggregation::voting::VotingProtocol<T>::votes::{{closure}}::{{closure}}
   5: nimiq_validator::signature_aggregation::voting::VotingProtocol<T>::votes
   6: nimiq_validator::validator_network::ValidatorNetwork::on_pbft_commit_level_update
   7: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
   8: nimiq_validator::validator_agent::ValidatorAgent::on_pbft_commit_message
   9: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
  10: nimiq_messages::MessageNotifier::notify
  11: <F as nimiq_utils::observer::PassThroughListener<E>>::on_event
  12: nimiq_utils::observer::PassThroughNotifier<E>::notify
  13: <futures::stream::for_each::ForEach<S,F,U> as futures::future::Future>::poll
  14: futures::future::chain::Chain<A,B,C>::poll
  15: <futures::future::select::Select<A,B> as futures::future::Future>::poll
  16: <futures::future::map::Map<A,F> as futures::future::Future>::poll
  17: <futures::future::map_err::MapErr<A,F> as futures::future::Future>::poll
  18: futures::task_impl::Spawn<T>::poll_future_notify
  19: std::panicking::try::do_call
  20: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:79
  21: tokio_threadpool::task::Task::run
  22: tokio_threadpool::worker::Worker::run_task
  23: tokio_threadpool::worker::Worker::run
  24: std::thread::local::LocalKey<T>::with
  25: std::thread::local::LocalKey<T>::with
  26: tokio_reactor::with_default
  27: tokio::runtime::threadpool::builder::Builder::build::{{closure}}
  28: std::thread::local::LocalKey<T>::with
  29: std::thread::local::LocalKey<T>::with
  30: std::sys_common::backtrace::__rust_begin_short_backtrace
  31: std::panicking::try::do_call
  32: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:79
  33: core::ops::function::FnOnce::call_once{{vtable.shim}}
  34: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
  35: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/ac162c6abe34cdf965afc0389f6cefa79653c63b/src/liballoc/boxed.rs:942
      std::sys_common::thread::start_thread
             at src/libstd/sys_common/thread.rs:13
      std::sys::unix::thread::Thread::new::thread_start
             at src/libstd/sys/unix/thread.rs:79
  36: start_thread
  37: __clone

 DEBUG consensus            | Now at block #32897
 DEBUG sink                 | Closing connection, reason: SendFailed (Some("SendFailed"))
 DEBUG sink                 | Error closing connection: send failed because receiver is gone
 DEBUG consensus            | Now at block #32898
styppo commented 3 years ago

Fixed in new validator implementation