paritytech / polkadot-introspector

A collection of tools focused on debugging and monitoring the relay chain and parachain progress from a 🐦-view
GNU General Public License v3.0
37 stars 3 forks source link

Application panicked (crashed) when RPC nodes are unreachable #24

Open lazam opened 2 years ago

lazam commented 2 years ago

The application panicked (crashed) when some of the RPC nodes are unreachable.

Logs:

[2022-03-11T04:21:51.869184Z WARN  rustls::check] Received a ServerHelloDone handshake message while expecting [CertificateRequest]
[2022-03-11T04:21:54.206332Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:21:56.279044Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:00.395583Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:02.807649Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:02.807750Z ERROR polkadot_introspector::core::subxt] [wss://versi-tick-rpc.polkadot.io:443] Disconnected (Rpc(Transport(Connection timeout exceeded: 10s))) 
[2022-03-11T04:22:03.045872Z WARN  rustls::check] Received a ServerHelloDone handshake message while expecting [CertificateRequest]
[2022-03-11T04:22:04.470119Z ERROR polkadot_introspector::core::subxt] subxt call error: Metadata(PalletNotFound("Timestamp"))
[2022-03-11T04:22:04.794492Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:08.036180Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:09.950028Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:09.950232Z WARN  rustls::check] Received a ServerHelloDone handshake message while expecting [CertificateRequest]
[2022-03-11T04:22:11.994402Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:15.486887Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
[2022-03-11T04:22:15.486984Z ERROR polkadot_introspector::core::subxt] [wss://versi-tick-rpc.polkadot.io:443] Disconnected (Rpc(Transport(Connection timeout exceeded: 10s))) 
[2022-03-11T04:22:16.428480Z ERROR polkadot_introspector::core::subxt] subxt call error: Metadata(PalletNotFound("Timestamp"))
The application panicked (crashed).
Message:  Request timed out: Request { url: "wss://rococo-statemint-rpc.polkadot.io:443", request_type: GetBlockTimestamp(Some(0xa11aaa91fc27c0c9e70af3afa55861904383317cca730b7bebc3869c6b0786a4)), response_sender: Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: true, is_tx_task_set: false } }) } }
Location: src/core/subxt.rs:157

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 7 frames hidden ⋮                               
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h93480fe48528e9e4
      at <unknown source file>:<unknown line>
   9: tokio::runtime::task::harness::Harness<T,S>::poll::h431ca52fb4b6e639
      at <unknown source file>:<unknown line>
  10: std::thread::local::LocalKey<T>::with::h3295764c141dc3ef
      at <unknown source file>:<unknown line>
  11: tokio::runtime::thread_pool::worker::Context::run_task::hb41ccf2538030ca1
      at <unknown source file>:<unknown line>
  12: tokio::runtime::thread_pool::worker::Context::run::h3053b95b664635d3
      at <unknown source file>:<unknown line>
  13: tokio::macros::scoped_tls::ScopedKey<T>::set::h408cc0b29da9105a
      at <unknown source file>:<unknown line>
  14: tokio::runtime::thread_pool::worker::run::h1ea665436ab3a822
      at <unknown source file>:<unknown line>
  15: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h29cbf04e8f43737f
      at <unknown source file>:<unknown line>
  16: std::panicking::try::h0aa4caf77b9db292
      at <unknown source file>:<unknown line>
  17: tokio::runtime::task::harness::Harness<T,S>::poll::h8a7121f28e3d21de
      at <unknown source file>:<unknown line>
  18: tokio::runtime::blocking::pool::Inner::run::h7d0364598c1b0208
      at <unknown source file>:<unknown line>
  19: std::sys_common::backtrace::__rust_begin_short_backtrace::h5d885a40af7ba8e4
      at <unknown source file>:<unknown line>
  20: core::ops::function::FnOnce::call_once{{vtable.shim}}::haa9c6ae36f70def5
      at <unknown source file>:<unknown line>
  21: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7bd677a5dc988be6
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/alloc/src/boxed.rs:1691
  22: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7b1c1ba11c4db785
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/alloc/src/boxed.rs:1691
  23: std::sys::unix::thread::Thread::new::thread_start::h9c58c0d12d84e854
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys/unix/thread.rs:106
  24: start_thread<unknown>
      at <unknown source file>:<unknown line>
  25: clone<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
The application panicked (crashed).
Message:  Expected GetBlockTimestampResponse, got something else.
Location: src/core/api.rs:45

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 7 frames hidden ⋮                               
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h35bffb1d27d20624
      at <unknown source file>:<unknown line>
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h31e01134c8d1a41a
      at <unknown source file>:<unknown line>
  10: tokio::runtime::task::harness::Harness<T,S>::poll::h0e3c68e149302473
      at <unknown source file>:<unknown line>
  11: std::thread::local::LocalKey<T>::with::h3295764c141dc3ef
      at <unknown source file>:<unknown line>
  12: tokio::runtime::thread_pool::worker::Context::run_task::hb41ccf2538030ca1
      at <unknown source file>:<unknown line>
  13: tokio::runtime::thread_pool::worker::Context::run::h3053b95b664635d3
      at <unknown source file>:<unknown line>
  14: tokio::macros::scoped_tls::ScopedKey<T>::set::h408cc0b29da9105a
      at <unknown source file>:<unknown line>
  15: tokio::runtime::thread_pool::worker::run::h1ea665436ab3a822
      at <unknown source file>:<unknown line>
  16: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h29cbf04e8f43737f
      at <unknown source file>:<unknown line>
  17: std::panicking::try::h0aa4caf77b9db292
      at <unknown source file>:<unknown line>
  18: tokio::runtime::task::harness::Harness<T,S>::poll::h8a7121f28e3d21de
      at <unknown source file>:<unknown line>
  19: tokio::runtime::blocking::pool::Inner::run::h7d0364598c1b0208
      at <unknown source file>:<unknown line>
  20: std::sys_common::backtrace::__rust_begin_short_backtrace::h5d885a40af7ba8e4
      at <unknown source file>:<unknown line>
  21: core::ops::function::FnOnce::call_once{{vtable.shim}}::haa9c6ae36f70def5
      at <unknown source file>:<unknown line>
  22: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7bd677a5dc988be6
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/alloc/src/boxed.rs:1691
  23: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h7b1c1ba11c4db785
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/alloc/src/boxed.rs:1691
  24: std::sys::unix::thread::Thread::new::thread_start::h9c58c0d12d84e854
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/sys/unix/thread.rs:106
  25: start_thread<unknown>
      at <unknown source file>:<unknown line>
  26: clone<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
[2022-03-11T04:22:16.906702Z ERROR polkadot_introspector::core::subxt] [wss://rococo-canvas-rpc.polkadot.io.io:443] Disconnected (Rpc(Transport(Failed to resolve IP addresses for this hostname: failed to lookup address information: Temporary failure in name resolution))) 
Error: 
   0: panic

Location:
   /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/result.rs:1914

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 2 frames hidden ⋮                               
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h243de022c2fb1fb2
      at <unknown source file>:<unknown line>
   4: polkadot_introspector::main::{{closure}}::haa7179a574293364
      at <unknown source file>:<unknown line>
   5: tokio::park::thread::CachedParkThread::block_on::h41dec6d8c00dd82d
      at <unknown source file>:<unknown line>
   6: tokio::runtime::thread_pool::ThreadPool::block_on::hb517c83cedba7e35
      at <unknown source file>:<unknown line>
   7: tokio::runtime::Runtime::block_on::h6704d75a74637c4f
      at <unknown source file>:<unknown line>
   8: polkadot_introspector::main::hd43db547d28854a3
      at <unknown source file>:<unknown line>
   9: std::sys_common::backtrace::__rust_begin_short_backtrace::hcdf14ca10d077a1a
      at <unknown source file>:<unknown line>
  10: std::rt::lang_start::{{closure}}::hdfaf95dcab8b9283
      at <unknown source file>:<unknown line>
  11: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hc56adab7a77ec6e3
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:259
  12: std::panicking::try::do_call::h29f013120c5abc65
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:403
  13: std::panicking::try::h86d5b2b66caec4cf
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:367
  14: std::panic::catch_unwind::h7dd136d787f51397
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panic.rs:133
  15: std::rt::lang_start_internal::{{closure}}::h4a199351e630a8a5
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/rt.rs:128
  16: std::panicking::try::do_call::h20ceb9e5dff838c6
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:403
  17: std::panicking::try::hc2abb46a5e41bd43
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:367
  18: std::panic::catch_unwind::h52c3eb4408ad6dfb
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panic.rs:133
  19: std::rt::lang_start_internal::hd15a47be08101c28
      at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/rt.rs:128
  20: main<unknown>
      at <unknown source file>:<unknown line>
  21: __libc_start_main<unknown>
      at <unknown source file>:<unknown line>
  22: _start<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.

To reproduce:

docker run --user 1000:1000 -e RUST_BACKTRACE=full  --network host lazam/introspector:latest block-time-monitor --ws "wss://rococo-canvas-rpc.polkadot.io.io:443,wss://rococo-statemint-rpc.polkadot.io:443,wss://versi-tick-rpc.polkadot.io:443" prometheus --port 9615

Expectation:

Introspector will just log the unreachable RPC nodes and continue on scraping other RPC nodes. RPC nodes being unreachable are common. Other option is to have a metric exposed if a node is reachable or not, something like:

introspector_rpc_node_up{node="wss://kusama-rpc.polkadot.io:443"} 1
introspector_rpc_node_up{node="wss://kusama-non-existing-rpc.polkadot.io:443"} 0
AndreiEres commented 1 year ago

Currently, we only observe many chains in the block-time monitor. The other tools operate in a single-chain mode, and it doesn't make sense to continue working without a connection to a specific node.

In this case, we can use multiple nodes for a single chain to enhance stability. We can cycle through the nodes, connecting to the next one when we get disconnected from the current one. See https://github.com/paritytech/polkadot-introspector/issues/355