Closed Anastasiia-Khab closed 4 months ago
Upstream issue: https://github.com/libp2p/rust-libp2p/issues/5471
CC @paritytech/networking
We are experiencing the same on all our relay westend/kusama/polkadot RPC/Bootnodes since the upgrade to v1.14.0
Jul 05 00:21:30 polkadot-rpc-1 polkadot[7113]: 2024-07-05 00:21:30 🏆 Imported #21506881 (0xe25e…f3d2 → 0xd681…0ebf)
Jul 05 00:21:33 polkadot-rpc-1 polkadot[7113]: 2024-07-05 00:21:33 💤 Idle (43 peers), best: #21506881 (0xd681…0ebf), finalized #21506879 (0xdbe0…86e2), ⬇ 1005.4kiB/s ⬆ 1.3MiB/s
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: ====================
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: Version: 1.14.0-364961445b7
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 0: sp_panic_handler::set::{{closure}}
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 1: std::panicking::rust_panic_with_hook
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 2: std::panicking::begin_panic::{{closure}}
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 3: std::sys_common::backtrace::__rust_end_short_backtrace
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 4: std::panicking::begin_panic
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 5: <quicksink::SinkImpl<S,F,T,A,E> as futures_sink::Sink<A>>::poll_ready
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 6: futures_util::future::either::if_std::<impl futures_io::if_std::AsyncWrite for futures_util::future::either::Either<A,B>>::poll_write
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 7: <libp2p_noise::io::framed::NoiseFramed<T,S> as futures_sink::Sink<&alloc::vec::Vec<u8>>>::poll_ready
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 8: <libp2p_noise::io::Output<T> as futures_io::if_std::AsyncWrite>::poll_write
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 9: <yamux::frame::io::Io<T> as futures_sink::Sink<yamux::frame::Frame<()>>>::poll_ready
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 10: yamux::connection::Connection<T>::poll_next_inbound
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 11: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 12: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 13: libp2p_swarm::connection::Connection<THandler>::poll
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 14: libp2p_swarm::connection::pool::task::new_for_established_connection::{{closure}}
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 15: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 16: tokio::runtime::task::raw::poll
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 17: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 18: tokio::runtime::scheduler::multi_thread::worker::run
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 19: tokio::runtime::task::raw::poll
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 20: std::sys_common::backtrace::__rust_begin_short_backtrace
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 21: core::ops::function::FnOnce::call_once{{vtable.shim}}
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 22: std::sys::pal::unix::thread::Thread::new::thread_start
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 23: <unknown>
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: 24: <unknown>
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: Thread 'tokio-runtime-worker' panicked at 'SinkImpl::poll_ready called after error.', /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/quicksink-0.1.2/src/lib.rs:158
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: This is a bug. Please report it at:
Jul 05 00:21:36 polkadot-rpc-1 polkadot[7113]: https://github.com/paritytech/polkadot-sdk/issues/new
Jul 05 00:21:36 polkadot-rpc-1 systemd[1]: polkadot-rpc-1.service: Main process exited, code=exited, status=1/FAILURE
Jul 05 00:21:36 polkadot-rpc-1 systemd[1]: polkadot-rpc-1.service: Failed with result 'exit-code'.
Jul 05 00:21:36 polkadot-rpc-1 systemd[1]: polkadot-rpc-1.service: Consumed 37min 38.635s CPU time.
Jul 05 00:22:06 polkadot-rpc-1 systemd[1]: polkadot-rpc-1.service: Scheduled restart job, restart counter is at 9.
Jul 05 00:22:07 polkadot-rpc-1 systemd[1]: Started polkadot-rpc-1.service - Polkadot RPC (Archive).
Jul 05 00:22:07 polkadot-rpc-1 polkadot[7500]: 2024-07-05 00:22:07 Parity Polkadot
Jul 05 00:22:07 polkadot-rpc-1 polkadot[7500]: 2024-07-05 00:22:07 ✌️ version 1.14.0-364961445b7
Jul 05 00:22:07 polkadot-rpc-1 polkadot[7500]: 2024-07-05 00:22:07 ❤️ by Parity Technologies <admin@parity.io>, 2017-2024
Jul 05 00:22:07 polkadot-rpc-1 polkadot[7500]: 2024-07-05 00:22:07 📋 Chain specification: Polkadot
Jul 05 00:22:07 polkadot-rpc-1 polkadot[7500]: 2024-07-05 00:22:07 🏷 Node name: turboflakes-rpc-1
Are there any particular flags in the logs we should add to help troubleshoot this?
Below is the substrate_sub_libp2p_peers_count
and we can clearly see the frequency of crashes the node acting as a bootnode experienced over the last 48h.
I was able to reproduce this after approx 34h running libp2p backend:
====================
Version: 1.13.0-4f46b6e9ffd
0: sp_panic_handler::set::{{closure}}
1: std::panicking::rust_panic_with_hook
2: std::panicking::begin_panic::{{closure}}
3: std::sys_common::backtrace::__rust_end_short_backtrace
4: std::panicking::begin_panic
5: <quicksink::SinkImpl<S,F,T,A,E> as futures_sink::Sink<A>>::poll_ready
6: <rw_stream_sink::RwStreamSink<S> as futures_io::if_std::AsyncWrite>::poll_write
7: <libp2p_noise::io::framed::NoiseFramed<T,S> as futures_sink::Sink<&alloc::vec::Vec<u8>>>::poll_ready
8: <libp2p_noise::io::Output<T> as futures_io::if_std::AsyncWrite>::poll_write
9: <yamux::frame::io::Io<T> as futures_sink::Sink<yamux::frame::Frame<()>>>::poll_ready
10: yamux::connection::Connection<T>::poll_next_inbound
11: <libp2p_yamux::Muxer<C> as libp2p_core::muxing::StreamMuxer>::poll
12: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
13: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
14: libp2p_swarm::connection::pool::task::new_for_established_connection::{{closure}}
15: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
16: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
17: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
18: std::panicking::try
19: tokio::runtime::task::harness::Harness<T,S>::poll
20: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
21: tokio::runtime::scheduler::multi_thread::worker::Context::run
22: tokio::runtime::context::set_scheduler
23: tokio::runtime::context::runtime::enter_runtime
24: tokio::runtime::scheduler::multi_thread::worker::run
25: tokio::runtime::task::core::Core<T,S>::poll
26: tokio::runtime::task::harness::Harness<T,S>::poll
27: std::sys_common::backtrace::__rust_begin_short_backtrace
28: core::ops::function::FnOnce::call_once{{vtable.shim}}
29: std::sys::pal::unix::thread::Thread::new::thread_start
30: <unknown>
31: <unknown>
Thread 'tokio-runtime-worker' panicked at 'SinkImpl::poll_ready called after error.', /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/quicksink-0.1.2/src/lib.rs:158
This is a bug. Please report it at:
https://github.com/paritytech/polkadot-sdk/issues/new
Will have a look into it
Hi, getting the same error on some Polkadot and Kusama nodes, but also on parachains like Bridgehub-Polkadot
Jul 04 16:09:39 polkadot-rpc1 polkadot[636343]: 2024-07-04 16:09:39 💤 Idle (18 peers), best: #21502031 (0x2103…8bc7), finalized #21502027 (0x9e4d…abd4), ⬇ 866.9kiB/s ⬆ 983.9kiB/s
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: ====================
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: Version: 1.14.0-364961445b7
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 0: sp_panic_handler::set::{{closure}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 1: std::panicking::rust_panic_with_hook
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 2: std::panicking::begin_panic::{{closure}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 3: std::sys_common::backtrace::__rust_end_short_backtrace
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 4: std::panicking::begin_panic
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 5: <quicksink::SinkImpl<S,F,T,A,E> as futures_sink::Sink<A>>::poll_ready
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 6: futures_util::future::either::if_std::<impl futures_io::if_std::AsyncWrite for futures_util::future::either::Either<A,B>>::poll_write
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 7: <libp2p_noise::io::framed::NoiseFramed<T,S> as futures_sink::Sink<&alloc::vec::Vec<u8>>>::poll_ready
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 8: <libp2p_noise::io::Output<T> as futures_io::if_std::AsyncWrite>::poll_write
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 9: <yamux::frame::io::Io<T> as futures_sink::Sink<yamux::frame::Frame<()>>>::poll_ready
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 10: yamux::connection::Connection<T>::poll_next_inbound
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 11: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 12: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 13: libp2p_swarm::connection::Connection<THandler>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 14: libp2p_swarm::connection::pool::task::new_for_established_connection::{{closure}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 15: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 16: tokio::runtime::task::raw::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 17: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 18: tokio::runtime::scheduler::multi_thread::worker::run
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 19: tokio::runtime::task::raw::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 20: std::sys_common::backtrace::__rust_begin_short_backtrace
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 21: core::ops::function::FnOnce::call_once{{vtable.shim}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 22: std::sys::pal::unix::thread::Thread::new::thread_start
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 23: <unknown>
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 24: <unknown>
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: Thread 'tokio-runtime-worker' panicked at 'SinkImpl::poll_ready called after error.', /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/quicksink-0.1.2/src/lib.rs:158
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: This is a bug. Please report it at:
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: https://github.com/paritytech/polkadot-sdk/issues/new
Jul 04 16:09:41 polkadot-rpc1 systemd[1]: polkadot.service: Main process exited, code=exited, status=1/FAILURE
Hi, getting the same error on some Polkadot and Kusama nodes, but also on parachains like Bridgehub-Polkadot
Jul 04 16:09:39 polkadot-rpc1 polkadot[636343]: 2024-07-04 16:09:39 💤 Idle (18 peers), best: #21502031 (0x2103…8bc7), finalized #21502027 (0x9e4d…abd4), ⬇ 866.9kiB/s ⬆ 983.9kiB/s
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: ====================
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: Version: 1.14.0-364961445b7
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 0: sp_panic_handler::set::{{closure}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 1: std::panicking::rust_panic_with_hook
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 2: std::panicking::begin_panic::{{closure}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 3: std::sys_common::backtrace::__rust_end_short_backtrace
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 4: std::panicking::begin_panic
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 5: <quicksink::SinkImpl<S,F,T,A,E> as futures_sink::Sink<A>>::poll_ready
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 6: futures_util::future::either::if_std::<impl futures_io::if_std::AsyncWrite for futures_util::future::either::Either<A,B>>::poll_write
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 7: <libp2p_noise::io::framed::NoiseFramed<T,S> as futures_sink::Sink<&alloc::vec::Vec<u8>>>::poll_ready
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 8: <libp2p_noise::io::Output<T> as futures_io::if_std::AsyncWrite>::poll_write
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 9: <yamux::frame::io::Io<T> as futures_sink::Sink<yamux::frame::Frame<()>>>::poll_ready
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 10: yamux::connection::Connection<T>::poll_next_inbound
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 11: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 12: <libp2p_core::muxing::boxed::Wrap<T> as libp2p_core::muxing::StreamMuxer>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 13: libp2p_swarm::connection::Connection<THandler>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 14: libp2p_swarm::connection::pool::task::new_for_established_connection::{{closure}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 15: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 16: tokio::runtime::task::raw::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 17: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 18: tokio::runtime::scheduler::multi_thread::worker::run
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 19: tokio::runtime::task::raw::poll
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 20: std::sys_common::backtrace::__rust_begin_short_backtrace
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 21: core::ops::function::FnOnce::call_once{{vtable.shim}}
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 22: std::sys::pal::unix::thread::Thread::new::thread_start
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 23: <unknown>
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: 24: <unknown>
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: Thread 'tokio-runtime-worker' panicked at 'SinkImpl::poll_ready called after error.', /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/quicksink-0.1.2/src/lib.rs:158
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: This is a bug. Please report it at:
Jul 04 16:09:40 polkadot-rpc1 polkadot[636343]: https://github.com/paritytech/polkadot-sdk/issues/new
Jul 04 16:09:41 polkadot-rpc1 systemd[1]: polkadot.service: Main process exited, code=exited, status=1/FAILURE
The tentative fix published at: https://github.com/libp2p/rust-libp2p/pull/5482
The panic is happening because the quicksink
crate panics when polled after an error.
The crate has long been unmaintained, and the latest version of libp2p has decided to inline the crate. (our current 0.52.4 still has quicksink as dependency in libp2p)
The fix is wrapping the quicksink and ensures that io::ErrorKind::Other
is returned after further polling on errors.
This can be done directly at the quicksink
level, however I'd to include this in a point release on top of 0.52.4
, which requires other patches.
This needs further testing to confirm everything is ok
Count | Level | Triage report
3356 | warn | Notification block pinning limit reached. Unpinning block with hash = .*
506 | warn | 💔 Error importing block .*: .*
314 | warn | Re-finalized block #.* \(.*\) in the canonical chain, current best finalized is #.*
124 | warn | Report .*: .* to .*. Reason: .*. Banned, disconnecting.
2 | warn | Can't listen on .* because: .*
2 | warn | Failed to run the random write disk benchmark: .*
2 | warn | Failed to run the sequential write disk benchmark: .*
1 | error | 🥩 Error: .*. Restarting voter.
1 | warn | ❌ Error while dialing .*: .*
No panics where encountered
Is there an existing issue?
Experiencing problems? Have you tried our Stack Exchange first?
Description of bug
Noticed node failure on Polkadot and Kusama with Error:
Version: 1.14.0 0: sp_panic_handler::set::{{closure}} 1: std::panicking::rust_panic_with_hook 2: std::panicking::begin_panic::{{closure}} 3: std::sys_common::backtrace::__rust_end_short_backtrace 4: std::panicking::begin_panic 5: <quicksink::SinkImpl<S,F,T,A,E> as futures_sink::Sink>::poll_ready 6: futures_util::future::either::if_std::<impl futures_io::if_std::AsyncWrite for futures_util::future::either::Either<A,B>>::poll_write 7: <libp2p_noise::io::framed::NoiseFramed<T,S> as futures_sink::Sink<&alloc::vec::Vec>>::poll_ready
8: <libp2p_noise::io::Output as futures_io::if_std::AsyncWrite>::poll_write
9: <yamux::frame::io::Io as futures_sink::Sink<yamux::frame::Frame<()>>>::poll_ready
10: yamux::connection::Connection::poll_next_inbound
11: <libp2p_core::muxing::boxed::Wrap as libp2p_core::muxing::StreamMuxer>::poll
12: <libp2p_core::muxing::boxed::Wrap as libp2p_core::muxing::StreamMuxer>::poll
13: libp2p_swarm::connection::Connection::poll
14: libp2p_swarm::connection::pool::task::new_for_established_connection::{{closure}}
15: <tracing_futures::Instrumented as core::future::future::Future>::poll
16: tokio::runtime::task::raw::poll
17: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
18: tokio::runtime::scheduler::multi_thread::worker::run
19: tokio::runtime::task::raw::poll
20: std::sys_common::backtrace::rust_begin_short_backtrace
21: core::ops::function::FnOnce::call_once{{vtable.shim}}
22: std::sys::pal::unix::thread::Thread::new::thread_start
23: start_thread
24: clone3
Thread 'tokio-runtime-worker' panicked at 'SinkImpl::poll_ready called after error.', /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/quicksink-0.1.2/src/lib.rs:158
This is a bug. Please report it at:
https://github.com/paritytech/polkadot-sdk/issues/new
Steps to reproduce
No response