Closed ParthDesai closed 1 week ago
@@ Coverage Diff @@
## master add-parathread-support-part-3 +/- ##
=================================================================
- Coverage 67.47% 66.78% -0.69%
- Files 253 249 -4
- Lines 43857 43755 -102
=================================================================
- Hits 29592 29218 -374
+ Misses 14265 14537 +272
Files Changed | Coverage | |
---|---|---|
/client/consensus/src/collators.rs | 83.76% (-1.02%) | 🔽 |
/client/consensus/src/lib.rs | 27.82% (-0.86%) | 🔽 |
/node/src/container_chain_spawner.rs | 45.34% (-0.13%) | 🔽 |
/node/src/service.rs | 20.85% (-0.78%) | 🔽 |
/pallets/collator-assignment/src/lib.rs | 93.88% (-0.02%) | 🔽 |
/pallets/configuration/src/lib.rs | 87.79% (-0.27%) | 🔽 |
/pallets/invulnerables/src/lib.rs | 85.59% (-0.90%) | 🔽 |
/pallets/xcm-core-buyer/src/lib.rs | 91.55% (-0.03%) | 🔽 |
/primitives/traits/src/lib.rs | 74.19% (-6.63%) | 🔽 |
/primitives/xcm-core-buyer/src/lib.rs | 38.30% (-61.70%) | 🔽 |
/runtime/dancebox/src/lib.rs | 88.62% (-0.87%) | 🔽 |
/runtime/dancebox/src/xcm_config.rs | 84.06% (-0.48%) | 🔽 |
/solo-chains/runtime/starlight/src/genesis_config_presets.rs | 21.14% (-0.24%) | 🔽 |
/solo-chains/runtime/starlight/src/lib.rs | 16.81% (-2.93%) | 🔽 |
/solo-chains/runtime/starlight/tests/common/mod.rs | 90.60% (+0.12%) | 🔼 |
Coverage generated Fri Jul 12 13:23:09 UTC 2024
I left the zombienet suite running overnight and 2 collators crashed with this backtrace:
Version: 0.8.0-53f3340954c
0: sp_panic_handler::set::{{closure}}
1: std::panicking::rust_panic_with_hook
2: std::panicking::begin_panic::{{closure}}
3: std::sys_common::backtrace::__rust_end_short_backtrace
4: std::panicking::begin_panic
5: <tokio::future::poll_fn::PollFn<F> as core::future::future::Future>::poll
6: tc_consensus::collators::lookahead::run::{{closure}}
7: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
8: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
9: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
10: tokio::runtime::task::core::Core<T,S>::poll
11: tokio::runtime::task::harness::Harness<T,S>::poll
12: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
13: tokio::runtime::scheduler::multi_thread::worker::Context::run
14: tokio::runtime::context::scoped::Scoped<T>::set
15: tokio::runtime::context::runtime::enter_runtime
16: tokio::runtime::scheduler::multi_thread::worker::run
17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
18: tokio::runtime::task::core::Core<T,S>::poll
19: tokio::runtime::task::harness::Harness<T,S>::poll
20: std::sys_common::backtrace::__rust_begin_short_backtrace
21: core::ops::function::FnOnce::call_once{{vtable.shim}}
22: std::sys::pal::unix::thread::Thread::new::thread_start
23: start_thread
at ./nptl/pthread_create.c:442:8
24: __GI___clone3
at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread 'tokio-runtime-worker' panicked at 'SelectNextSome polled after terminated', /home/tomasz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/stream/stream/select_next_some.rs:32
A few lines above I see this log, the collator seems stuck because of a low tx priority, and also it has 0 peers:
2024-07-09 19:30:12 [Container-2001] Unable to send buy core unsigned extrinsic through orchestrator tx pool relay_parent=0x7284781837492b809b351e77a75cf3c6d4bef1e7cd8b93e4ff04a6c8d6ccea32 para_id=Id(2001) slot=Slot(286757702) pool_error=Pool(TooLowPriority { old: 18446744073709551615, new: 18446744073709551615 })
I was testing on this commit: 53f3340954cb87860473397e49eec76200b65d69
I have more logs if needed
I have identified and fixed the problem.
The problem is that, while reading from the tx progress stream, we were using the `select_next_some` function, which panics if it is polled after the stream has terminated. This is normally handled by the futures library's `select!`, which checks for stream termination, but tokio's `select!` does not. I am now using `next`, which returns `None` on a fused stream once it has ended; we detect that and return an error.
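To illustrate the difference, here is a minimal std-only sketch. It is a simplified synchronous model, not the code from this PR: `ProgressStream`, `TxStatus`, `BuyCoreError` and `wait_for_inclusion` are hypothetical names, and the real stream is asynchronous. It only shows why matching on the `Option` returned by `next` lets the caller turn a closed stream into an error, while a `select_next_some`-style call has no way to report termination.

```rust
use std::collections::VecDeque;

/// Hypothetical transaction status updates; the real type in the PR may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TxStatus {
    Ready,
    InBlock,
}

/// Hypothetical error returned when the progress stream ends too early.
#[derive(Debug, PartialEq, Eq)]
pub enum BuyCoreError {
    ProgressStreamClosed,
}

/// Simplified synchronous model of a fused stream: once drained, it keeps
/// returning `None` on every further call.
pub struct ProgressStream {
    items: VecDeque<TxStatus>,
}

impl ProgressStream {
    pub fn new(items: Vec<TxStatus>) -> Self {
        Self { items: items.into() }
    }

    /// Models `next`: `None` signals that the stream has ended.
    pub fn next(&mut self) -> Option<TxStatus> {
        self.items.pop_front()
    }

    /// Models the failure mode of `select_next_some`: it cannot report
    /// termination to the caller, so it panics instead.
    pub fn select_next_some(&mut self) -> TxStatus {
        self.items
            .pop_front()
            .expect("SelectNextSome polled after terminated")
    }
}

/// Waits until the transaction is included, treating a closed stream as an
/// error rather than a panic.
pub fn wait_for_inclusion(stream: &mut ProgressStream) -> Result<(), BuyCoreError> {
    loop {
        match stream.next() {
            Some(TxStatus::InBlock) => return Ok(()),
            Some(TxStatus::Ready) => continue,
            None => return Err(BuyCoreError::ProgressStreamClosed),
        }
    }
}
```

With this shape, a stream that closes before the transaction is included yields `Err(BuyCoreError::ProgressStreamClosed)`, which the collator task can log and recover from instead of crashing the worker thread.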
Description
This PR enables parathread support on the client side. When we detect that no core is assigned to this para id, we check whether it is a parathread and, if so, attempt to buy a core.
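The decision described above can be sketched roughly as follows. All names here (`ParaState`, `CollatorAction`, `next_action`) are hypothetical and do not come from the PR, and the `Skip` branch for a para id that has no core and is not a parathread is an assumption, since the description does not say what happens in that case.

```rust
/// Hypothetical snapshot of what the client knows about its para id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ParaState {
    pub core_assigned: bool,
    pub is_parathread: bool,
}

/// Hypothetical outcome of the check for the current slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CollatorAction {
    /// A core is already assigned, so collate as usual.
    Collate,
    /// No core is assigned and the para is a parathread: try to buy a core.
    BuyCore,
    /// No core is assigned and the para is not a parathread (assumed behavior).
    Skip,
}

/// Decides what to do for this slot based on core assignment and para kind.
pub fn next_action(state: ParaState) -> CollatorAction {
    match (state.core_assigned, state.is_parathread) {
        (true, _) => CollatorAction::Collate,
        (false, true) => CollatorAction::BuyCore,
        (false, false) => CollatorAction::Skip,
    }
}
```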
To enable this I had to do some minor refactoring of node service:
- `CheckCollatorValidity` used to validate unsigned tx to make it simpler

What is remaining:
Note that this PR's core is subject to change based on my findings in tests.