moondance-labs / tanssi

GNU General Public License v3.0

Add support for parathread - part 3 #612

Closed ParthDesai closed 1 week ago

ParthDesai commented 2 weeks ago

Description

This PR enables support for parathreads on the client side. When we detect that no core is assigned for a para id, we check whether it is a parathread, and if so, we attempt to buy a core.

To enable this, I had to do some minor refactoring of the node service:

What is remaining:

Note that the core of this PR is subject to change based on my findings during testing.

github-actions[bot] commented 2 weeks ago

Coverage Report

(master)

@@                        Coverage Diff                        @@
##           master   add-parathread-support-part-3      +/-   ##
=================================================================
- Coverage   67.47%                          66.78%   -0.69%     
- Files         253                             249       -4     
- Lines       43857                           43755     -102     
=================================================================
- Hits        29592                           29218     -374     
+ Misses      14265                           14537     +272     
Files Changed Coverage
/client/consensus/src/collators.rs 83.76% (-1.02%) 🔽
/client/consensus/src/lib.rs 27.82% (-0.86%) 🔽
/node/src/container_chain_spawner.rs 45.34% (-0.13%) 🔽
/node/src/service.rs 20.85% (-0.78%) 🔽
/pallets/collator-assignment/src/lib.rs 93.88% (-0.02%) 🔽
/pallets/configuration/src/lib.rs 87.79% (-0.27%) 🔽
/pallets/invulnerables/src/lib.rs 85.59% (-0.90%) 🔽
/pallets/xcm-core-buyer/src/lib.rs 91.55% (-0.03%) 🔽
/primitives/traits/src/lib.rs 74.19% (-6.63%) 🔽
/primitives/xcm-core-buyer/src/lib.rs 38.30% (-61.70%) 🔽
/runtime/dancebox/src/lib.rs 88.62% (-0.87%) 🔽
/runtime/dancebox/src/xcm_config.rs 84.06% (-0.48%) 🔽
/solo-chains/runtime/starlight/src/genesis_config_presets.rs 21.14% (-0.24%) 🔽
/solo-chains/runtime/starlight/src/lib.rs 16.81% (-2.93%) 🔽
/solo-chains/runtime/starlight/tests/common/mod.rs 90.60% (+0.12%) 🔼

Coverage generated Fri Jul 12 13:23:09 UTC 2024

tmpolaczyk commented 1 week ago

I left the zombienet suite running overnight and 2 collators crashed with this backtrace:

Version: 0.8.0-53f3340954c

   0: sp_panic_handler::set::{{closure}}
   1: std::panicking::rust_panic_with_hook
   2: std::panicking::begin_panic::{{closure}}
   3: std::sys_common::backtrace::__rust_end_short_backtrace
   4: std::panicking::begin_panic
   5: <tokio::future::poll_fn::PollFn<F> as core::future::future::Future>::poll
   6: tc_consensus::collators::lookahead::run::{{closure}}
   7: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
   8: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
   9: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  10: tokio::runtime::task::core::Core<T,S>::poll
  11: tokio::runtime::task::harness::Harness<T,S>::poll
  12: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  13: tokio::runtime::scheduler::multi_thread::worker::Context::run
  14: tokio::runtime::context::scoped::Scoped<T>::set
  15: tokio::runtime::context::runtime::enter_runtime
  16: tokio::runtime::scheduler::multi_thread::worker::run
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: tokio::runtime::task::core::Core<T,S>::poll
  19: tokio::runtime::task::harness::Harness<T,S>::poll
  20: std::sys_common::backtrace::__rust_begin_short_backtrace
  21: core::ops::function::FnOnce::call_once{{vtable.shim}}
  22: std::sys::pal::unix::thread::Thread::new::thread_start
  23: start_thread
             at ./nptl/pthread_create.c:442:8
  24: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 'tokio-runtime-worker' panicked at 'SelectNextSome polled after terminated', /home/tomasz/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/stream/stream/select_next_some.rs:32

A few lines above I see this log; the collator seems stuck because of low tx priority, and it also has 0 peers:

2024-07-09 19:30:12 [Container-2001] Unable to send buy core unsigned extrinsic through orchestrator tx pool relay_parent=0x7284781837492b809b351e77a75cf3c6d4bef1e7cd8b93e4ff04a6c8d6ccea32 para_id=Id(2001) slot=Slot(286757702) pool_error=Pool(TooLowPriority { old: 18446744073709551615, new: 18446744073709551615 })

I was testing on this commit: 53f3340954cb87860473397e49eec76200b65d69

I have more logs if needed

ParthDesai commented 1 week ago

> I left the zombienet suite running overnight and 2 collators crashed with this backtrace: […]

I have identified and fixed the problem.

The problem is that, while reading from the tx progress stream, we were using the `select_next_some` function, which panics if the stream has terminated. The futures library's `select!` normally guards against this by checking for stream termination, but tokio's `select!` does not. I am now using `next`, which returns `None` once a fused stream has terminated; we detect that and return an error.