paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.com/
1.89k stars 695 forks source link

Parachain upgrade (1.3.0) not being propagated to the relay chain (1.0.0) #3267

Closed girazoki closed 6 months ago

girazoki commented 9 months ago

I am creating the issue just for tracking purposes, its still under investigation.

In an internal network we tried deploying moonriver 2700 (based of polkadot-1.3.0) into a kusama runtime based of 1.0.0. While parachain collators are trying to produce the blocks, we are observing some weird behavior on the relay_parent that they build upon. The logs are:

viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:02.611 TRACE tokio-runtime-worker parachain::bitfield-distribution: [Relaychain] Peer view change peerid=PeerId("12D3KooWLNUzTGPHzGuPYqs84EiwsH9tgWYb1e8ydPUKPW31ChWH") new_view=View { heads: [0x4d7e72685ffdbcf627f31588de05870504470c28380c74ff6e4671731f60dedc, 0xdea1e1e21b024445e0fd65ce43817d49684dace0277a9941b9f484b2cffb122a], finalized_number: 13705722 }
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:03.410  WARN tokio-runtime-worker peerset: [🌗] Report 12D3KooWAVWc6tt9n4yvyZqWxdbehYGKe4CpRgqxSaytXRQ6Rg3v: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.    
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:03.412  WARN tokio-runtime-worker peerset: [🌗] Report 12D3KooWHNBeopN44UvyUiPNzRmitqsqyKCkmCdy77RJhXpxQkfr: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.    
viro-moon-moondev-1|moonsama-relay-validator-1 2024-02-08 08:59:03.848 DEBUG tokio-runtime-worker parachain::chain-selection: Prepared 1 stagnant entries for pruning up_to=1707292743 min_ts=1707292740 max_ts=1707292740
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.410 DEBUG tokio-runtime-worker parachain::validator-discovery: [Relaychain] New ConnectToValidators resolved request peer_set=Collation num_peers=0 removed=0
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.410 TRACE tokio-runtime-worker parachain::collator-protocol: [Relaychain] Peer-set updated due to a timeout timeout=4s
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.410 TRACE tokio-runtime-worker parachain::network-bridge-tx: [Relaychain] Received a validator connection request action="ConnectToValidators" peer_set=Collation ids=[]
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.410 DEBUG tokio-runtime-worker parachain::validator-discovery: [Relaychain] New ConnectToValidators request peer_set=Collation requested=0 failed_to_resolve=0
viro-moon-moondev-1|moonsama-relay-validator-1 2024-02-08 08:59:04.458 DEBUG tokio-runtime-worker parachain::approval-distribution: Received `ApprovalCheckingLagUpdate` lag=0
viro-moon-moondev-1|moonsama-relay-validator-1 2024-02-08 08:59:04.458 DEBUG tokio-runtime-worker parachain::approval-voting: approved blocks 13705724-[11]-13705723
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.710 TRACE tokio-runtime-worker parachain::approval-distribution: [Relaychain] Peer disconnected peer_id=PeerId("12D3KooWHUtdMxuzQHS7amYSx9xyX3BQcNkRDfcMkMqcw8mbjutu")
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.710 TRACE tokio-runtime-worker parachain::bitfield-distribution: [Relaychain] Processing NetworkMessage
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.710 TRACE tokio-runtime-worker parachain::bitfield-distribution: [Relaychain] Peer disconnected peer=PeerId("12D3KooWHUtdMxuzQHS7amYSx9xyX3BQcNkRDfcMkMqcw8mbjutu")
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.710 TRACE tokio-runtime-worker parachain::statement-distribution: [Relaychain] Peer disconnected peer=PeerId("12D3KooWHUtdMxuzQHS7amYSx9xyX3BQcNkRDfcMkMqcw8mbjutu")
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.710  WARN tokio-runtime-worker peerset: [Relaychain] Report 12D3KooWHUtdMxuzQHS7amYSx9xyX3BQcNkRDfcMkMqcw8mbjutu: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.    
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:04.710 DEBUG tokio-runtime-worker parachain::network-bridge-rx: [Relaychain] action="PeerDisconnected" peer_set=Validation peer=PeerId("12D3KooWHUtdMxuzQHS7amYSx9xyX3BQcNkRDfcMkMqcw8mbjutu")
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:05.005  INFO tokio-runtime-worker filtering-consensus: [🌗] Compressed PoV size: 3513.3095703125kb
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:05.005  INFO tokio-runtime-worker filtering-consensus: [🌗] PoV size { header: 0.3857421875kb, extrinsics: 4.62890625kb, storage_proof: 3508.19921875kb }
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:05.121 DEBUG tokio-runtime-worker parachain::collation-generation: [Relaychain] candidate is generated candidate_hash=0x84021d9eeccc74f7f5b3c5b2002971d59612ba64bfebe587b5340a3ec9fe5690 pov_hash=0x14b586fa67c0bb8f73df83c97752ef95f69aed44e6ece621cbb8619c73b5fc96 relay_parent=0xd270fdd3586f65bc88ef6b10940642a2adbd15d1c76688f8e3b9244b8a6e7310 para_id=2022 traceID=175469080818539783584368558784108917205
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:05.122 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Candidate relay parent is out of our view para_id=2022 candidate_relay_parent=0xd270…7310 candidate_hash=0x84021d9eeccc74f7f5b3c5b2002971d59612ba64bfebe587b5340a3ec9fe5690 traceID=175469080818539783584368558784108917205
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 08:59:

and it catches my eye that we see Candidate relay parent is out of our view, and in fact, when I checked the relay parent hash is around 2000 blocks off. I guess that it is trying to build with a candidate relay parent state that we have already pruned, but I have litlte information so as to why they pick a relay_parent that far in time

girazoki commented 9 months ago

Some additional logs:

tokio-runtime-worker parachain::dispute-coordinator: [Relaychain] OverseerSignal::ActiveLeaves
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::approval-distribution: [Relaychain] Own view change view=OurView { view: View { heads: [0xce495c921a004177d016e830eecde55e1ac6de032893f29111a9600a29511f9b, 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1], finalized_number: 13709461 }, span_per_head: {0xce495c921a004177d016e830eecde55e1ac6de032893f29111a9600a29511f9b: <jaeger span>, 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1: <jaeger span>} }
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::collator-protocol: [Relaychain] Own view change view=OurView { view: View { heads: [0xce495c921a004177d016e830eecde55e1ac6de032893f29111a9600a29511f9b, 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1], finalized_number: 13709461 }, span_per_head: {0xce495c921a004177d016e830eecde55e1ac6de032893f29111a9600a29511f9b: <jaeger span>, 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1: <jaeger span>} }
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::provisioner: [Relaychain] Handle ActiveLeavesUpdate
viro-moon-moondev-1|moonsama-collator-1 td/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal::{{closure}}", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "std::panicking::try::do_call", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "main" }, { fn: "__libc_start_main" }, { fn: "_start" }] } }), span: <jaeger span> }), deactivated: [0x58ac203633f1bb980d7b7c07cb17c0710b90
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::bitfield-distribution: [Relaychain] Peer view change peerid=PeerId("12D3KooWEiHi49QReqLKVwXjNCKfKJcsrBf63gK5TLnhP3jWtL8o") new_view=View { heads: [0x0f4448b2ac038e2285dcb3ca6bc9231acfe911e7145b74761c90a8cc8e5d09b1, 0xce495c921a004177d016e830eecde55e1ac6de032893f29111a9600a29511f9b, 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1], finalized_number: 13709461 }
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::network-bridge-rx: [Relaychain] action="ActiveLeaves" has_activated=true num_deactivated=1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::gossip-support: [Relaychain] active leaves signal
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::availability-distribution: [Relaychain] Update fetching heads update=ActiveLeavesUpdate { activated: Some(ActivatedLeaf { hash: 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1, number: 13709464, unpin_handle: UnpinHandle(UnpinHandleInner { hash: 0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1, unpin_worker_sender: TracingUnboundedSender { inner: Sender { .. }, name: "unpin-worker-channel", queue_size_warning: 10000, warning_fired: false, creation_backtrace: Backtrace [{ fn: "sc_utils::mpsc::tracing_unbounded" }, { fn: "sc_service::client::client::Client<B,E,Block,RA>::new" }, { fn: "sc_service::builder::new_full_parts" }, { fn: "polkadot_service::new_partial_basics" }, { fn: "polkadot_service::new_full" }, { fn: "polkadot_service::build_full" }, { fn: "cumulus_relay_chain_inprocess_interface::build_inprocess_relay_chain" }, { fn: "moonbeam_service::build_relay_chain_int
viro-moon-moondev-1|moonsama-relay-validator-1 2024-02-08 15:13:00.219 DEBUG tokio-runtime-worker parachain::subsystem-util-vstaging: Runtime doesn't support `DisabledValidators` - continuing with an empty disabled validators set
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::approval-distribution: [Relaychain] active leaves signal (ignored)
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::dispute-coordinator: [Relaychain] Processing ActiveLeavesUpdate timestamp=1707405180
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::network-bridge-rx: [Relaychain] action="PeerMessages" peer=PeerId("12D3KooWEiHi49QReqLKVwXjNCKfKJcsrBf63gK5TLnhP3jWtL8o") num_validation_messages=1 num_collation_messages=0
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::bitfield-distribution: [Relaychain] Processing NetworkMessage
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219  INFO tokio-runtime-worker substrate: [Relaychain] ✨ Imported #13709464 (0xddab…51c1)    
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.219 TRACE tokio-runtime-worker parachain::provisioner: [Relaychain] Adding delay leaf_hash=0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::runtime-info: [Relaychain] Prospective parachains are disabled, async_backing_params is not supported by the current Runtime API relay_parent=0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::dispute-coordinator: [Relaychain] In ancestor processing. block_number=13709464 block_hash=0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 DEBUG tokio-runtime-worker parachain::collator-protocol: [Relaychain] Removing relay parent because our view changed. relay_parent=0x58ac203633f1bb980d7b7c07cb17c0710b90b006ed5dfe9b9dbe4aa56df85ee3
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::statement-distribution: [Relaychain] Deactivating leaf hash=0x58ac203633f1bb980d7b7c07cb17c0710b90b006ed5dfe9b9dbe4aa56df85ee3
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::statement-distribution: [Relaychain] New active leaf hash=0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::chain-api: [Relaychain] ChainApiMessage::Ancestors hash=0xddab…51c1 k=4
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::runtime-info: [Relaychain] Prospective parachains are disabled, async_backing_params is not supported by the current Runtime API relay_parent=0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::runtime-info: [Relaychain] Prospective parachains are disabled, async_backing_params is not supported by the current Runtime API relay_parent=0xddab37afbcec3de7efb3e7c0419bf2fa58f41fd2e4f0a834e487fbccc9cd51c1
viro-moon-moondev-1|moonsama-collator-1 2024-02-08 15:13:00.220 TRACE tokio-runtime-worker parachain::chain-api: [Relaychain] ChainApiMessage::Ancestors hash=0xddab…51c1 k=10
crystalin commented 9 months ago

The exact error is:

2024-02-09 11:23:48.420 TRACE tokio-runtime-worker parachain::availability-distribution: [Relaychain] Update fetching heads update=ActiveLeavesUpdate { activated: Some(ActivatedLeaf { hash: 0xbf7942bcaee9a2047a472f6d7277eaac2233f5461be6257346f38ca53f68e8e2, number: 13721531, unpin_handle: UnpinHandle(UnpinHandleInner { hash: 0xbf7942bcaee9a2047a472f6d7277eaac2233f5461be6257346f38ca53f68e8e2, unpin_worker_sender: TracingUnboundedSender { inner: Sender { .. }, name: "unpin-worker-channel", queue_size_warning: 10000, warning_fired: false, creation_backtrace: Backtrace [{ fn: "sc_utils::mpsc::tracing_unbounded" }, { fn: "sc_service::client::client::Client<B,E,Block,RA>::new" }, { fn: "sc_service::builder::new_full_parts" }, { fn: "polkadot_service::new_partial_basics" }, { fn: "polkadot_service::new_full" }, { fn: "polkadot_service::build_full" }, { fn: "cumulus_relay_chain_inprocess_interface::build_inprocess_relay_chain" }, { fn: "moonbeam_service::build_relay_chain_int
erface::{{closure}}" }, { fn: "moonbeam_service::start_node_impl::{{closure}}" }, { fn: "moonbeam_cli::command::run::{{closure}}::{{closure}}" }, { fn: "tokio::runtime::park::CachedParkThread::block_on" }, { fn: "tokio::runtime::context::runtime::enter_runtime" }, { fn: "tokio::runtime::runtime::Runtime::block_on" }, { fn: "sc_cli::runner::Runner<C>::run_node_until_exit" }, { fn: "moonbeam_cli::command::run" }, { fn: "moonbeam::main" }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace" }, { fn: "std::rt::lang_start::{{closure}}" }, { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs", line: 284 }, { fn: "std::panicking::try::do_call", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/s
td/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal::{{closure}}", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "std::panicking::try::do_call", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 500 }, { fn: "std::panicking::try", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs", line: 464 }, { fn: "std::panic::catch_unwind", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs", line: 142 }, { fn: "std::rt::lang_start_internal", file: "./rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs", line: 148 }, { fn: "main" }, { fn: "__libc_start_main" }, { fn: "_start" }] } }), span: <jaeger span> }), deactivated: [0xe37e1606e837737cf597767947cb94741dd1
bkchr commented 9 months ago

kusama runtime based of 1.0.0.

What version are the validators using?

The exact error is:

Is there somewhere the actual panic message?

crystalin commented 9 months ago

The validators are running 1.6.0 and the runtime is 1001000 Concerning the message I posted with the panic, it seems unrelated to the issue but due to some saturated log channel.

We are planning a troubleshooting session Wednesday, we will probably know more why this is happening

bkchr commented 8 months ago

Any update on this or can it be closed?

crystalin commented 8 months ago

Yes, I'll close this issue. I was supposed to provide a follow-up ticket for the additional issues we faced. To summarize:

  1. The collator took a very long time to process the blocks. For some reasons we didn't understand, all the collators (on kind of beefy machines) were taking >5s "from relay-block import to candidate being generated" even with an empty block. We had to switch a separate node (externally owned) into a collator and it produced the block quickly and was able to get it backed/included

  2. The other issue was with the validator themselves, but I don't remember it anymore :( :( (@girazoki/ @librelois do you remember?)