paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/

Parachain node panics after upgrade to `polkadot-v1.8.0` #3663

Closed: gdethier closed this issue 6 months ago

gdethier commented 6 months ago

Description of bug

We have a parachain connected to Polkadot. We are trying to update our polkadot-sdk dependencies (in particular pallets) from polkadot-v1.0.0 to polkadot-v1.8.0.

We first tested the upgrade with try-runtime, and everything seemed OK.

We then used Zombienet to spawn a test relay chain and a test parachain so that we could apply the upgrade to a running chain. After the upgrade, the collator node panics on every collation with the following error messages:

```
2024-03-12 14:30:24 [Parachain] Starting collation. relay_parent=0xbeeadc14ee94bc8d7da2f85b2c955055024c76b11bc527e893e9e8b317820839 at=0xdc0481a5785822be9ae680512bceea2b07a50547a98d1d21ef639a5f78b3c3c7
2024-03-12 14:30:24 [Parachain] 🙌 Starting consensus session on top of parent 0xdc0481a5785822be9ae680512bceea2b07a50547a98d1d21ef639a5f78b3c3c7
2024-03-12 14:30:24 [Parachain] 🐥 New pallet "MessageQueue" detected in the runtime. The pallet has no defined storage version, so the on-chain version is being initialized to StorageVersion(0).
2024-03-12 14:30:24 [Parachain] 🧹 Removed 1 keys while clearing `ParachainSystem.HostConfiguration`
2024-03-12 14:30:24 [Parachain] 🚚 Pallet "XcmpQueue" VersionedMigration migrating storage version from 3 to 4.
2024-03-12 14:30:24 [Parachain] panicked at /home/builder/cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/ec7817e/cumulus/pallets/parachain-system/src/lib.rs:1297:30:
included head not present in relay storage proof
2024-03-12 14:30:24 [Parachain] 1 storage transactions are left open by the runtime. Those will be rolled back.
2024-03-12 14:30:24 [Parachain] 1 storage transactions are left open by the runtime. Those will be rolled back.
2024-03-12 14:30:24 [Parachain] ❗️ Inherent extrinsic returned unexpected error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x33c32b - logion_runtime.wasm!rust_begin_unwind
    1: 0xf3f9 - logion_runtime.wasm!core::panicking::panic_fmt::h24234feb7a22a692
    2: 0x11bcfc - logion_runtime.wasm!cumulus_pallet_parachain_system::<impl cumulus_pallet_parachain_system::pallet::Pallet<T>>::maybe_drop_included_ancestors::h3d705b344b0a01bf
    3: 0x16d05c - logion_runtime.wasm!frame_support::storage::transactional::with_transaction::ha4ac345f01e66d66
    4: 0x138b33 - logion_runtime.wasm!<cumulus_pallet_parachain_system::pallet::Call<T> as frame_support::traits::dispatch::UnfilteredDispatchable>::dispatch_bypass_filter::{{closure}}::he61e926094b32486
    5: 0x13f81c - logion_runtime.wasm!environmental::local_key::LocalKey<T>::with::h5650e28c52f950c8
    6: 0xf1670 - logion_runtime.wasm!<logion_runtime::RuntimeCall as frame_support::traits::dispatch::UnfilteredDispatchable>::dispatch_bypass_filter::hdb5f84d6bfa3cee1
    7: 0xefe84 - logion_runtime.wasm!<logion_runtime::RuntimeCall as sp_runtime::traits::Dispatchable>::dispatch::ha7074ba478f71d77
    8: 0x1396bc - logion_runtime.wasm!<sp_runtime::generic::checked_extrinsic::CheckedExtrinsic<AccountId,Call,Extra> as sp_runtime::traits::Applyable>::apply::h3730071a59a3bab8
    9: 0x17bab2 - logion_runtime.wasm!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::apply_extrinsic::h3af2e77a5e60b0ed
   10: 0x2d7dd3 - logion_runtime.wasm!BlockBuilder_apply_extrinsic. Dropping.
2024-03-12 14:30:24 [Parachain] panicked at /home/builder/cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/ec7817e/cumulus/pallets/parachain-system/src/lib.rs:265:18:
set_validation_data inherent needs to be present in every block!
2024-03-12 14:30:24 [Parachain] Proposing failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x33c32b - logion_runtime.wasm!rust_begin_unwind
    1: 0xf3f9 - logion_runtime.wasm!core::panicking::panic_fmt::h24234feb7a22a692
    2: 0x8160 - logion_runtime.wasm!core::panicking::panic_display::h5ba56000e561e3ee
    3: 0x8118 - logion_runtime.wasm!core::option::expect_failed::h554514779c4c8680
    4: 0x121f04 - logion_runtime.wasm!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleElement17,TupleElement18,TupleElement19) as frame_support::traits::hooks::OnFinalize<BlockNumber>>::on_finalize::he9913704237db42d
    5: 0x17bf71 - logion_runtime.wasm!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::idle_and_finalize_hook::ha3bb28178a2db66e
    6: 0x17bfe7 - logion_runtime.wasm!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::finalize_block::h39fc718975545dcc
    7: 0x2d7e84 - logion_runtime.wasm!BlockBuilder_finalize_block
```

As a result, block production is stuck.

Steps to reproduce

  1. Clone this repo
  2. Checkout branch feature/fix-upgrade
  3. Follow these instructions to deploy the parachain locally (use the scripts to download binaries in steps 1 and 2)
  4. Perform an upgrade; you may reuse the attached runtime (code hash 0x645e683c044f930c1e92c4886bae1baa368a7692adc08c6c169d7390d8ac9159) or rebuild it from the branch using srtool (see here)

bkchr commented 6 months ago

Upgrade your node, this should fix it.

gdethier commented 6 months ago

> Upgrade your node, this should fix it.

Which one? Relay or para?

gdethier commented 6 months ago

OK, sorry, got it. I just used the latest executable and it did fix the problem. The error described above was caused by our not running the latest node binary after the runtime upgrade. I am closing this now. Thanks for your help @bkchr!