paritytech / substrate

Substrate: The platform for blockchain innovators

Critical database error: Custom { kind: Other, error: Error { message: "Corruption: Corrupted compressed block contents: Snappy" } } #13113

Closed: jasl closed this issue 1 year ago

jasl commented 1 year ago

Is there an existing issue?

Experiencing problems? Have you tried our Stack Exchange first?

Description of bug

This was reported by one of our users (Khala).

I've confirmed that his disk has enough free space.

It occurred while the user was doing a fresh sync:

2023-01-10 05:18:20 Accepting new connection 1/100
2023-01-10 05:18:21 [Relaychain] ⚙️  Syncing 550.4 bps, target=#16124966 (40 peers), best: #908288 (0xdcd4…5be5), finalized #907314 (0x0230…baea), ⬇ 370.4kiB/s ⬆ 121.9kiB/s
2023-01-10 05:18:21 [Parachain] ⚙️  Syncing 51.2 bps, target=#3082035 (40 peers), best: #246053 (0x6120…3ef6), finalized #0 (0xd435…be8d), ⬇ 97.6kiB/s ⬆ 5.8kiB/s
2023-01-10 05:18:22 Accepting new connection 1/100
2023-01-10 05:18:24 Accepting new connection 1/100
2023-01-10 05:18:25 [Parachain] assembling new collators for new session 138 at #246600
2023-01-10 05:18:26 Accepting new connection 1/100
2023-01-10 05:18:26 [Relaychain] ⚙️  Syncing 717.0 bps, target=#16124966 (40 peers), best: #911873 (0x2a80…5aa9), finalized #911872 (0xc88f…23d6), ⬇ 255.1kiB/s ⬆ 126.6kiB/s
2023-01-10 05:18:26 [Parachain] ⚙️  Syncing 329.0 bps, target=#3082035 (40 peers), best: #247698 (0x5886…3f55), finalized #0 (0xd435…be8d), ⬇ 129.5kiB/s ⬆ 5.9kiB/s
2023-01-10 05:18:28 [Parachain] assembling new collators for new session 139 at #248400
2023-01-10 05:18:31 [Relaychain] ⚙️  Syncing 750.6 bps, target=#16124966 (40 peers), best: #915626 (0xd9b4…d21b), finalized #915456 (0x4abc…788c), ⬇ 403.3kiB/s ⬆ 115.7kiB/s
2023-01-10 05:18:31 [Parachain] ⚙️  Syncing 144.6 bps, target=#3082035 (40 peers), best: #248421 (0xf8d1…5df3), finalized #0 (0xd435…be8d), ⬇ 1.1MiB/s ⬆ 7.0kiB/s
2023-01-10 05:18:32 Accepting new connection 1/100
2023-01-10 05:18:34 Accepting new connection 1/100
2023-01-10 05:18:36 [Relaychain] ⚙️  Syncing 772.4 bps, target=#16124966 (40 peers), best: #919488 (0x5fff…9ca3), finalized #919040 (0xb297…0f95), ⬇ 410.9kiB/s ⬆ 119.2kiB/s
2023-01-10 05:18:36 [Parachain] ⚙️  Syncing  0.0 bps, target=#3082035 (40 peers), best: #248421 (0xf8d1…5df3), finalized #0 (0xd435…be8d), ⬇ 90.8kiB/s ⬆ 4.8kiB/s
2023-01-10 05:18:36 Accepting new connection 1/100
2023-01-10 05:18:38 Accepting new connection 1/100
2023-01-10 05:18:40 Accepting new connection 1/100
2023-01-10 05:18:41 [Relaychain] ⚙️  Syncing 807.8 bps, target=#16124966 (40 peers), best: #923527 (0xa17a…2520), finalized #922624 (0xd06d…07a2), ⬇ 297.7kiB/s ⬆ 105.9kiB/s
2023-01-10 05:18:41 [Parachain] ⚙️  Syncing 51.2 bps, target=#3082035 (40 peers), best: #248677 (0xfe91…8067), finalized #0 (0xd435…be8d), ⬇ 129.0kiB/s ⬆ 4.6kiB/s
2023-01-10 05:18:42 Accepting new connection 1/100

====================

Version: 0.1.20-269fd670cde

   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/alloc/src/boxed.rs:2001:9
      std::panicking::rust_panic_with_hook
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/std/src/panicking.rs:579:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/core/src/panicking.rs:65:14
   6: <sp_database::kvdb::DbAdapter<D> as sp_database::Database<H>>::get
   7: <sc_client_db::StorageDb<Block> as sp_state_machine::trie_backend_essence::Storage<<<Block as sp_runtime::traits::Block>::Header as sp_runtime::traits::Header>::Hashing>>::get
   8: <sp_state_machine::trie_backend_essence::TrieBackendEssence<S,H,C> as hash_db::HashDB<H,alloc::vec::Vec<u8>>>::get
   9: <sp_state_machine::trie_backend_essence::TrieBackendEssence<S,H,C> as hash_db::HashDBRef<H,alloc::vec::Vec<u8>>>::get
  10: trie_db::lookup::Lookup<L,Q>::look_up_with_cache_internal::{{closure}}
  11: <sp_trie::cache::TrieCache<H> as trie_db::TrieCache<sp_trie::node_codec::NodeCodec<H>>>::get_or_insert_node
  12: trie_db::lookup::Lookup<L,Q>::look_up_with_cache::{{closure}}
  13: trie_db::Trie::get
  14: sp_state_machine::trie_backend_essence::TrieBackendEssence<S,H,C>::storage
  15: <sp_state_machine::ext::Ext<H,B> as sp_externalities::Externalities>::storage
  16: sp_io::storage::read_version_1
  17: sp_io::storage::ExtStorageReadVersion1::call
  18: <F as wasmtime::func::IntoFunc<T,(wasmtime::func::Caller<T>,A1,A2,A3),R>>::into_func::wasm_to_host_shim
  19: <unknown>
  20: <unknown>
  21: <unknown>
  22: <unknown>
  23: <unknown>
  24: <unknown>
  25: wasmtime_runtime::traphandlers::catch_traps::call_closure
  26: wasmtime_setjmp
  27: sc_executor_wasmtime::runtime::perform_call
  28: <sc_executor_wasmtime::runtime::WasmtimeInstance as sc_executor_common::wasm_runtime::WasmInstance>::call_with_allocation_stats
  29: sc_executor_common::wasm_runtime::WasmInstance::call_export
  30: sc_executor::native_executor::WasmExecutor<H>::with_instance::{{closure}}
  31: <sc_executor::native_executor::NativeElseWasmExecutor<D> as sp_core::traits::CodeExecutor>::call
  32: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_aux
  33: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_using_consensus_failure_handler
  34: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
  35: <polkadot_runtime::RuntimeApiImpl<__SR_API_BLOCK__,RuntimeApiImplCall> as sp_api::Core<__SR_API_BLOCK__>>::__runtime_api_internal_call_api_at
  36: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  37: <sc_finality_grandpa::import::GrandpaBlockImport<BE,Block,Client,SC> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  38: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  39: <beefy_gadget::import::BeefyBlockImport<Block,BE,Runtime,I> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  40: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  41: <sc_consensus_babe::BabeBlockImport<Block,Client,Inner> as sc_consensus::block_import::BlockImport<Block>>::import_block::{{closure}}
  42: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  43: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  44: sc_consensus::import_queue::basic_queue::block_import_process::{{closure}}
  45: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  46: <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
  47: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  48: tokio::runtime::task::raw::poll
  49: std::sys_common::backtrace::__rust_begin_short_backtrace
  50: core::ops::function::FnOnce::call_once{{vtable.shim}}
  51: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/alloc/src/boxed.rs:1987:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/alloc/src/boxed.rs:1987:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/758f19645b8ebce61ea52d1f6672fd057bc8dbee/library/std/src/sys/unix/thread.rs:108:17
  52: <unknown>
  53: __clone

Thread 'tokio-runtime-worker' panicked at 'Critical database error: Custom { kind: Other, error: Error { message: "Corruption: Corrupted compressed block contents: Snappy" } }', /root/.cargo/git/checkouts/substrate-7e08433d4c370a21/2dff067/primitives/database/src/kvdb.rs:29

This is a bug. Please report it at:

        https://github.com/Phala-Network/khala-parachain/issues/new

2023-01-10 05:18:43 [Parachain] assembling new collators for new session 140 at #250200
2023-01-10 05:18:44 Accepting new connection 1/100
Starting Khala node as role 'MINER' with extra parachain args '' extra relaychain args ''
2023-01-10 08:12:44 Khala Node
2023-01-10 08:12:44 ✌️  version 0.1.20-269fd670cde
2023-01-10 08:12:44 ❤️  by Phala Network, 2018-2023
2023-01-10 08:12:44 📋 Chain specification: Khala
2023-01-10 08:12:44 🏷  Node name: pha0001
2023-01-10 08:12:44 👤 Role: FULL
2023-01-10 08:12:44 💾 Database: RocksDb at /root/data/chains/khala/db/full
2023-01-10 08:12:44 ⛓  Native runtime: khala-1201 (khala-0.tx6.au1)
2023-01-10 08:12:44 It isn't safe to expose RPC publicly without a proxy server that filters available set of RPC methods.
2023-01-10 08:12:44 It isn't safe to expose RPC publicly without a proxy server that filters available set of RPC methods.
2023-01-10 08:12:44 Parachain id: Id(2004)
2023-01-10 08:12:44 Parachain Account: 5Ec4AhPVjsshXjh8ynp6MwaJTJBnen3pkHiiyDhHfie5VWkN
2023-01-10 08:12:44 Parachain genesis state: 0x000000000000000000000000000000000000000000000000000000000000000000fd2e3e07ed2d610c6c0c6c3cd6858fff733fe53c03bd75d46f25e7c69dec490b03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
2023-01-10 08:12:44 Is collating: no
Error: Service(Client(Backend("Invalid argument: Column families not opened: col12, col11, col10, col9, col8, col7, col6, col5, col4, col3, col2, col1, col0")))

Steps to reproduce

No steps to reproduce; it occurred while the node was running (syncing), and disk space was sufficient.
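
For context, the panic comes from Substrate's kvdb adapter (primitives/database/src/kvdb.rs, the file cited in the backtrace), which treats any error returned by the backing key-value store as unrecoverable. A simplified sketch of the pattern (not the exact source):

    // Simplified sketch of sp_database::kvdb's error handling:
    // any I/O error bubbling up from the backend -- including RocksDB's
    // "Corrupted compressed block contents: Snappy" -- aborts the node.
    fn handle_err<T>(result: std::io::Result<T>) -> T {
        match result {
            Ok(r) => r,
            Err(e) => panic!("Critical database error: {:?}", e),
        }
    }

This matches the panic message and file reference in the log above.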

bkchr commented 1 year ago

So this is not reproducible?

Given the error, I would assume it is some corruption of the disk content?
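
That theory fits the Snappy error: RocksDB compresses SST data blocks with Snappy, and decompression fails when the stored bytes have been damaged, which RocksDB surfaces as "Corrupted compressed block contents". A minimal, hypothetical demo of the failure mode using the snap crate (illustration only, not Substrate or RocksDB code):

    // Illustration only: damaging Snappy-compressed bytes makes
    // decompression fail, analogous to what RocksDB reports here.
    // Assumes the `snap` crate (snap = "1") is available.
    use snap::raw::{Decoder, Encoder};

    fn main() {
        let block = vec![0xABu8; 4096]; // stand-in for an SST data block
        let compressed = Encoder::new().compress_vec(&block).unwrap();

        // Simulate on-disk damage by truncating the compressed bytes.
        let mut damaged = compressed.clone();
        damaged.truncate(damaged.len() / 2);

        match Decoder::new().decompress_vec(&damaged) {
            Ok(_) => println!("decompressed ok"),
            Err(e) => println!("snappy corruption detected: {e}"),
        }
    }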

jasl commented 1 year ago

I think it is unreproducible, like the issues I previously submitted.

> some corruption of the disk content

Probably, but I'm not sure the corruption was caused by the user; he told me this is a brand-new 4 TB SSD.

bkchr commented 1 year ago

https://github.com/apache/incubator-kvrocks/issues/788 reported something similar. However, this really doesn't seem to be any Substrate issue.

I would recommend you try ParityDB; for RocksDB we cannot really provide any support.

I'm going to close this issue. If you have new insights or some reproduction, please reopen it.

jasl commented 1 year ago

> apache/incubator-kvrocks#788 reported something similar. However, this really doesn't seem to be any Substrate issue.
>
> I would recommend you try ParityDB; for RocksDB we cannot really provide any support.
>
> I'm going to close this issue. If you have new insights or some reproduction, please reopen it.

Yeah, I started evaluating ParityDB last year after you recommended it, but most of our users don't change the default. We plan to recommend that users switch to ParityDB in the spring.
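
For reference, Substrate-based nodes normally select the storage backend at startup with the `--database` flag (e.g. `--database paritydb`), assuming the Khala node exposes the standard Substrate CLI; since the RocksDB and ParityDB on-disk formats are incompatible, switching requires a fresh sync.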

bkchr commented 1 year ago

Yeah, I think we should also slowly make it the default!