near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

`2.0.0-rc.1` with `load_mem_tries_for_tracked_shards = true` panics #11744

Closed. evgenykuzyakov closed this issue 2 months ago.

evgenykuzyakov commented 2 months ago

Describe the bug

2.0.0-rc.1 with load_mem_tries_for_tracked_shards = true panics.

To Reproduce

NOTE: The nodes on which I reproduced this issue were not regular RPC nodes; they were running the near-indexer crate instead of regular neard. They were synced before the restart.

  1. Have a testnet RPC node.
  2. Enable load_mem_tries_for_tracked_shards = true.
  3. Run 2.0.0-rc.1

Expected behavior

The node should not panic.

Screenshots

Jul 09 00:20:05 testnode1 redisnode[2290449]: Jul 09 00:20:05.642  INFO near_store::trie::mem::parallel_loader: Loading 1274 subtrees in parallel
Jul 09 00:20:05 testnode1 redisnode[2290449]: Jul 09 00:20:05.754  INFO near_store::trie::mem::parallel_loader: Loading 1647 subtrees in parallel
Jul 09 00:20:05 testnode1 redisnode[2290449]: Jul 09 00:20:05.758  INFO near_store::trie::mem::parallel_loader: Loading 1436 subtrees in parallel
Jul 09 00:20:05 testnode1 redisnode[2290449]: Jul 09 00:20:05.981  INFO near_store::trie::mem::parallel_loader: Loading 1911 subtrees in parallel
Jul 09 00:20:05 testnode1 redisnode[2290449]: Jul 09 00:20:05.999  INFO near_store::trie::mem::parallel_loader: Loading 1405 subtrees in parallel
Jul 09 00:20:17 testnode1 redisnode[2290449]: thread '<unnamed>' panicked at /root/.cargo/git/checkouts/nearcore-ddcb86377aa204cb/d7ef256/core/store/src/trie/mem/flexible_data/extension.rs:33:32:
Jul 09 00:20:17 testnode1 redisnode[2290449]: source slice length (386768) does not match destination slice length (59088)
Jul 09 00:20:17 testnode1 redisnode[2290449]: stack backtrace:
Jul 09 00:20:17 testnode1 redisnode[2290449]:    0: rust_begin_unwind
Jul 09 00:20:17 testnode1 redisnode[2290449]:    1: core::panicking::panic_fmt
Jul 09 00:20:17 testnode1 redisnode[2290449]:    2: core::slice::<impl [T]>::copy_from_slice::len_mismatch_fail
Jul 09 00:20:17 testnode1 redisnode[2290449]:    3: <near_store::trie::mem::flexible_data::extension::EncodedExtensionHeader as near_store::trie::mem::flexible_data::FlexibleDataHeader>::encode_flexible_data
Jul 09 00:20:17 testnode1 redisnode[2290449]:    4: near_store::trie::mem::construction::TrieConstructionSegment::to_node
Jul 09 00:20:17 testnode1 redisnode[2290449]:    5: near_store::trie::mem::construction::TrieConstructor<A>::pop_segment
Jul 09 00:20:17 testnode1 redisnode[2290449]:    6: near_store::trie::mem::construction::TrieConstructor<A>::add_leaf
Jul 09 00:20:17 testnode1 redisnode[2290449]:    7: core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut
Jul 09 00:20:17 testnode1 redisnode[2290449]:    8: <rayon::iter::fold::FoldFolder<C,ID,F> as rayon::iter::plumbing::Folder<T>>::consume_iter
Jul 09 00:20:17 testnode1 redisnode[2290449]:    9: rayon::iter::plumbing::bridge_producer_consumer::helper
Jul 09 00:20:17 testnode1 redisnode[2290449]:   10: rayon_core::join::join_context::{{closure}}
Jul 09 00:20:17 testnode1 redisnode[2290449]:   11: rayon_core::registry::in_worker
Jul 09 00:20:17 testnode1 redisnode[2290449]:   12: rayon::iter::plumbing::bridge_producer_consumer::helper
Jul 09 00:20:17 testnode1 redisnode[2290449]:   13: rayon_core::join::join_context::{{closure}}
Jul 09 00:20:17 testnode1 redisnode[2290449]:   14: rayon_core::registry::in_worker
Jul 09 00:20:17 testnode1 redisnode[2290449]:   15: rayon::iter::plumbing::bridge_producer_consumer::helper
Jul 09 00:20:17 testnode1 redisnode[2290449]:   16: rayon_core::join::join_context::{{closure}}
Jul 09 00:20:17 testnode1 redisnode[2290449]:   17: rayon_core::registry::in_worker
Jul 09 00:20:17 testnode1 redisnode[2290449]:   18: rayon::iter::plumbing::bridge_producer_consumer::helper
Jul 09 00:20:17 testnode1 redisnode[2290449]:   19: rayon_core::join::join_context::{{closure}}
Jul 09 00:20:17 testnode1 redisnode[2290449]:   20: rayon_core::registry::in_worker
Jul 09 00:20:17 testnode1 redisnode[2290449]:   21: rayon::iter::plumbing::bridge_producer_consumer::helper
Jul 09 00:20:17 testnode1 redisnode[2290449]:   22: rayon_core::join::join_context::{{closure}}
Jul 09 00:20:17 testnode1 redisnode[2290449]:   23: rayon_core::registry::in_worker
Jul 09 00:20:17 testnode1 redisnode[2290449]:   24: rayon::iter::plumbing::bridge_producer_consumer::helper
Jul 09 00:20:17 testnode1 redisnode[2290449]:   25: <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute
Jul 09 00:20:17 testnode1 redisnode[2290449]:   26: rayon_core::registry::WorkerThread::wait_until_cold
Jul 09 00:20:17 testnode1 redisnode[2290449]:   27: rayon_core::registry::ThreadBuilder::run
Jul 09 00:20:17 testnode1 redisnode[2290449]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
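
The panic comes from the Rust standard library: `<[T]>::copy_from_slice` requires the source and destination slices to have identical lengths, and the backtrace shows it being called from `EncodedExtensionHeader::encode_flexible_data` with a source of length 386768 and a destination of length 59088. The minimal sketch below reproduces that failure mode in isolation and shows a length-checked alternative; `checked_copy` is a hypothetical helper for illustration, not part of nearcore. It also checks one arithmetic observation: 386768 = 5 × 65536 + 59088, so the destination length equals the source length truncated to 16 bits. That is consistent with, but does not by itself prove, a length being narrowed to a `u16` somewhere upstream.

```rust
use std::panic;

/// Hypothetical helper (not a nearcore API): copies `src` into `dst` only if the
/// lengths match, returning `(src_len, dst_len)` as an error instead of panicking.
fn checked_copy(dst: &mut [u8], src: &[u8]) -> Result<(), (usize, usize)> {
    if src.len() != dst.len() {
        return Err((src.len(), dst.len()));
    }
    dst.copy_from_slice(src);
    Ok(())
}

fn main() {
    // `copy_from_slice` panics on a length mismatch with a message like the one
    // in the log (exact wording may differ between Rust versions).
    let result = panic::catch_unwind(|| {
        let src = vec![0u8; 8];
        let mut dst = vec![0u8; 4];
        dst.copy_from_slice(&src);
    });
    assert!(result.is_err());

    // The checked variant reports the mismatch and leaves `dst` untouched.
    let mut dst = [0u8; 4];
    assert_eq!(checked_copy(&mut dst, &[1, 2, 3, 4, 5, 6, 7, 8]), Err((8, 4)));
    assert_eq!(dst, [0, 0, 0, 0]);
    assert_eq!(checked_copy(&mut dst, &[1, 2, 3, 4]), Ok(()));
    assert_eq!(dst, [1, 2, 3, 4]);

    // The two lengths from the log: 386768 truncated to 16 bits is 59088.
    assert_eq!(386_768u32 % 65_536, 59_088);
    assert_eq!(386_768u32 as u16, 59_088u16);
}
```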


Additional context

This also happens on an archival testnet node.

bowenwang1996 commented 2 months ago

Note from @evgenykuzyakov: those two nodes that crashed are also indexers.

VanBarbascu commented 2 months ago

Hi @evgenykuzyakov, thanks for reporting this issue!

Your node is not a validator node, so you don't need to enable memtrie loading.

As per the release notes:

When non-validator node operators adopt the stateless validation release, they need to:

  1. Change their nodes to meet the hardware requirements (can be found below).
  2. Set store.load_mem_tries_for_tracked_shards in config.json to false.
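
The release-note guidance amounts to a single boolean under the `store` object of config.json, i.e. `"store": { "load_mem_tries_for_tracked_shards": false }` on a non-validator node. The sketch below models that decision, assuming (as the reply above implies) that validator nodes keep the setting enabled. `StoreConfig` and `recommended_store_config` are hypothetical illustrations, not nearcore types or functions; the real config struct has many more fields.

```rust
/// Hypothetical mirror of the `store` section of config.json; only the one
/// field discussed in the release notes is modelled here.
#[derive(Debug, Clone, PartialEq)]
struct StoreConfig {
    load_mem_tries_for_tracked_shards: bool,
}

/// Hypothetical helper encoding the guidance: non-validator nodes (RPC,
/// archival, indexer) set the flag to false; validators are assumed to keep it on.
fn recommended_store_config(is_validator: bool) -> StoreConfig {
    StoreConfig {
        load_mem_tries_for_tracked_shards: is_validator,
    }
}

fn main() {
    // An RPC/indexer node such as the ones in this report.
    let rpc = recommended_store_config(false);
    assert!(!rpc.load_mem_tries_for_tracked_shards);

    // A validator node.
    let validator = recommended_store_config(true);
    assert!(validator.load_mem_tries_for_tracked_shards);

    println!("non-validator: {:?}", rpc);
}
```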

bowenwang1996 commented 2 months ago

Closing, as it doesn't seem to be a real issue.