opentensor / subtensor

Bittensor Blockchain Layer

Allocator ran out of space issue #548

Open NoFeeding opened 2 weeks ago

NoFeeding commented 2 weeks ago

Describe the bug

Starting an archive mainnet node (v1.1.2) from scratch fails, both with Docker Compose and when built from source, with:

Failed to allocate memory: "Allocator ran out of space"

I need help checking whether I am doing something wrong or passing incorrect parameters to the node-subtensor binary.

To Reproduce

  1. Go to the subtensor directory that was cloned from GitHub.
  2. Start the node with either
       ./target/release/node-subtensor --chain raw_spec.json --base-path /home/ubuntu/bittensor --sync=full --execution wasm --wasm-execution compiled --port 30333 --max-runtime-instances 64 --rpc-max-response-size 2048 --rpc-cors all --rpc-port 9933 --bootnodes /ip4/13.58.175.193/tcp/30333/p2p/12D3KooWDe7g2JbNETiKypcKT1KsCEZJbTzEHCn8hpd4PHZ6pdz5 --no-mdns --in-peers 8000 --out-peers 8000 --prometheus-external --rpc-external --pruning archive
     or
       docker compose up
  3. The Subtensor node starts.
  4. The node does not sync. Instead, it repeats the following message:
       Thread 'tokio-runtime-worker' panicked at 'Failed to allocate memory: "Allocator ran out of space"', /home/ubuntu/.cargo/git/checkouts/substrate-7e08433d4c370a21/948fbd2/primitives/io/src/lib.rs:1451

Expected behavior

The node starts and syncs with peers to the latest height.

Screenshots

2024-06-19 01:23:17 Subtensor Node
2024-06-19 01:23:17 ✌️ version 4.0.0-dev-40a8321a3b5
2024-06-19 01:23:17 ❤️ by Substrate DevHub https://github.com/substrate-developer-hub, 2017-2024
2024-06-19 01:23:17 📋 Chain specification: Bittensor
2024-06-19 01:23:17 🏷 Node name: labored-eyes-8887
2024-06-19 01:23:17 👤 Role: FULL
2024-06-19 01:23:17 💾 Database: RocksDb at /tmp/bittensor/blockchain/chains/bittensor/db/full
2024-06-19 01:23:18 🏷 Local node identity is: 12D3KooWEA7s14LwzWXCwWNBEFZzSTi9HGDsYHdH8opNSqu5vHUY
2024-06-19 01:23:18 💻 Operating system: linux
2024-06-19 01:23:18 💻 CPU architecture: x86_64
2024-06-19 01:23:18 💻 Target environment: gnu
2024-06-19 01:23:18 💻 CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
2024-06-19 01:23:18 💻 CPU cores: 4
2024-06-19 01:23:18 💻 Memory: 63495MB
2024-06-19 01:23:18 💻 Kernel: 6.5.0-1020-aws
2024-06-19 01:23:18 💻 Linux distribution: Ubuntu 22.04.4 LTS
2024-06-19 01:23:18 💻 Virtual machine: yes
2024-06-19 01:23:18 📦 Highest known block at #94
2024-06-19 01:23:18 〽️ Prometheus exporter started at 0.0.0.0:9615
2024-06-19 01:23:18 Running JSON-RPC server: addr=0.0.0.0:9933, allowed origins=["*"]
2024-06-19 01:23:19 🔍 Discovered new external address for our node: /ip4/xx.xx.xxx.xx/tcp/30333/ws/p2p/12D3KooWEA7s14LwzWXCwWNBEFZzSTi9HGDsYHdH8opNSqu5vHUY

Version: 4.0.0-dev-40a8321a3b5

 0: sp_panic_handler::set::{{closure}}
 1: std::panicking::rust_panic_with_hook
 2: std::panicking::begin_panic_handler::{{closure}}
 3: std::sys_common::backtrace::rust_end_short_backtrace
 4: rust_begin_unwind
 5: core::panicking::panic_fmt
 6: core::result::unwrap_failed
 7: tracing::span::Span::in_scope
 8: sp_io::allocator::ExtAllocatorMallocVersion1::call
 9: ::with_function_context
10: std::panicking::try
11: <F as wasmtime::func::IntoFunc<T,(wasmtime::func::Caller,A1),R>>::into_func::wasm_to_host_shim
12:
13:
14:
15:
16:
17:
18:
19:
20: wasmtime_runtime::traphandlers::catch_traps::call_closure
21: wasmtime_setjmp
22: wasmtime_runtime::traphandlers::::with
23: wasmtime::func::invoke_wasm_and_catch_traps
24: wasmtime::func::typed::TypedFunc<Params,Results>::call
25: sc_executor_wasmtime::instance_wrapper::EntryPoint::call
26: sc_executor_wasmtime::runtime::perform_call
27: ::call_with_allocation_stats
28: sc_executor_common::wasm_runtime::WasmInstance::call_export
29: environmental::using
30: sc_executor::executor::WasmExecutor::with_instance::{{closure}}
31: sc_executor::wasm_runtime::RuntimeCache::with_instance
32: <sc_executor::executor::NativeElseWasmExecutor as sp_core::traits::CodeExecutor>::call
33: sp_state_machine::execution::StateMachine<B,H,Exec>::execute
34: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt>::call_api_at
35: <node_subtensor_runtime::RuntimeApiImpl<SrApiBlock,RuntimeApiImplCall> as sp_api::Core<__SrApiBlock__>>::__runtime_api_internal_call_api_at::{{closure}}
36: <node_subtensor_runtime::RuntimeApiImpl<SrApiBlock,RuntimeApiImplCall> as subtensor_custom_rpc_runtime_api::SubnetInfoRuntimeApi<SrApiBlock>>::runtime_api_internal_call_api_at
37: sp_api::Core::execute_block
38: <&sc_service::client::client::Client<B,E,Block,RA> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
39: <sc_consensus_grandpa::import::GrandpaBlockImport<BE,Block,Client,SC> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
40: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport+Transaction = Transaction+Error = sp_consensus::error::Error+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport>::import_block::{{closure}}
41: futures_util::future::future::FutureExt::poll_unpin
42: sc_consensus::import_queue::basic_queue::BlockImportWorker::new::{{closure}}
43: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
44: <sc_service::task_manager::prometheus_future::PrometheusFuture as core::future::future::Future>::poll
45: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
46: <tracing_futures::Instrumented as core::future::future::Future>::poll
47: tokio::runtime::park::CachedParkThread::block_on
48: <tokio::runtime::blocking::task::BlockingTask as core::future::future::Future>::poll
49: tokio::runtime::task::core::Core<T,S>::poll
50: tokio::runtime::task::harness::Harness<T,S>::poll
51: tokio::runtime::blocking::pool::Inner::run
52: std::sys_common::backtrace::__rust_begin_short_backtrace
53: core::ops::function::FnOnce::call_once{{vtable.shim}}
54: std::sys::pal::unix::thread::Thread::new::thread_start
55:
56:

Thread 'tokio-runtime-worker' panicked at 'Failed to allocate memory: "Allocator ran out of space"', /home/ubuntu/.cargo/git/checkouts/substrate-7e08433d4c370a21/948fbd2/primitives/io/src/lib.rs:1451

This is a bug. Please report it at:

support.anonymous.an

2024-06-19 01:23:19 Block prepare storage changes error: Error at calling runtime api: Execution failed: Execution aborted due to trap: host code panicked while being called by the runtime: Failed to allocate memory: "Allocator ran out of space"
WASM backtrace: error while executing at wasm backtrace:
    0: 0xf9561 - !sp_io::allocator::extern_host_function_impls::malloc::h2f37d4f126906687
    1: 0xeadb5 - !alloc::vec::from_elem::h9b139e829299baa7
    2: 0xe9cc7 - !pallet_subtensor::math::weighted_median_col_sparse::h80861fc1d27573d8
    3: 0x23a15 - !pallet_subtensor::epoch::<impl pallet_subtensor::pallet::Pallet>::epoch::h33632e30662bfac6
    4: 0x43871 - !<pallet_subtensor::pallet::Pallet as frame_support::traits::hooks::OnInitialize<::BlockNumber>>::on_initialize::h32da1dcea74e2563
    5: 0x80276 - !frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::hb0e7bcc94f70d752
    6: 0x7f512 - !frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h30085b7a73268285
    7: 0xc8d5e - !Core_execute_block
2024-06-19 01:23:19 💔 Error importing block 0xbdd3dcaa5f21f2d62588656cca5df54bd023785379844bf967e188c806cfbeb5: consensus error: Import failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: host code panicked while being called by the runtime: Failed to allocate memory: "Allocator ran out of space"
WASM backtrace: error while executing at wasm backtrace:
    0: 0xf9561 - !sp_io::allocator::extern_host_function_impls::malloc::h2f37d4f126906687
    1: 0xeadb5 - !alloc::vec::from_elem::h9b139e829299baa7
    2: 0xe9cc7 - !pallet_subtensor::math::weighted_median_col_sparse::h80861fc1d27573d8
    3: 0x23a15 - !pallet_subtensor::epoch::<impl pallet_subtensor::pallet::Pallet>::epoch::h33632e30662bfac6
    4: 0x43871 - !<pallet_subtensor::pallet::Pallet as frame_support::traits::hooks::OnInitialize<::BlockNumber>>::on_initialize::h32da1dcea74e2563
    5: 0x80276 - !frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::hb0e7bcc94f70d752
    6: 0x7f512 - !frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::execute_block::h30085b7a73268285
    7: 0xc8d5e - !Core_execute_block
2024-06-19 01:23:19 💔 Error importing block 0x8eb9a96acb191c4d1e4a7e4ac7e5306c13fce7ea62c9ba54642620973b8309ce: block has an unknown parent
2024-06-19 01:23:19 💔 Error importing block 0x6e00254420c6019b4afc72c8bd91e8c48205fd838ee939b01081c9d866cf9e08: block has an unknown parent

Environment

Linux Ubuntu 22.04.4 LTS (GNU/Linux 6.5.0-1020-aws x86_64)

Additional context

No response

sam0x17 commented 2 weeks ago

Thanks for reaching out!

Currently there are some known issues syncing new archive nodes from genesis (involving the specific block referenced in your error). For now, the workaround is to manually import a RocksDB snapshot until we can come up with a less manual solution. We don't publicly post the snapshots just yet, but if you reach out to us on Discord we can get one to you. Public snapshot URLs are also coming soon.
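
For context on the error itself: the WASM backtrace in the report shows `alloc::vec::from_elem` (what `vec![value; n]` compiles to) being called from `pallet_subtensor::math::weighted_median_col_sparse` during the epoch run in `on_initialize`, and the host-side `malloc` then panicking with "Allocator ran out of space". The Rust sketch below is only a toy model of that failure mode, assuming a heap with a fixed byte budget. `ToyHeap`, `AllocError` and `alloc_filled_vec` are hypothetical names made up for illustration; this is not Substrate's allocator and not the subtensor code.

```rust
/// Toy heap with a fixed byte budget. Hypothetical illustration only:
/// it is NOT Substrate's allocator, just a model of a bounded heap.
#[derive(Debug)]
pub struct ToyHeap {
    capacity: usize, // total bytes this heap may ever hand out
    used: usize,     // bytes handed out so far (nothing is ever freed here)
}

/// Error returned when a request does not fit in the remaining budget.
#[derive(Debug, PartialEq)]
pub enum AllocError {
    OutOfSpace { requested: usize, available: usize },
}

impl ToyHeap {
    pub fn new(capacity: usize) -> Self {
        ToyHeap { capacity, used: 0 }
    }

    /// Reserve `size` bytes and return the offset of the reservation,
    /// or an error if the remaining budget is too small.
    pub fn allocate(&mut self, size: usize) -> Result<usize, AllocError> {
        // `used` never exceeds `capacity`, so this cannot underflow.
        let available = self.capacity - self.used;
        if size > available {
            return Err(AllocError::OutOfSpace { requested: size, available });
        }
        let offset = self.used;
        self.used += size;
        Ok(offset)
    }

    /// Models `vec![value; len]`: the whole buffer (`len * elem_size` bytes)
    /// is requested in a single allocation before any element is written.
    pub fn alloc_filled_vec(&mut self, len: usize, elem_size: usize) -> Result<usize, AllocError> {
        // Saturate instead of overflowing; an oversized request simply fails.
        let bytes = len.saturating_mul(elem_size);
        self.allocate(bytes)
    }
}
```

With a 1024-byte `ToyHeap`, a 64-element vector of 8-byte values fits, while a following 100-element request is rejected because only 512 bytes remain. The point of the sketch is that a single large `vec![...]` inside runtime code can exhaust a bounded heap in one call, regardless of how much RAM the machine has.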

orriin commented 1 week ago

Hey @NoFeeding, can you try building your node from the commit on https://github.com/opentensor/subtensor/pull/561 and let me know if it resolves the issue for you?

NoFeeding commented 1 week ago

Hi @orriin, I am able to start the node when building from source. Has this change already been applied to any Docker image?

orriin commented 1 week ago

Great to hear, @NoFeeding. It will be included in the next Docker image release. If you can't wait, you can also build the Docker image right from my branch.