near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Mainnet state sync can use more than 32GB of RAM #12281

Open jancionear opened 2 hours ago

jancionear commented 2 hours ago

Description

I tried to start a mainnet node (neard 2.2.1) from an old snapshot (taken on 2024-10-17; the node was started on 2024-10-22). The node connected to the network, performed header sync, and then, I think, started state sync. Memory usage grew steadily until it exceeded 32 GB and the VM crashed.

After restarting the VM, the node didn't come back up. It now fails on startup with the following error:

2024-10-22T15:18:17.665621Z  INFO db: Opened a new RocksDB instance. num_instances=1
thread 'main' panicked at chain/client/src/client_actor.rs:168:6:
called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: 8aQrTFn6KT4yYmL4zfL7qsjdpeUWb6x1zkYhBChS3AyQ"))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: nearcore::start_with_config_and_synchronization
   4: neard::cli::RunCmd::run::{{closure}}
   5: tokio::task::local::LocalSet::run_until::{{closure}}
   6: neard::cli::NeardCmd::parse_and_run
   7: neard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
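
For readers unfamiliar with this kind of panic: the message says that `Result::unwrap()` was called on an `Err` value, so a missing database entry aborts startup instead of being reported as an error. The sketch below only illustrates that mechanism and one way to propagate the error instead. `ChainError`, `load_epoch_block`, and `start_node` are hypothetical stand-ins, not nearcore's actual types or functions, and this is not a proposed fix.

```rust
use std::fmt;

// Hypothetical stand-in for the error seen in the log; not nearcore's real type.
#[derive(Debug, PartialEq)]
enum ChainError {
    DBNotFoundErr(String),
}

impl fmt::Display for ChainError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ChainError::DBNotFoundErr(what) => write!(f, "DB not found: {what}"),
        }
    }
}

// Hypothetical lookup: returns an error when the requested block hash
// is not among the hashes stored in the (simulated) database.
fn load_epoch_block(known: &[&str], hash: &str) -> Result<String, ChainError> {
    if known.contains(&hash) {
        Ok(hash.to_string())
    } else {
        Err(ChainError::DBNotFoundErr(format!("epoch block: {hash}")))
    }
}

// Propagates the error with `?` rather than calling `unwrap()`,
// so the caller decides how to report a missing block.
fn start_node(known: &[&str], hash: &str) -> Result<String, ChainError> {
    let block = load_epoch_block(known, hash)?;
    Ok(format!("started at {block}"))
}

fn main() {
    let hash = "8aQrTFn6KT4yYmL4zfL7qsjdpeUWb6x1zkYhBChS3AyQ";
    // An empty database simulates the state after the crash.
    match start_node(&[], hash) {
        Ok(msg) => println!("{msg}"),
        Err(err) => eprintln!("startup failed: {err}"),
    }
}
```

Calling `load_epoch_block(&[], hash).unwrap()` in place of the `match` would panic in the same way as the trace above.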

In the hardware requirements docs we recommend that RPC node operators have 32 GB of memory, and we also say that 24 GB should be enough, but it looks like the node can use more than that, and when it runs out of memory it corrupts the database :/

/cc @robin-near @VanBarbascu

walnut-the-cat commented 2 hours ago

cc @saketh-are as well, since this may be something we need to work on as part of the state sync follow-up.