Mainnet state sync can use more than 32GB of RAM

Description

I tried to start a mainnet node (neard 2.2.1) from an old snapshot (taken on 2024-10-17; the node was started on 2024-10-22). The node connected to the network, performed header sync, and then, I think, started state sync. Memory usage grew steadily until it exceeded 32 GB and the VM crashed.

After I restarted the VM, the node did not come back up. Every start now fails with the following error:

```
2024-10-22T15:18:17.665621Z  INFO db: Opened a new RocksDB instance. num_instances=1
thread 'main' panicked at chain/client/src/client_actor.rs:168:6:
called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: 8aQrTFn6KT4yYmL4zfL7qsjdpeUWb6x1zkYhBChS3AyQ"))
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: nearcore::start_with_config_and_synchronization
   4: neard::cli::RunCmd::run::{{closure}}
   5: tokio::task::local::LocalSet::run_until::{{closure}}
   6: neard::cli::NeardCmd::parse_and_run
   7: neard::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```
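For readers less familiar with Rust: the panic line is what `Result::unwrap()` prints when it is called on an `Err`, followed by the `Debug` form of the error value. The following is a minimal sketch under stated assumptions: `Error`, `StoreError`, and `get_epoch_block` are hypothetical stand-ins named after the log line, not nearcore's actual types, and the real call site at `client_actor.rs:168` may look quite different. It shows how a missing DB entry produces a message of exactly this shape, and how matching on the `Result` would report the problem without panicking.

```rust
use std::collections::HashMap;
use std::fmt;

// Hypothetical error types that mirror the shape of the panic message.
// These are NOT nearcore's real definitions.
#[derive(Debug)]
enum StoreError {
    DBNotFoundErr(String),
}

#[derive(Debug)]
enum Error {
    Chain(StoreError),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Chain(StoreError::DBNotFoundErr(what)) => write!(f, "missing from DB: {what}"),
        }
    }
}

/// Hypothetical lookup of an epoch's block; a HashMap stands in for the DB.
fn get_epoch_block(db: &HashMap<String, Vec<u8>>, hash: &str) -> Result<Vec<u8>, Error> {
    db.get(hash)
        .cloned()
        .ok_or_else(|| Error::Chain(StoreError::DBNotFoundErr(format!("epoch block: {hash}"))))
}

fn main() {
    // Empty map: simulates a database that lacks the expected entry.
    let db: HashMap<String, Vec<u8>> = HashMap::new();
    let hash = "8aQrTFn6KT4yYmL4zfL7qsjdpeUWb6x1zkYhBChS3AyQ";

    // Calling `.unwrap()` here instead would panic with:
    // called `Result::unwrap()` on an `Err` value: Chain(DBNotFoundErr("epoch block: ..."))
    match get_epoch_block(&db, hash) {
        Ok(block) => println!("found {} bytes", block.len()),
        Err(e) => eprintln!("cannot start: {e}"),
    }
}
```

The `Debug` output of the nested tuple variants is what makes the log read `Chain(DBNotFoundErr("..."))`, which hints that the failing lookup was for an epoch block the database no longer (or never) contained.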

Our hardware requirements docs recommend that RPC node operators have 32 GB of memory and also say that 24 GB should be enough, but the node can apparently use more than that, and when it runs out of memory the database ends up corrupted :/
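One way to track memory growth like this is to sample the node's resident set size. The following is a minimal sketch that assumes a Linux host, where `/proc/<pid>/status` contains a `VmRSS:` line reported in kB (1 kB = 1024 bytes). `vm_rss_bytes` is a hypothetical helper, the sample input is made up, and the 24 GB and 32 GB figures from the docs are treated as GiB budgets for simplicity.

```rust
/// Hypothetical helper: extracts VmRSS from the text of a Linux
/// `/proc/<pid>/status` file and returns it in bytes.
/// The kernel reports this field in kB, where 1 kB = 1024 bytes.
fn vm_rss_bytes(status: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        let rest = line.strip_prefix("VmRSS:")?;
        let kb: u64 = rest.split_whitespace().next()?.parse().ok()?;
        Some(kb * 1024)
    })
}

// Assumption: the documented 24 GB / 32 GB figures are read as GiB here.
const GIB: u64 = 1024 * 1024 * 1024;

fn main() {
    // Made-up sample; on a live host this text would come from
    // std::fs::read_to_string(format!("/proc/{pid}/status")).
    let sample = "Name:\tneard\nVmRSS:\t35651584 kB\n";
    let budgets = [("24 GiB", 24 * GIB), ("32 GiB", 32 * GIB)];

    if let Some(rss) = vm_rss_bytes(sample) {
        for (name, limit) in budgets {
            let verdict = if rss > limit { "over" } else { "within" };
            println!("RSS {:.1} GiB is {verdict} the {name} budget", rss as f64 / GIB as f64);
        }
    }
}
```

Sampling this periodically during state sync would show whether usage climbs steadily, as reported above, or spikes at a particular stage.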

/cc @robin-near @VanBarbascu

near / nearcore #12281