Open ok-everstake opened 2 years ago
Validator mainnet node has been restarted two more times today (Dec 23). Available memory on the server: min 56% over the past 2 days.
Dec 23 15:13:46.459 INFO stats: #55825362 59GdowyAkXRarsvkXUfUZzJCDc6cA7bLum8ZZ9GspNpy V/63 34/26/40 peers ⬇ 682.8kiB/s ⬆ 1.4MiB/s 0.90 bps 289.36 Tgas/s CPU: 77%, Mem: 7.8 GiB
memory allocation of 897332 bytes failed
Dec 23 17:32:05.006 INFO stats: #55832121 8GxXvZzu9bB3H2o52xMZ7WTBkokP3egeRJqSLZ3nnZrD V/65 33/27/40 peers ⬇ 2.4MiB/s ⬆ 2.1MiB/s 0.80 bps 213.27 Tgas/s CPU: 88%, Mem: 6.8 GiB
memory allocation of 2528 bytes failed
@matklad is it possible that wasmer sometimes tries to allocate a lot of memory? It could very well have been something else, but I just want to double-check.
This occurs every couple of days on several of our nodes:
Jan 03 19:54:49 near03 neard[28265]: thread '<unnamed>' panicked at 'TODO creating memory cannot fail: Region("System call failed: Cannot allocate memory (os error 12)")', runtime/near-vm-runner/src/wasmer2_runner.rs:56:14
We are bumping limits up to:
$ grep -e ^Limit /etc/systemd/system/neard.service
LimitDATA=32G
LimitRSS=32G
...we were at 20GB before, for 9 months or so.
cc @nagisa
This particular line indeed allocates a sizable chunk of memory -- 64 MiB. It's always 64 MiB though, not variable. It's also serviced directly from mmap, bypassing the allocator. And we do this for every contract call, which happens rather often.
So, I am not surprised that this is the line that fails under memory-exhaustion conditions.
I would be surprised, though, if this line were to blame for the relative memory increase: if we had a leak here, I'd expect it to be a big enough problem that we'd notice it immediately.
There are situations other than exhaustion of physical memory that could cause an attempt to allocate to return ENOMEM. In particular, the failure modes are (paraphrasing the man pages for mmap(2)):

- The process's maximum number of memory mappings would have been exceeded. Note that this can also be hit via munmap, when unmapping a region in the middle of an existing mapping, since this results in two smaller mappings on either side of the region being unmapped;
- The RLIMIT_DATA limit, described in getrlimit(2), would have been exceeded.

Here specifically it is important to note that both of these limits pertain to virtual memory rather than resident memory. (As a side note, RLIMIT_RSS has had no effect since Linux 2.4.30.) So both of these failure modes are plausible points of concern. The map count limit in particular caps the number of mapped regions a process can have and can be quite low by default:
$ sysctl vm.max_map_count
vm.max_map_count = 65530
So it is possible to receive this error after leaking just ~250 MiB worth of pages (65530 mappings of one 4 KiB page each), which would not necessarily be apparent with typical monitoring tools.
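As a quick way to check whether a process is approaching this limit, one can compare the number of entries in its /proc maps file against the sysctl value. A sketch (it inspects the current shell's own PID for illustration; substitute neard's):

```shell
# Count a process's memory mappings and compare against vm.max_map_count.
# $$ (this shell) is used for illustration; use `pgrep -x neard` for the node.
pid=$$
maps=$(wc -l < "/proc/$pid/maps")
limit=$(cat /proc/sys/vm/max_map_count)
echo "pid=$pid mappings=$maps max_map_count=$limit"
```

If the mapping count is steadily creeping toward the limit between restarts, that would point at a map leak rather than at resident-memory exhaustion.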
Now, I cannot really tell for sure where we may be hitting these cases: the increase could be so quick that none of the usual sampling-based mechanisms would catch the peak, and debugging whether we hold onto memory maps excessively would require a core dump from after an OOM crash.
As a quick'n'easy check, is there any chance anybody experiencing this collects virtual memory utilization metrics over time and could share a slice of those metrics from around the time of a crash?
As another check, is there any chance anybody experiencing this has coredumps enabled and has a coredump of such an occurrence saved? Listing the mapping table of that coredump would be helpful.
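For anyone wanting to capture such a coredump: the core-size limit has to be raised first (the prlimit output later in this thread shows a soft CORE limit of 0). A sketch for an interactive shell; a systemd-managed neard would need LimitCORE= in the unit file instead:

```shell
# Raise the core-file soft limit for this shell session. The prlimit output
# in this thread shows the hard limit is already `unlimited`, so only the
# soft limit needs changing.
ulimit -S -c unlimited
ulimit -S -c
# Once a core file exists, the mapping table can be listed with gdb, e.g.:
#   gdb /path/to/neard core.<pid> -ex 'info proc mappings' -ex quit
```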
@nagisa I have some graphs, not sure if this helps. They cover the last 15 days; the dips fall around the times of crashes/restarts:
[graph: Memory utilization % by zabbix vm.memory.utilization]
[graph: Node used RAM by prometheus metrics]
I'll increase the vm.max_map_count limit for testing
@ok-everstake thanks for the graphs! It seems that your node either did not run out of memory, or did so too quickly for prometheus to catch it.
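If the growth really is faster than prometheus's scrape interval, a tight polling loop over /proc can catch a spike that a 15-second scrape would miss. A sketch (it samples this shell's own VmSize; point it at neard's PID for real use):

```shell
# Sample virtual memory size several times per second; a spike that grows
# and crashes the process within a second or two will slip between
# prometheus scrapes but not between these samples.
pid=$$  # illustration only; use `pgrep -x neard` for the real node
for i in 1 2 3; do
  vmsize=$(awk '/^VmSize/ {print $2}' "/proc/$pid/status")
  echo "$(date +%H:%M:%S) VmSize=${vmsize} kB"
  sleep 0.2
done
```

Redirecting that loop to a file on the affected host until the next crash would give exactly the "slice of metrics around the time of crash" asked for above.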
Description: I have a similar problem in StakeWars right now. Commit: https://github.com/near/nearcore/commit/8448ad1ebf27731a43397686103aa5277e7f2fcf, built with cargo build -p neard --release --features shardnet. The node is built and runs inside Docker. The memory allocated to the container is 16 GB, but the node can be seen using only about 3 GB before the restart. There was nothing like this on https://github.com/near/nearcore/releases/tag/1.28.0-rc.3
To Reproduce: Nothing special
Logs: In the logs, at different intervals, I see that the node restarts. There were two different variants; I give an example of each:
INFO stats: # 988989 52g39XebLjGs3TrrXZuGXutFEfdQecXwxA3GzFMh9RjB Validator | 97 validators 30 peers ⬇ 530 kB/s ⬆ 1.18 MB/s 0.70 bps 147 Tgas/s CPU: 49%, Mem: 3.33 GB
DEBUG stats: EpochId(`11111111111111111111111111111111`) Blocks in progress: 3 Chunks in progress: 0 Orphans: 0
thread '<unnamed>' panicked at 'creating memory must not fail: Region("System call failed: Cannot allocate memory (os error 12)")', runtime/near-vm-runner/src/wasmer2_runner.rs:53:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
INFO neard: version="trunk" build="crates-0.14.0-218-g8448ad1eb" latest_protocol=100
INFO db: Created a new RocksDB instance. num_instances=1
INFO db: Dropped a RocksDB instance. num_instances=0
INFO near: Opening RocksDB database path=/home/xxx/.near/data
INFO db: Created a new RocksDB instance. num_instances=1
INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=0 total_msg_received_count=0 max_max_record_num_messages_in_progress=0
INFO stats: # 988996 Waiting for peers 0 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 0%, Mem: 300 MB
DEBUG stats: EpochId(`11111111111111111111111111111111`) Blocks in progress: 0 Chunks in progress: 0 Orphans: 0
INFO stats: # 989004 6JVChTLQ9N3uDpp2WZZBSujGfZE8Vp1pqBhFduHboCGd Validator | 97 validators 19 peers ⬇ 271 kB/s ⬆ 504 kB/s 0.70 bps 184 Tgas/s CPU: 135%, Mem: 884 MB
INFO stats: # 988989 52g39XebLjGs3TrrXZuGXutFEfdQecXwxA3GzFMh9RjB Validator | 97 validators 30 peers ⬇ 530 kB/s ⬆ 1.18 MB/s 0.70 bps 147 Tgas/s CPU: 49%, Mem: 3.33 GB
DEBUG stats: EpochId(`11111111111111111111111111111111`) Blocks in progress: 3 Chunks in progress: 0 Orphans: 0
thread '<unnamed>' panicked at 'creating memory must not fail: Region("System call failed: Cannot allocate memory (os error 12)")', runtime/near-vm-runner/src/wasmer2_runner.rs:53:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
INFO neard: version="trunk" build="crates-0.14.0-218-g8448ad1eb" latest_protocol=100
INFO db: Created a new RocksDB instance. num_instances=1
INFO db: Dropped a RocksDB instance. num_instances=0
INFO near: Opening RocksDB database path=/home/xxx/.near/data
INFO db: Created a new RocksDB instance. num_instances=1
INFO near_network::peer_manager::peer_manager_actor: Bandwidth stats total_bandwidth_used_by_all_peers=0 total_msg_received_count=0 max_max_record_num_messages_in_progress=0
INFO stats: # 988996 Waiting for peers 0 peers ⬇ 0 B/s ⬆ 0 B/s 0.00 bps 0 gas/s CPU: 0%, Mem: 300 MB
DEBUG stats: EpochId(`11111111111111111111111111111111`) Blocks in progress: 0 Chunks in progress: 0 Orphans: 0
INFO stats: # 989004 6JVChTLQ9N3uDpp2WZZBSujGfZE8Vp1pqBhFduHboCGd Validator | 97 validators 19 peers ⬇ 271 kB/s ⬆ 504 kB/s 0.70 bps 184 Tgas/s CPU: 135%, Mem: 884 MB
Version: /nearcore/target/release/neard --version neard (release trunk) (build crates-0.14.0-218-g8448ad1eb) (rustc 1.62.0) (protocol 100) (db 31)
Additional context:
We had that on mainnet
yesterday:
Jul 19 13:17:03 near01 neard[431910]: 2022-07-19T13:17:03.924068Z INFO stats: #70188894 E4Xa2tcnWGFZjmkc53sQ8BFd74SwsdjvLbRaowg1dTa4 Validator | 100 validators 30 peers ⬇ 3.64 MB/s ⬆ 18.9 MB/s 0.70 bps 240 Tgas/s CPU: 167%, Mem: 9.78 GB
Jul 19 13:17:12 near01 neard[431910]: thread '<unnamed>' panicked at 'creating memory must not fail: Region("System call failed: Cannot allocate memory (os error 12)")', runtime/near-vm-runner/src/wasmer2_runner.rs:53:14
Jul 19 13:17:12 near01 neard[431910]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jul 19 13:17:13 near01 systemd[1]: neard.service: Main process exited, code=killed, status=6/ABRT
Jul 19 13:17:13 near01 systemd[1]: neard.service: Failed with result 'signal'.
Jul 19 13:17:18 near01 systemd[1]: neard.service: Scheduled restart job, restart counter is at 2.
Jul 19 13:17:18 near01 systemd[1]: Stopped NEAR daemon.
Jul 19 13:17:18 near01 systemd[1]: Starting NEAR daemon...
...
root@near01 ~ # prlimit -p `pgrep -x neard`
RESOURCE DESCRIPTION SOFT HARD UNITS
AS address space limit unlimited unlimited bytes
CORE max core file size 0 unlimited bytes
CPU CPU time unlimited unlimited seconds
DATA max data size 34359738368 34359738368 bytes
FSIZE max file size unlimited unlimited bytes
LOCKS max number of file locks held unlimited unlimited locks
MEMLOCK max locked-in-memory address space 65536 65536 bytes
MSGQUEUE max bytes in POSIX mqueues 819200 819200 bytes
NICE max nice prio allowed to raise 0 0
NOFILE max number of open files 50000 50000 files
NPROC max number of processes 256754 256754 processes
RSS max resident set size unlimited unlimited bytes
RTPRIO max real-time priority 0 0
RTTIME timeout for real-time tasks unlimited unlimited microsecs
SIGPENDING max number of pending signals 256754 256754 signals
STACK max stack size 8388608 unlimited bytes
Clarification for my case: https://github.com/near/nearcore/issues/5931#issuecomment-1189079138 All of this happened after the hard fork; after the end of the first epoch the problem disappeared and neard stopped restarting.
This problem still occurs randomly:
Aug 28 04:41:22 near02 neard[717073]: memory allocation of 2074607616 bytes failed
Aug 28 04:41:23 near02 systemd[1]: neard.service: Main process exited, code=killed, status=6/ABRT
This was on an RPC / fullnode, not on a validator node:
$ curl -s localhost:3030/status | jq -r .validator_account_id
null
I didn't notice what was in the near logs at the time, but I see this in dmesg -T:
[Mon Aug 22 07:16:31 2022] mmap: actix-rt|system (2898): VmData 34575671296 exceed data ulimit 34359738368. Update limits or use boot option ignore_rlimit_data.
Ah, please ignore the previous comment: a stray LimitDATA=32G in the systemd unit file. Need to remove it.
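To spot such a stray limit before it bites, one can compare a process's VmData against its DATA rlimit straight from /proc. A sketch (run against the current shell for illustration; substitute the neard PID):

```shell
# Compare the virtual data segment size (VmData) with the RLIMIT_DATA soft
# limit; the dmesg "VmData ... exceed data ulimit ..." message fires when
# the former passes the latter. $$ is illustrative; substitute neard's PID.
pid=$$
vmdata=$(awk '/^VmData/ {print $2}' "/proc/$pid/status")         # kB
softlim=$(awk '/^Max data size/ {print $4}' "/proc/$pid/limits") # bytes or "unlimited"
echo "VmData=${vmdata} kB, RLIMIT_DATA soft=${softlim}"
```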
Description: Validator mainnet node has been restarted two times with a "memory allocation failed" issue. Node memory usage for the past 2 days: avg 5.78 GB, max 7.6 GB. The server has 64 GB.
To Reproduce: Nothing special
Expected behavior: Validator node works fine without unpredictable restarts
Logs: Error on the first restart
Dec 20 21:27:31.391 INFO stats: #55625082 4CQ95gS4Ru9vZ8JyQR4YHkab26X6cq4gC9UHe5oEa4dT V/59 34/33/40 peers ⬇ 2.3MiB/s ⬆ 2.0MiB/s 0.80 bps 74.19 Tgas/s CPU: 111%, Mem: 6.7 GiB
memory allocation of 3584 bytes failed
Error on the second restart
Dec 21 12:22:02.268 INFO stats: #55670205 9WvRTCzVpP6n7GxFGVyLZdJN5cJyZGYWwVaNHuK1EMYS V/52 34/30/40 peers ⬇ 828.6kiB/s ⬆ 2.1MiB/s 0.90 bps 72.72 Tgas/s CPU: 47%, Mem: 7.1 GiB
thread '<unnamed>' panicked at 'TODO creating memory cannot fail: Region("System call failed: Cannot allocate memory (os error 12)")', runtime/near-vm-runner/src/wasmer2_runner.rs:56:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Dec 21 12:22:04.278 ERROR network: Received error sending message to client: Mailbox has closed for Some(ed25519:Faoro1uDYXH2AFBA1eG4C5pQaLrk4vhWu6KW2pXpnaD8@142.132.143.229:24567)
memory allocation of 34563456 bytes failed
Version:
mainnet validator node
Additional context: I wouldn't even have noticed it if we didn't have a trigger on restarts. No missed blocks or any other triggered alerts.