As a part of testing changes
[DB-2687 Make OOM killer prioritize PG backends with more memory], I observed that all the postgres backends got terminated even when the total memory usage was half of total available memory.
Test and instance details:
Cluster: 3 node/C5.Large instance
Sysbench oltp_read_only workload with 300 tables/25k rows per table with 30 threads running in parallel.
The total available memory on the node is 3.5G and the actual memory usage was only 1.6G(RSS). The issue here is when we have enough available free memory then why the postgres backend was terminated by signal 9 .
Postgres logs:
2022-06-22 18:20:59.883 UTC [18282] LOG: server process (PID 7916) was terminated by signal 9: Killed
2022-06-22 18:20:59.885 UTC [18282] LOG: terminating any other active server processes
2022-06-22 18:20:59.926 UTC [7913] WARNING: terminating connection because of crash of another server process
2022-06-22 18:20:59.926 UTC [7913] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2022-06-22 18:21:00.687 UTC [18282] LOG: all server processes terminated; reinitializing
2022-06-22 18:21:01.208 UTC [8982] LOG: database system was interrupted; last known up at 2022-06-22 17:18:34 UTC
2022-06-22 18:21:01.325 UTC [8982] LOG: database system was not properly shut down; automatic recovery in progress
2022-06-22 18:21:01.337 UTC [8982] LOG: redo starts at 0/1000108
2022-06-22 18:21:01.337 UTC [8982] LOG: invalid record length at 0/10001B0: wanted 24, got 0
2022-06-22 18:21:01.337 UTC [8982] LOG: redo done at 0/1000140
Dmesg log:
[Wed Jun 22 18:21:00 2022] Killed process 7916 (postgres), UID 997, total-vm:650712kB, anon-rss:299340kB, file-rss:0kB, shmem-rss:52kB
Below is the excerpt from the dmesg log about the OOM killer message dump
Jira Link: DB-2731
Description
As a part of testing changes [DB-2687 Make OOM killer prioritize PG backends with more memory], I observed that all the postgres backends got terminated even when the total memory usage was half of total available memory.
Test and instance details:
The total available memory on the node is 3.5G and the actual memory usage was only 1.6G(RSS). The issue here is when we have enough available free memory then why the postgres backend was terminated by signal 9 .
Postgres logs:
Dmesg log: [Wed Jun 22 18:21:00 2022] Killed process 7916 (postgres), UID 997, total-vm:650712kB, anon-rss:299340kB, file-rss:0kB, shmem-rss:52kB
Below is the excerpt from the dmesg log about the OOM killer message dump
OOM score of postgres/tserver/yb-master processes