Hi all,

Recently we have been running into OOM kills on node-local-dns with the following backtrace:
[Fri Jan 26 05:46:44 2024] node-cache invoked oom-killer: gfp_mask=0xc40(GFP_NOFS), order=0, oom_score_adj=-997
[Fri Jan 26 05:46:44 2024] CPU: 44 PID: 3939051 Comm: node-cache Tainted: P OE 5.4.0-156-generic #173-Ubuntu
[Fri Jan 26 05:46:44 2024] Hardware name: Supermicro SYS-4029GP-TRT2/X11DPG-OT-CPU, BIOS 3.8b 01/17/2023
[Fri Jan 26 05:46:44 2024] Call Trace:
[Fri Jan 26 05:46:44 2024] dump_stack+0x6d/0x8b
[Fri Jan 26 05:46:44 2024] dump_header+0x4f/0x1eb
[Fri Jan 26 05:46:44 2024] oom_kill_process.cold+0xb/0x10
[Fri Jan 26 05:46:44 2024] out_of_memory+0x1cf/0x500
[Fri Jan 26 05:46:44 2024] mem_cgroup_out_of_memory+0xbd/0xe0
[Fri Jan 26 05:46:44 2024] try_charge+0x77c/0x810
[Fri Jan 26 05:46:44 2024] mem_cgroup_try_charge+0x71/0x190
[Fri Jan 26 05:46:44 2024] __add_to_page_cache_locked+0x2ff/0x3f0
[Fri Jan 26 05:46:44 2024] ? bio_add_page+0x6a/0x90
[Fri Jan 26 05:46:44 2024] ? scan_shadow_nodes+0x30/0x30
[Fri Jan 26 05:46:44 2024] add_to_page_cache_lru+0x4d/0xd0
[Fri Jan 26 05:46:44 2024] iomap_readpages_actor+0xf8/0x220
[Fri Jan 26 05:46:44 2024] iomap_apply+0xd5/0x160
[Fri Jan 26 05:46:44 2024] ? iomap_page_mkwrite_actor+0x80/0x80
[Fri Jan 26 05:46:44 2024] iomap_readpages+0xa3/0x190
[Fri Jan 26 05:46:44 2024] ? iomap_page_mkwrite_actor+0x80/0x80
[Fri Jan 26 05:46:44 2024] xfs_vm_readpages+0x35/0x90 [xfs]
[Fri Jan 26 05:46:44 2024] read_pages+0x71/0x1a0
[Fri Jan 26 05:46:44 2024] __do_page_cache_readahead+0x180/0x1a0
[Fri Jan 26 05:46:44 2024] filemap_fault+0x697/0xa50
[Fri Jan 26 05:46:44 2024] ? xas_load+0xd/0x80
[Fri Jan 26 05:46:44 2024] ? _cond_resched+0x19/0x30
[Fri Jan 26 05:46:44 2024] ? down_read+0x13/0xa0
[Fri Jan 26 05:46:44 2024] __xfs_filemap_fault+0x6c/0x200 [xfs]
[Fri Jan 26 05:46:44 2024] xfs_filemap_fault+0x37/0x40 [xfs]
[Fri Jan 26 05:46:44 2024] __do_fault+0x3c/0x170
[Fri Jan 26 05:46:44 2024] do_fault+0x24b/0x640
[Fri Jan 26 05:46:44 2024] __handle_mm_fault+0x4c5/0x7a0
[Fri Jan 26 05:46:44 2024] handle_mm_fault+0xca/0x200
[Fri Jan 26 05:46:44 2024] do_user_addr_fault+0x1f9/0x450
[Fri Jan 26 05:46:44 2024] __do_page_fault+0x58/0x90
[Fri Jan 26 05:46:44 2024] do_page_fault+0x2c/0xe0
[Fri Jan 26 05:46:44 2024] page_fault+0x34/0x40
[Fri Jan 26 05:46:44 2024] RIP: 0033:0x438950
[Fri Jan 26 05:46:44 2024] Code: Bad RIP value.
[Fri Jan 26 05:46:44 2024] RSP: 002b:000000c000123f28 EFLAGS: 00010212
[Fri Jan 26 05:46:44 2024] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000045da3d
[Fri Jan 26 05:46:44 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000c000123f08
[Fri Jan 26 05:46:44 2024] RBP: 000000c000123f20 R08: 0000000000000000 R09: 0000000000000000
[Fri Jan 26 05:46:44 2024] R10: 00007fff43b2a090 R11: 0000000000000202 R12: 0000000000430e30
[Fri Jan 26 05:46:44 2024] R13: 0000000000000011 R14: 00000000019b4b78 R15: 0000000000000000
[Fri Jan 26 05:46:44 2024] memory: usage 399360kB, limit 399360kB, failcnt 1796473
[Fri Jan 26 05:46:44 2024] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Fri Jan 26 05:46:44 2024] kmem: usage 391164kB, limit 9007199254740988kB, failcnt 0
[Fri Jan 26 05:46:44 2024] Memory cgroup stats for /kubepods/podd9f57d24-67fd-4fdd-924b-780799ce4ba4:
[Fri Jan 26 05:46:44 2024] anon 0
file 13606912
kernel_stack 4276224
slab 325726208
sock 0
shmem 0
file_mapped 5947392
file_dirty 0
file_writeback 0
anon_thp 0
inactive_anon 0
active_anon 1826816
inactive_file 319488
active_file 6451200
unevictable 0
slab_reclaimable 23388160
slab_unreclaimable 302338048
pgfault 18332047
pgmajfault 344907
workingset_refault 6429196
workingset_activate 1211603
workingset_nodereclaim 33
pgrefill 3920017
pgscan 10044063
pgsteal 6473127
pgactivate 1536300
pgdeactivate 2788986
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 5
thp_collapse_alloc 0
[Fri Jan 26 05:46:44 2024] Tasks state (memory values in pages):
[Fri Jan 26 05:46:44 2024] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Fri Jan 26 05:46:44 2024] [ 105029] 65535 105029 241 1 28672 0 -998 pause
[Fri Jan 26 05:46:44 2024] [3939010] 0 3939010 34992 1783 147456 0 -997 node-cache
[Fri Jan 26 05:46:44 2024] [3939926] 0 3939926 3986 28 73728 0 -997 iptables
[Fri Jan 26 05:46:44 2024] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=5581bad107d4d1f42a8659b6620f8eafc5f8bd6861f1456c2db1f1bd3bf4fae7,mems_allowed=0-1,oom_memcg=/kubepods/podd9f57d24-67fd-4fdd-924b-780799ce4ba4,task_memcg=/kubepods/podd9f57d24-67fd-4fdd-924b-780799ce4ba4/5581bad107d4d1f42a8659b6620f8eafc5f8bd6861f1456c2db1f1bd3bf4fae7,task=node-cache,pid=3939010,uid=0
[Fri Jan 26 05:46:44 2024] Memory cgroup out of memory: Killed process 3939010 (node-cache) total-vm:139968kB, anon-rss:7232kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:144kB oom_score_adj:-997
In our node-local-dns deployment we set the memory limit to ~390Mi, and in this OOM most of the memory appears to be consumed by slab: 325726208 bytes (~310MiB) according to the cgroup stats in the log above. Can anyone explain how this slab memory accumulates, and how to estimate the memory limit properly?
Any insight would be helpful. We are using k8s-dns-node-cache:1.15.10, and we leave the config mostly at its defaults (number of concurrent queries, etc.).
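For context, here is a minimal sketch of how we plan to watch where the kernel memory goes on an affected node. It assumes a cgroup v1 layout with the memory controller mounted at /sys/fs/cgroup/memory, and that memory.kmem.slabinfo is still exposed (it may not be on every kernel); the pod cgroup path is copied from the oom-kill line above and would need to be adjusted per node.

```python
#!/usr/bin/env python3
"""Periodically sample total vs. kernel memory for the node-local-dns pod
cgroup, with a best-effort per-slab-cache breakdown.

Assumptions: cgroup v1, memory controller at /sys/fs/cgroup/memory, and the
pod UID taken from the oom-kill line above (substitute your own)."""
import time

CGROUP = "/sys/fs/cgroup/memory/kubepods/podd9f57d24-67fd-4fdd-924b-780799ce4ba4"

def read_bytes(name):
    """Read a single-value cgroup counter such as memory.usage_in_bytes."""
    with open(f"{CGROUP}/{name}") as f:
        return int(f.read())

def top_slab_caches(limit=10):
    """Rough per-cache estimate from memory.kmem.slabinfo (same format as
    /proc/slabinfo). The file may be absent, in which case return nothing."""
    caches = []
    try:
        with open(f"{CGROUP}/memory.kmem.slabinfo") as f:
            for line in f:
                if line.startswith(("slabinfo", "#")):
                    continue  # skip the version and column-header lines
                fields = line.split()
                name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
                caches.append((num_objs * objsize, name))  # upper-bound estimate
    except OSError:
        pass
    return sorted(caches, reverse=True)[:limit]

if __name__ == "__main__":
    while True:
        total = read_bytes("memory.usage_in_bytes")
        kmem = read_bytes("memory.kmem.usage_in_bytes")
        print(f"total {total / 2**20:.1f} MiB, kernel {kmem / 2**20:.1f} MiB")
        for size, name in top_slab_caches():
            print(f"  {name}: ~{size / 2**20:.1f} MiB")
        time.sleep(60)
```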