We found that the Go runtime in `nsmgr` (and probably in `forwarder-vpp` too) is aware of the memory consumption shown by the metrics server in Kubernetes. The runtime can allocate big chunks of memory during high-load tests but usually doesn't return them to the kernel. This is due to the default memory management settings in Go.
The current memory consumption can be found in `/sys/fs/cgroup/memory/memory.stat` in the `nsmgr` pod:
```
/ # cat /sys/fs/cgroup/memory/memory.stat
rss 129200128
...
inactive_anon 129175552
active_anon 12288
inactive_file 12288
active_file 0
...
```
As you can see, the memory consumption of `nsmgr` is about 129 MB. This is the value after 1 hour of high-load testing with 90 clients and 10 endpoints. There are also fields that describe what kind of memory this is: `inactive_anon` means memory that was used by `nsmgr` but is not in use anymore.
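For reference, here is a minimal sketch of how these counters could be read from inside the pod (assuming the cgroup v1 path used above; this is only an illustration, not existing `nsmgr` code):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Reads /sys/fs/cgroup/memory/memory.stat (cgroup v1, as in the output
// above) and prints the counters discussed in this issue.
func main() {
	f, err := os.Open("/sys/fs/cgroup/memory/memory.stat")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	wanted := map[string]bool{
		"rss": true, "inactive_anon": true, "active_anon": true,
		"inactive_file": true, "active_file": true,
	}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && wanted[fields[0]] {
			fmt.Printf("%s %s bytes\n", fields[0], fields[1])
		}
	}
}
```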
Here are also the values from the Go runtime in `nsmgr`:
```
memStats.HeapReleased: 5
memStats.HeapIdle: 104
memStats.HeapInuse: 17
memStats.HeapSys: 121
```
`HeapIdle` tells us that `nsmgr` has 104 MB of memory that is not used right now but may be useful in the future, so the Go runtime keeps it. `HeapReleased` means the amount of memory that has been returned to the kernel. So the runtime is aware of that inactive memory but keeps it for the future.
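These figures look like `runtime.MemStats` values converted to MiB. A minimal sketch of how such a dump can be collected inside a process (the exact logging used in `nsmgr` may differ; this is only an illustration):

```go
package main

import (
	"fmt"
	"runtime"
)

const mib = 1 << 20

// dumpMemStats prints the heap and stack counters discussed in this
// issue, converted to MiB.
func dumpMemStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("memStats.HeapReleased: %d\n", m.HeapReleased/mib)
	fmt.Printf("memStats.HeapIdle: %d\n", m.HeapIdle/mib)
	fmt.Printf("memStats.HeapInuse: %d\n", m.HeapInuse/mib)
	fmt.Printf("memStats.HeapSys: %d\n", m.HeapSys/mib)
	fmt.Printf("memStats.StackInuse: %d\n", m.StackInuse/mib)
	fmt.Printf("memStats.StackSys: %d\n", m.StackSys/mib)
}

func main() {
	dumpMemStats()
}
```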
`kubectl top` before testing:
```
NAME                            CPU(cores)   MEMORY(bytes)
forwarder-vpp-cqr7k             75m          273Mi
nsmgr-phx2f                     36m          12Mi
registry-k8s-7dd7f7d984-89p8h   1m           11Mi
```
Here is `kubectl top` during high-load testing:
```
NAME                            CPU(cores)   MEMORY(bytes)
forwarder-vpp-8k7pf             378m         427Mi
nsmgr-khbhg                     421m         162Mi
registry-k8s-7dd7f7d984-2wgj6   181m         57Mi
```
and `kubectl top` after testing:
```
NAME                            CPU(cores)   MEMORY(bytes)
forwarder-vpp-8k7pf             63m          342Mi
nsmgr-khbhg                     421m         129Mi
registry-k8s-7dd7f7d984-2wgj6   3m           24Mi
```
`nsmgr` memory consumption doesn't decrease even after several hours.
The default behaviour can be changed with the env var `GODEBUG=madvdontneed=1`.
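The effect can be reproduced outside of `nsmgr` with a small standalone program (a sketch, not `nsmgr` code): allocate a large buffer, drop it, force a GC, and compare `HeapIdle`/`HeapReleased` and the container RSS when running with and without `GODEBUG=madvdontneed=1`.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

const mib = 1 << 20

func printStats(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapInuse=%dMiB HeapIdle=%dMiB HeapReleased=%dMiB\n",
		label, m.HeapInuse/mib, m.HeapIdle/mib, m.HeapReleased/mib)
}

func main() {
	// Allocate ~128 MiB and touch every page so it is really backed by RAM.
	buf := make([]byte, 128*mib)
	for i := range buf {
		buf[i] = 1
	}
	printStats("after allocation")

	// Drop the reference and let the GC reclaim the heap.
	buf = nil
	runtime.GC()
	printStats("after GC")

	// Give the background scavenger time to return idle pages to the kernel.
	time.Sleep(2 * time.Minute)
	printStats("after waiting")
}
```

Running the same binary with and without `GODEBUG=madvdontneed=1` while watching `kubectl top` (or the memory.stat counters above) should show the difference in how quickly RSS drops.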
`kubectl top` before testing:
```
NAME                            CPU(cores)   MEMORY(bytes)
forwarder-vpp-cqr7k             71m          276Mi
nsmgr-phx2f                     37m          11Mi
registry-k8s-7dd7f7d984-89p8h   1m           12Mi
```
`kubectl top` during testing:
```
NAME                            CPU(cores)   MEMORY(bytes)
forwarder-vpp-cqr7k             449m         402Mi
nsmgr-phx2f                     439m         124Mi
registry-k8s-7dd7f7d984-89p8h   198m         54Mi
```
`kubectl top` 2 minutes after testing:
```
NAME                            CPU(cores)   MEMORY(bytes)
forwarder-vpp-cqr7k             66m          333Mi
nsmgr-phx2f                     31m          40Mi
registry-k8s-7dd7f7d984-89p8h   2m           23Mi
```
As you can see, the `nsmgr` memory consumption has decreased just a couple of minutes after the test finished.
Go runtime memory stats:
```
memStats.HeapReleased: 96
memStats.HeapIdle: 99
memStats.HeapInuse: 18
memStats.HeapSys: 118
```
`HeapReleased` is much bigger now, meaning that 96 MB of memory has been returned to the kernel. `nsmgr` memory consumption is still about 4 times bigger than in the initial state: 40 MB vs 11 MB. The current memory consumption reported by the Go runtime is 18 MB (`HeapInuse`). There is also the stack, which consumes 10 MB of memory (`StackInuse`):
```
memStats.HeapReleased: 96
memStats.HeapIdle: 99
memStats.HeapInuse: 18
memStats.HeapSys: 118
memStats.StackInuse: 10
memStats.StackSys: 10
```
So that's 28 MB in total, and 40 MB - 28 MB = 12 MB is still unaccounted for. Part of this memory may be related to open files, because memory.stat from the kernel shows about 8.5 MB of file-backed memory:
```
inactive_file 6262784
active_file 2289664
```
The kernel also shows that `nsmgr` has several open sockets:
```
/ # ls -l /proc/1/fd
total 0
lrwx------ 1 root root 64 Nov 12 11:57 0 -> /dev/null
l-wx------ 1 root root 64 Nov 12 11:57 1 -> pipe:[18439221]
lrwx------ 1 root root 64 Nov 12 12:25 10 -> socket:[19743162]
lrwx------ 1 root root 64 Nov 12 12:25 11 -> socket:[19743164]
l--------- 1 root root 64 Nov 12 12:25 12 -> /listen.on.sock
lrwx------ 1 root root 64 Nov 12 12:25 13 -> socket:[19729952]
lrwx------ 1 root root 64 Nov 12 12:25 14 -> socket:[19743165]
l-wx------ 1 root root 64 Nov 12 11:57 2 -> pipe:[18439222]
lr-x------ 1 root root 64 Nov 12 12:25 29 -> anon_inode:inotify
lrwx------ 1 root root 64 Nov 12 11:57 3 -> socket:[18437267]
lrwx------ 1 root root 64 Nov 12 12:25 32 -> anon_inode:[eventpoll]
lr-x------ 1 root root 64 Nov 12 12:25 33 -> pipe:[18605780]
l-wx------ 1 root root 64 Nov 12 12:25 34 -> pipe:[18605780]
lrwx------ 1 root root 64 Nov 12 11:57 4 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Nov 12 11:57 5 -> anon_inode:[eventfd]
l--------- 1 root root 64 Nov 12 12:19 540 -> /listen.on.sock
lrwx------ 1 root root 64 Nov 12 11:57 6 -> socket:[18429898]
lrwx------ 1 root root 64 Nov 12 11:57 7 -> socket:[18434588]
lrwx------ 1 root root 64 Nov 12 11:57 8 -> socket:[18435791]
lrwx------ 1 root root 64 Nov 12 11:57 9 -> socket:[18433938]
```
Further investigation will focus on the open sockets.
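As a possible starting point for that, here is a small sketch (a hypothetical helper, not existing `nsmgr` code) that lists the descriptors of the current process, i.e. the same information as the `ls -l /proc/1/fd` output above:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Lists the file descriptors of the current process by reading
// /proc/self/fd (the same data as `ls -l /proc/1/fd` when the process
// runs as PID 1 inside the container).
func main() {
	const fdDir = "/proc/self/fd"
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		panic(err)
	}
	fmt.Printf("open descriptors: %d\n", len(entries))
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue // the descriptor may already be closed
		}
		fmt.Printf("%s -> %s\n", e.Name(), target)
	}
}
```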
Our internal tests and tests in the production environment don't show any memory leaks. The issue can be closed.
Description
After high-load tests `nsmgr` consumes much more memory than expected. Memory and goroutine profiles don't show any leaks.