kubernetes-sigs / aws-efs-csi-driver

CSI Driver for Amazon EFS https://aws.amazon.com/efs/
Apache License 2.0

aws-efs-csi-driver efs-proxy OOM killed #1383

Status: Closed (nan-coupa closed this issue 2 months ago)

nan-coupa commented 3 months ago

/kind bug

What happened?

efs-proxy is OOM killed even though the EKS node has plenty of free memory.

What you expected to happen?

efs-proxy runs without issue.

How to reproduce it (as minimally and precisely as possible)?

Run a container that uses an EFS volume. The efs-proxy process gets OOM killed.

Anything else we need to know?:

Environment

Please also attach debug logs to help us better diagnose

[root@ip-10-0-12-209 /]# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        912M        8.1G        6.1M        6.4G         14G
Swap:            0B          0B          0B

[root@ip-10-0-12-209 /]# ps -ef | grep efs-proxy
root        8538    6216  0 17:50 ?        00:00:00 [efs-proxy] <defunct>
root      199374    6236  1 19:24 ?        00:00:00 [efs-proxy] <defunct>

/var/log/messages shows that the process was OOM killed, and any attempt to restart the CSI driver ends the same way:

Jun 21 19:22:58 ip-10-0-12-209 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-ca0c308b91e6712a9f57d9c4eb300cc8e3acb80a2d3f342f6c858d4998bbb945.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod50beeb08_b8d5_460a_92f7_15010a01cd30.slice/cri-containerd-ca0c308b91e6712a9f57d9c4eb300cc8e3acb80a2d3f342f6c858d4998bbb945.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod50beeb08_b8d5_460a_92f7_15010a01cd30.slice/cri-containerd-ca0c308b91e6712a9f57d9c4eb300cc8e3acb80a2d3f342f6c858d4998bbb945.scope,task=efs-proxy,pid=197121,uid=0
Jun 21 19:22:58 ip-10-0-12-209 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-ca0c308b91e6712a9f57d9c4eb300cc8e3acb80a2d3f342f6c858d4998bbb945.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod50beeb08_b8d5_460a_92f7_15010a01cd30.slice/cri-containerd-ca0c308b91e6712a9f57d9c4eb300cc8e3acb80a2d3f342f6c858d4998bbb945.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod50beeb08_b8d5_460a_92f7_15010a01cd30.slice/cri-containerd-ca0c308b91e6712a9f57d9c4eb300cc8e3acb80a2d3f342f6c858d4998bbb945.scope,task=efs-proxy,pid=197121,uid=0
Jun 21 19:22:58 ip-10-0-12-209 kernel: Memory cgroup out of memory: Killed process 197121 (efs-proxy) total-vm:266300kB, anon-rss:100688kB, file-rss:6336kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:-997
Jun 21 19:22:58 ip-10-0-12-209 kernel: Memory cgroup out of memory: Killed process 197121 (efs-proxy) total-vm:266300kB, anon-rss:100688kB, file-rss:6336kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:-997
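
For anyone reading these lines: `constraint=CONSTRAINT_MEMCG` means the kill was triggered by a memory cgroup limit (here the container's cgroup), not by the node running out of memory, which is consistent with the `free -h` output above. The relevant fields can be pulled out with a short script. This is a minimal sketch that assumes the log format shown above; `summarize_oom` is a hypothetical helper name, and the first sample line is abbreviated (the long cgroup path fields are omitted).

```python
import re

# Matches the "Killed process" line the kernel writes after an OOM kill.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<task>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB, shmem-rss:(?P<shmem_rss>\d+)kB"
)
# Matches the constraint field of the "oom-kill:" summary line.
CONSTRAINT_RE = re.compile(r"oom-kill:constraint=(?P<constraint>\w+)")


def summarize_oom(lines):
    """Return constraint, task, pid, and resident memory (kB) from OOM log lines."""
    summary = {}
    for line in lines:
        m = CONSTRAINT_RE.search(line)
        if m:
            summary["constraint"] = m.group("constraint")
        m = KILL_RE.search(line)
        if m:
            summary["task"] = m.group("task")
            summary["pid"] = int(m.group("pid"))
            # Resident memory = anonymous + file-backed + shared memory pages.
            summary["rss_kb"] = sum(
                int(m.group(k)) for k in ("anon_rss", "file_rss", "shmem_rss")
            )
    return summary


# Sample input: the first line is abbreviated from the log above
# (cgroup path fields omitted); the second line is copied as-is.
SAMPLE = [
    "Jun 21 19:22:58 ip-10-0-12-209 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,"
    "nodemask=(null),task=efs-proxy,pid=197121,uid=0",
    "Jun 21 19:22:58 ip-10-0-12-209 kernel: Memory cgroup out of memory: "
    "Killed process 197121 (efs-proxy) total-vm:266300kB, anon-rss:100688kB, "
    "file-rss:6336kB, shmem-rss:0kB, UID:0 pgtables:320kB oom_score_adj:-997",
]

if __name__ == "__main__":
    print(summarize_oom(SAMPLE))
```

For the lines above, the resident total is 100688 kB + 6336 kB = 107024 kB (about 104.5 MiB) at the time of the kill, so the container's memory limit is the figure to compare against, not the node's 15G.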

seanzatzdev-amazon commented 3 months ago

Does the issue persist on v2.0.4 of the driver? Previous versions had an issue with zombie efs-proxy processes.

Otherwise, do you have any memory or CPU limits set on your driver?
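
One way to check for limits is to dump the driver pod as JSON (for example with `kubectl get pod <pod-name> -n <namespace> -o json`) and list each container's memory limit. The snippet below is a rough sketch of that, not something the driver ships: `memory_limits` is a hypothetical helper, and the sample pod object, including its container names and the `128Mi` value, is made up for illustration.

```python
import json


def memory_limits(pod):
    """Map container name -> memory limit string (None if no limit is set)."""
    limits = {}
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        limits[container["name"]] = resources.get("limits", {}).get("memory")
    return limits


# Hypothetical, trimmed pod object for illustration only. Real input would be
# the JSON printed by `kubectl get pod <pod-name> -n <namespace> -o json`.
SAMPLE_POD = json.loads("""
{
  "spec": {
    "containers": [
      {"name": "efs-plugin", "resources": {"limits": {"memory": "128Mi"}}},
      {"name": "liveness-probe", "resources": {}}
    ]
  }
}
""")

if __name__ == "__main__":
    for name, limit in memory_limits(SAMPLE_POD).items():
        print(f"{name}: {limit or 'no memory limit'}")
```

A container that reports a limit close to the resident size in the OOM log is the likely candidate for the cgroup that triggered the kill.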

seanzatzdev-amazon commented 3 months ago

Can you also send your StorageClass, PVC, and pod manifests?

nan-coupa commented 2 months ago

Further testing shows this works with v2.0.4.