booleanbetrayal opened 9 months ago
We have the same issue with the latest version 1.7.1
We can reproduce the issue with the following steps:
Result: the first deployed pod gets stuck in the "Terminating" state, and on the same node the efs-driver's memory usage slowly increases until the pod is OOM-killed.
In addition to the logs mentioned above, we also see many `nfs: server 127.0.0.1 not responding, timed out` messages in dmesg on the EC2 instance.
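This symptom can be checked directly on the affected node; a quick sketch (the grep pattern matches the message quoted above):

```shell
# Count NFS timeout messages against the local stunnel endpoint
# (efs-utils mounts EFS through a stunnel listener on 127.0.0.1).
dmesg | grep -c 'nfs: server 127.0.0.1 not responding'
```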
Any updates on this issue? We are facing the same problem.
Issue is still present in 1.7.4
We are facing the same issue. The bug seems to be in the stunnel build packaged with the efs-plugin image, and we were able to fix it by building our own image. The most interesting part is that it works with the same stunnel version, just rebuilt from the official source archive.
I don't know whether it's possible to add this fix to the original Dockerfile to resolve the memory leak:
```dockerfile
FROM public.ecr.aws/eks-distro-build-tooling/eks-distro-minimal-base-python-builder:3.9-al2 as stunnel_installer

ENV STUNNEL_VERSION="5.58"

RUN yum install -y gzip-1.5-10.amzn2.0.1 tar-1.26-35.amzn2.0.3 gcc-7.3.1-17.amzn2 make-1:3.82-24.amzn2 openssl-devel-1:1.0.2k-24.amzn2.0.11 && \
    yum -y clean all && rm -rf /var/cache && \
    curl -o stunnel-$STUNNEL_VERSION.tar.gz https://www.stunnel.org/archive/5.x/stunnel-$STUNNEL_VERSION.tar.gz && \
    tar -zxvf stunnel-$STUNNEL_VERSION.tar.gz && \
    cd stunnel-$STUNNEL_VERSION && \
    ./configure --prefix=/newroot/ && \
    make && \
    make install && \
    mv /newroot/usr/bin/stunnel /newroot/usr/bin/stunnel5 && \
    cd - && \
    rm -rf stunnel-$STUNNEL_VERSION.tar.gz stunnel-$STUNNEL_VERSION

FROM amazon/aws-efs-csi-driver:v1.7.5

COPY --from=stunnel_installer /newroot /
```
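For anyone wanting to try this workaround, a rough sketch of building the patched image and pointing the driver at it. The registry name, Dockerfile filename, and Helm value names below are placeholders for illustration, not taken from this thread; adjust them to your own setup:

```shell
# Build and push the patched image (tag and registry are placeholders)
docker build -t my-registry.example.com/aws-efs-csi-driver:v1.7.5-stunnel \
  -f Dockerfile.stunnel .
docker push my-registry.example.com/aws-efs-csi-driver:v1.7.5-stunnel

# Point the Helm release at the custom image (values assumed from the chart)
helm upgrade aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system \
  --set image.repository=my-registry.example.com/aws-efs-csi-driver \
  --set image.tag=v1.7.5-stunnel
```

Users of the EKS Managed Add-On would instead need to override the image in the add-on configuration or run the self-managed chart.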
This is the memory usage before and after switching to the custom image:
Seeing similar behavior on 1.7.4. The efs-plugin container starts at about 35 MB and slowly climbs over a few days to our 350 MB limit, then gets OOMKilled.
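One way to watch this climb over time is to sample container-level memory with `kubectl top`; a sketch, assuming metrics-server is installed and the default `app=efs-csi-node` label from the chart (adjust the selector if your install differs):

```shell
# Print pod name and memory for the efs-plugin container on each node.
# kubectl top ... --containers emits columns: POD NAME CPU(cores) MEMORY(bytes)
kubectl top pods -n kube-system -l app=efs-csi-node --containers \
  | awk '$2 == "efs-plugin" {print $1, $4}'
```

Sampling this periodically (e.g. via a cron job) makes the slow leak visible long before the OOMKill.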
I still see this same behavior in 2.0.2
@sstarcher I've tried the steps others listed above using driver version v2.0.4 and am unable to recreate the issue; memory usage stays relatively flat. Can you describe your setup, how to reproduce the issue, and the problematic behavior you have encountered?
Digging in I realized that we had a different version pinned and our version is not as new as I was expecting. I'll update to 2.0.4 or greater and try again.
What happened?
Possible regression of #474?
The EKS Managed Add-On Amazon EFS CSI Driver v1.6.0-eksbuild.1 appears to have a memory leak in the efs-plugin container.
What you expected to happen?
Memory is reclaimed during normal operations.
How to reproduce it (as minimally and precisely as possible)?
Deploy a cluster with the latest version of the EKS Managed Add-On for Amazon EFS CSI Driver (defaults) and enable the use of EFS-based PVCs.
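A minimal manifest to exercise the driver for reproduction might look like the following; this mirrors the driver's dynamic-provisioning example, and the filesystem ID is a placeholder you must replace with your own:

```shell
# Create an EFS-backed StorageClass and PVC (fileSystemId is a placeholder)
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
EOF
```

Mounting the claim from a pod, then deleting and recreating that pod repeatedly, is one way to drive the mount/unmount cycles under which the leak was reported.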
Environment
Kubernetes version (use kubectl version): v1.27.4-eks-2d98532
Driver version: 1.6.0-eksbuild.1
Please also attach debug logs to help us better diagnose
(Full logs captured and available through direct request due to sensitive values)
The efs-utils log has several lines such as the following:
/kind bug