Closed · eswolinsky3241 closed this issue 1 month ago
@eswolinsky3241 Could you please provide DEBUG-level logs? https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/troubleshooting/README.md
Also, how many pods do you add when this issue starts occurring? If there is any other information about your cluster that may help us recreate the issue, please let me know.
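For anyone else collecting logs: the driver's verbosity can usually be raised through the Helm chart. A minimal values sketch, assuming your chart version exposes a `node.logLevel` setting (check the chart's values.yaml for your release):

```yaml
# Hypothetical Helm values override for the aws-efs-csi-driver chart.
# logLevel feeds the driver's klog -v flag; 5 is verbose/debug output.
node:
  logLevel: 5
```

After upgrading the release, the efs-plugin container logs on the affected node should include DEBUG-level detail.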
@eswolinsky3241 Have you found the root cause or any solution to this issue?
@seanzatzdev-amazon We are facing the same issue on our EKS cluster v1.27.7-eks-4f4795d; we have seen this issue with v1.6.0 and v1.7.2. We see this issue on 2 deployments (~6 pods in total) that use the same SC/PV/PVC to mount an EFS volume. Let me know what other information would be helpful. I'm working on getting some debug logs from the efs-csi-driver. Thank you.
@sorind-broadsign Was never able to root cause it but at some point it just stopped happening without any change on my part. Haven’t seen the error in months.
Hey @sorind-broadsign, did you have a chance to resolve it? I'm in the same situation.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind bug
What happened?
We use EKS to run a distributed task queue that uses the HPA to scale deployments based on the number of tasks in a Redis queue. The pods in these deployments run on an EC2 managed node group. Every pod in the deployment has the same EFS drive attached to access necessary files. We use the efs-csi-node DaemonSet, which is managed by the Helm chart. Sometimes we scale up to a lot of pods at once to accommodate a large number of jobs added to the queue. We have started to see this error appear on some of these pods:
Most of the pods start successfully, but the ones that do show this event are just stuck in a “ContainerCreating” status. We have tried increasing resource requests for the Daemonset, but that has not helped, and the efs-csi-driver container logs do not provide any helpful information. This has become a problem for us, because our deployments never scale to the level we need them to.
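For reference, a static-provisioning setup like the one described can be sketched roughly as below. The filesystem ID and object names are placeholders, not our actual values:

```yaml
# Hypothetical static-provisioning manifests for the EFS CSI driver.
# fs-12345678 is a placeholder EFS filesystem ID.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi        # EFS is elastic; this value is required by the API but not enforced
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany     # allows the same volume to be mounted by every pod in the deployments
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
```

Both deployments reference the same PVC, so every pod mounts the same EFS filesystem through the efs-csi-node DaemonSet on its node.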
What you expected to happen?
All pods to start with the EFS-backed volume mounted.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?:
Environment
- Kubernetes version (use kubectl version): 1.24

Please also attach debug logs to help us better diagnose