zjalicflw closed this issue 1 year ago
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/214
This seems similar; however, I have tried everything to solve this and, no matter what, I get the same error: context deadline exceeded
Facing this right now
Hi @debdutdeb
I managed to solve my issue by reinstalling the CoreDNS and VPC CNI addons and the EBS driver, and updating them to the latest versions. After this, my kafka pods were running.
This should be easily fixed by uninstalling all addons (including any that were NOT installed through the AWS addons console), installing them all again, and then deleting any PVCs that are stuck on attaching. Of course, this will only work if you use dynamic provisioning. If you use static provisioning, just reattach the volumes.
Taking a look at your PVCs, PVs, and the EBS volumes attached to your EKS cluster's instances, and inspecting them carefully, should help you track down the problem.
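One way to do that inspection is to list the VolumeAttachment objects that have not reached the attached state, together with any error they report. The following is only a rough sketch: it assumes kubectl is on your PATH and already pointed at the affected cluster, and the helper names (stuck_attachments, fetch_volume_attachments) are made up for illustration.

```python
import json
import subprocess


def stuck_attachments(volume_attachments):
    """Return a summary of VolumeAttachments whose status is not 'attached'."""
    stuck = []
    for item in volume_attachments.get("items", []):
        status = item.get("status", {})
        if status.get("attached"):
            continue  # already attached, nothing to report
        spec = item.get("spec", {})
        stuck.append({
            "name": item["metadata"]["name"],
            "node": spec.get("nodeName"),
            "pv": spec.get("source", {}).get("persistentVolumeName"),
            # attachError is only present once the attacher has reported a failure
            "error": status.get("attachError", {}).get("message"),
        })
    return stuck


def fetch_volume_attachments():
    """Read all VolumeAttachments as JSON (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "volumeattachments", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling stuck_attachments(fetch_volume_attachments()) gives one entry per attachment that is still pending, so you can match the PV names against your PVCs and the EBS volumes in the AWS console.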
You can elaborate more if you need help, I will try to do my best.
Filip
We are running into the same issue in an EKS environment.
Kubernetes version: v1.24.17-eks-4f4795d
Driver version: 1.24.0
(from helm chart version aws-ebs-csi-driver-2.24.0)
I1113 08:42:54.079730 1 csi_handler.go:251] Attaching "csi-57939a06730aa4167c1609c46f5d8a3f6196360670b974e355bf2f6cf01a746c"
I1113 08:42:54.079786 1 csi_handler.go:251] Attaching "csi-b394ecc409f06a620fbce7118bdf4db434e5f359196317f98a42cdcac85eacdb"
I1113 08:42:54.080160 1 controller.go:415] "ControllerPublishVolume: attaching" volumeID="vol-0934dc0da8301b04d" nodeID="i-0c8e24cd69c5ca516"
I1113 08:42:54.080160 1 controller.go:415] "ControllerPublishVolume: attaching" volumeID="vol-056e1e688e7a0aa8c" nodeID="i-0c8e24cd69c5ca516"
E1113 08:43:09.080470 1 driver.go:124] "GRPC error" err=<
rpc error: code = Internal desc = Could not attach volume "vol-056e1e688e7a0aa8c" to node "i-0c8e24cd69c5ca516": error listing AWS instances: RequestCanceled: request context canceled
caused by: context canceled
>
E1113 08:43:09.080469 1 driver.go:124] "GRPC error" err=<
rpc error: code = Internal desc = Could not attach volume "vol-0934dc0da8301b04d" to node "i-0c8e24cd69c5ca516": error listing AWS instances: RequestCanceled: request context canceled
caused by: context canceled
>
I1113 08:43:09.087184 1 csi_handler.go:234] Error processing "csi-b394ecc409f06a620fbce7118bdf4db434e5f359196317f98a42cdcac85eacdb": failed to attach: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I1113 08:43:09.089415 1 csi_handler.go:234] Error processing "csi-57939a06730aa4167c1609c46f5d8a3f6196360670b974e355bf2f6cf01a746c": failed to attach: rpc error: code = DeadlineExceeded desc = context deadline exceeded
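Going by the timestamps above, each attach call appears to be cancelled roughly 15 seconds after it starts, which looks more like a client-side timeout than a rejection from AWS. Below is a minimal sketch for measuring that gap from the controller logs. It assumes klog-style line prefixes and the exact message strings shown above; the function names are hypothetical.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, then HH:MM:SS.microseconds
KLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")
VOLUME_ID = re.compile(r"vol-[0-9a-f]+")


def klog_time(line):
    """Parse the timestamp from a klog header line, or return None."""
    match = KLOG_PREFIX.match(line)
    if match is None:
        return None
    # klog omits the year; only differences between timestamps are used here.
    return datetime.strptime(match.group(1) + " " + match.group(2),
                             "%m%d %H:%M:%S.%f")


def attach_durations(lines):
    """Map volume ID -> seconds between attach start and attach failure."""
    started, durations = {}, {}
    current = None
    for line in lines:
        parsed = klog_time(line)
        if parsed is not None:
            current = parsed  # continuation lines inherit the last header time
        volume = VOLUME_ID.search(line)
        if current is None or volume is None:
            continue
        vol = volume.group(0)
        if "ControllerPublishVolume: attaching" in line:
            started[vol] = current
        elif "Could not attach volume" in line and vol in started:
            durations[vol] = (current - started[vol]).total_seconds()
    return durations
```

Feeding the log lines above into attach_durations should report a gap of about 15 seconds for both volumes.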
I managed to solve my issue by reinstalling both CoreDNS plugins and VPC CNI and EBS Driver. ... This should be easily fixed by uninstalling all addons, making sure to uninstall ones that are NOT installed through AWS addons console, install them all again and then delete some PVCs if stuck on attaching. ...
These steps may be fine for one-off cases, but they aren't feasible for our production environment. I would like to work towards a more durable fix in the ebs-csi-driver application.
@zjalicflw Can you reopen this issue?
Hi @j-land, as a first step, I recommend upgrading to the latest version of the driver, which sets a more sensible default timeout value for the external attacher. See our release notes here for more information: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/CHANGELOG.md#v1250.
Beyond that, if you are still running into issues, I'd recommend enabling SDK logs via the sdkDebugLog parameter to help provide further insight into networking or auth-related issues. Feel free to open a new issue if you need any help.
@torredil That's helpful, I appreciate it! Hopefully upgrading does the trick, but I'll enable SDK logs to debug if not.
Did upgrading solve the problem? @j-land
/kind bug
What happened?
After uninstalling and reinstalling the bitnami/kafka Helm chart on my EKS cluster a couple of times due to some errors, a new blocking error occurred. Suddenly, all pods are stuck in status ContainerCreating. Upon inspection, the describe pod command displays:
Warning FailedAttachVolume 10s (x6 over 29s) attachdetach-controller AttachVolume.Attach failed for volume "pvc-95a5209c-797c-49de-ae30-9def18935393" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
After this, today I tried to delete and recreate the PVCs, but a similar error occurs when recreating them:
Upon describing the pod with the CSI driver:
What you expected to happen?
The CSI driver should reattach the volumes properly.
How to reproduce it (as minimally and precisely as possible)?
Not sure, very specific situation
Anything else we need to know?:
Is this some AWS quota block? Because of testing, I uninstalled and installed kafka chart many times, but each time there was no problem with PVCs, and then suddenly pod describe gives context deadline exceeded errors.
Environment
Kubernetes version (use kubectl version): v1.23.1-eksbuild.1