kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

ebs-csi crashing #16703

Closed by zetaab 4 months ago

zetaab commented 4 months ago

/kind bug

I upgraded a cluster from Kubernetes 1.28.x to 1.29.6 using kOps 1.29.2.

After the upgrade, the new ebs-csi-node pod is in CrashLoopBackOff:

ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:08.712297       1 main.go:135] Version: v2.10.0
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:08.712491       1 main.go:136] Running node-driver-registrar in mode=
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:08.712566       1 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
ebs-csi-node-cx7tf ebs-plugin I0727 07:26:13.633684       1 ec2.go:40] "Retrieving EC2 instance identity metadata" regionFromSession="eu-west-3"
ebs-csi-node-cx7tf ebs-plugin I0727 07:26:13.633792       1 metadata.go:52] "failed to retrieve instance data from ec2 metadata; retrieving instance data from kubernetes api" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, canceled, context deadline exceeded"
ebs-csi-node-cx7tf ebs-plugin I0727 07:26:13.661989       1 metadata.go:55] "kubernetes api is available"
ebs-csi-node-cx7tf ebs-plugin I0727 07:26:13.670585       1 driver.go:68] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.30.0"
ebs-csi-node-cx7tf ebs-plugin I0727 07:26:13.670642       1 node.go:97] "regionFromSession Node service" region="eu-west-3"
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:13.993258       1 main.go:164] Calling CSI driver to discover driver name
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:13.996719       1 main.go:173] CSI driver name: "ebs.csi.aws.com"
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:13.996848       1 node_register.go:55] Starting Registration Server at: /registration/ebs.csi.aws.com-reg.sock
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:13.997059       1 node_register.go:64] Registration Server started at: /registration/ebs.csi.aws.com-reg.sock
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:13.997388       1 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
ebs-csi-node-cx7tf ebs-plugin E0727 07:26:14.701013       1 node.go:860] "Unexpected failure when attempting to remove node taint(s)" err="isAllocatableSet: driver not found on node i-0cfee9d0e60b79f39"
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:14.746575       1 main.go:90] Received GetInfo call: &InfoRequest{}
ebs-csi-node-cx7tf node-driver-registrar I0727 07:26:14.806444       1 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
ebs-csi-node-cx7tf ebs-plugin I0727 07:26:15.214440       1 node.go:940] "CSINode Allocatable value is set" nodeName="i-0cfee9d0e60b79f39" count=26
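
For anyone sifting through similar output, here is a minimal Python sketch that keeps only the warning, error, and fatal lines. It assumes each line has the shape shown above: pod name, container name, then a standard klog header (severity letter, MMDD date, time, thread ID, `file:line]`). The names `LogEntry`, `parse_line`, and `problems` are hypothetical helpers for illustration, not part of kOps or the EBS CSI driver.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed line layout: "<pod> <container> <klog header>] <message>".
LINE_RE = re.compile(
    r"^(?P<pod>\S+)\s+(?P<container>\S+)\s+"
    r"(?P<severity>[IWEF])(?P<date>\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+)\s+(?P<source>[^\]\s]+)\]\s?(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    pod: str
    container: str
    severity: str  # one of I, W, E, F
    time: str
    source: str    # e.g. "node.go:860"
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one prefixed klog line; return None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return LogEntry(
        pod=m["pod"],
        container=m["container"],
        severity=m["severity"],
        time=m["time"],
        source=m["source"],
        message=m["message"],
    )


def problems(lines: Iterable[str]) -> List[LogEntry]:
    """Return only entries logged at warning, error, or fatal severity."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.severity in "WEF"]


if __name__ == "__main__":
    import sys

    # Usage: pipe the saved log output into this script on stdin.
    for entry in problems(sys.stdin):
        print(entry.time, entry.container, entry.source, entry.message)
```

In the output above, the only E-level line is the `node.go:860` taint-removal failure from `ebs-plugin`. It is logged just before the registrar reports `PluginRegistered:true` and the CSINode allocatable count is set.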

I have already added the permissions that were missing in https://github.com/kubernetes/kops/issues/16702

zetaab commented 4 months ago

Well, recreating that one node solved the issue :o