kubernetes-sigs / aws-efs-csi-driver

CSI Driver for Amazon EFS https://aws.amazon.com/efs/
Apache License 2.0

[helm v3.0.1] "efs-csi-node" pod crashloopbackoff error on ARM node #1336

Closed sc-yan closed 2 months ago

sc-yan commented 2 months ago

/kind bug

What happened? The "efs-csi-node" DaemonSet fails to start.

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)? The EKS/k8s cluster I created has two node groups: one uses amd64 nodes and the other uses ARM nodes. The driver deploys fine on the amd64 nodes but fails on the ARM nodes. That is the only difference I can tell.

Anything else we need to know?:

Environment

Please also attach debug logs to help us better diagnose

liveness-probe W0501 00:06:34.236343       1 connection.go:234] Still connecting to unix:///csi/csi.sock
efs-plugin exec /bin/aws-efs-csi-driver: exec format error
csi-driver-registrar I0501 00:06:24.159796       1 main.go:135] Version: v2.10.0
csi-driver-registrar I0501 00:06:24.159863       1 main.go:136] Running node-driver-registrar in mode=
csi-driver-registrar I0501 00:06:24.159872       1 main.go:157] Attempting to open a gRPC connection with: "/csi/csi.sock"
csi-driver-registrar W0501 00:06:34.160692       1 connection.go:234] Still connecting to unix:///csi/csi.sock
Stream closed EOF for kube-system/efs-csi-node-jwx6j (efs-plugin)
csi-driver-registrar W0501 00:06:44.160006       1 connection.go:234] Still connecting to unix:///csi/csi.sock
csi-driver-registrar E0501 00:06:54.160847       1 main.go:160] error connecting to CSI driver: context deadline exceeded
liveness-probe W0501 00:06:54.236514       1 connection.go:234] Still connecting to unix:///csi/csi.sock
Stream closed EOF for kube-system/efs-csi-node-jwx6j (csi-driver-registrar)
liveness-probe W0501 00:07:04.237080       1 connection.go:234] Still connecting to unix:///csi/csi.sock

willthames commented 2 months ago

Looks like the efs-csi-driver container no longer supports ARM (which seems highly ironic to me, given that we're only running ARM nodes because AWS are massive Graviton advocates).

docker manifest inspect public.ecr.aws/efs-csi-driver/amazon/aws-efs-csi-driver:v2.0.1 shows a single architecture image whereas docker manifest inspect public.ecr.aws/efs-csi-driver/amazon/aws-efs-csi-driver:v1.7.7 shows a multi-architecture image

mskanth972 commented 2 months ago

Thanks for bringing this up. We released a new version of the driver, v2.0.2, which fixes the ARM issue, so the driver now supports both arm64 and amd64. Closing the issue; feel free to reopen if the problem persists.

 docker manifest inspect public.ecr.aws/efs-csi-driver/amazon/aws-efs-csi-driver:v2.0.2
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1691,
         "digest": "sha256:5b3cf586f91d42613cded07d5ef70259256e8b2c5589619c9a718d1d774207cd",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 1691,
         "digest": "sha256:63c76d2bd3319e9f828a6e433e40ba345e6a7081d54c37948b08630db1e8dcde",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}
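
For anyone who wants to script this check before rolling an image out to a mixed-architecture cluster, here is a minimal Python sketch. It assumes the JSON printed by docker manifest inspect has already been captured into a string. The function name supported_platforms and the constant V2_0_2 are hypothetical names chosen for illustration, and the sample is an abbreviated copy of the v2.0.2 output above with sizes and digests left out. The sketch also assumes that a single-architecture image yields a manifest with no "manifests" array, in which case it returns an empty list.

```python
import json


def supported_platforms(manifest_json: str) -> list[str]:
    """Return "os/architecture" strings listed in a manifest list.

    Assumption: a single-architecture image manifest has no "manifests"
    array, so an empty list is returned for it.
    """
    doc = json.loads(manifest_json)
    platforms = []
    for entry in doc.get("manifests", []):
        platform = entry.get("platform", {})
        platforms.append(f'{platform.get("os")}/{platform.get("architecture")}')
    return platforms


# Abbreviated copy of the v2.0.2 manifest list shown above
# (sizes and digests omitted for brevity).
V2_0_2 = """
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {"platform": {"architecture": "amd64", "os": "linux"}},
      {"platform": {"architecture": "arm64", "os": "linux"}}
   ]
}
"""

if __name__ == "__main__":
    platforms = supported_platforms(V2_0_2)
    print(platforms)
    print("arm64 supported:", "linux/arm64" in platforms)
```

An empty result would be the cue that the tag is not a multi-architecture image, which matches the situation reported for v2.0.1 earlier in this thread.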