kubernetes-sigs / aws-efs-csi-driver

CSI Driver for Amazon EFS https://aws.amazon.com/efs/
Apache License 2.0

EFS CSIDriver usage in EKS Fargate #1494

Open gitcblinn opened 3 weeks ago

gitcblinn commented 3 weeks ago

Is your feature request related to a problem?/Why is this needed

While setting up EKS Fargate for the first time, I had a lot of trouble provisioning PersistentVolumes and PersistentVolumeClaims. The documentation is a maze and very outdated, and so are the static_provisioning samples in this repository. I ended up installing the EFS CSI Driver add-on in my cluster (which, yes, was ignorance on my part for not reading all of the notes in the documentation) and then deleting it once I realized my mistake.

Deleting the add-on caused several errors when provisioning pods that use the volume ("driver name efs.csi.aws.com not found in the list of registered CSI drivers" and "timed out waiting for external-attacher of efs.csi.aws.com CSI driver to attach volume"), because I had inadvertently deleted the CSIDriver object from my cluster. I've also commented on another issue in the eksctl repo about this, but deleting that object left me beyond the reach of any documentation. To recover, I had to do the following:

  1. Installed the EFS CSI Driver addon again.
  2. Ran the "eksctl delete addon --name aws-efs-csi-driver --cluster --preserve" command.
  3. Deleted the unnecessary CSI driver controller deployment with "kubectl delete deployment efs-csi-controller -n kube-system" (I may be missing some things here).
  4. Used the following to deploy my PV and PVC:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: efs-pv-test
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: ""
      csi:
        driver: efs.csi.aws.com
        volumeHandle: <EFS_ID>
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: efs-claim
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: ""
      resources:
        requests:
          storage: 5Gi

    (Note that a new StorageClass should NOT be created, and the storage class name should be left blank.) I also made sure to add an inbound rule to my EFS security group that allows traffic on the NFS port (TCP 2049) from the cluster security group of the EKS cluster.
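The security group change described above might look roughly like this with the AWS CLI. This is only a sketch: both group IDs are placeholders, the helper name build_nfs_ingress_cmd is made up for illustration, and it assumes NFS on the standard TCP port 2049. It prints the command rather than running it, so the IDs can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: build the AWS CLI call that opens NFS (TCP 2049) on the EFS
# security group to traffic from the EKS cluster security group.
# Both group IDs are placeholders (assumptions), not values from this issue.

build_nfs_ingress_cmd() {
  local efs_sg="$1"      # security group attached to the EFS mount targets
  local cluster_sg="$2"  # cluster security group of the EKS cluster
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 2049 --source-group %s\n' \
    "$efs_sg" "$cluster_sg"
}

# Print the command for review; run it only after replacing the placeholders.
build_nfs_ingress_cmd "sg-EFS_PLACEHOLDER" "sg-CLUSTER_PLACEHOLDER"
```

Printing instead of executing is a deliberate choice here, since a wrong group ID would open the port on the wrong security group.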

/feature

Describe the solution you'd like in detail

I would like to request that additional documentation and clearer warnings be added about using this driver with Fargate. It would also be useful to have cleanup instructions in case someone makes the same mistake I did, as that would let them recover much more quickly.

Additionally, it would be very useful if the static_provisioning examples could be retested and updated to a working version, or at least if a separate version could be added for clusters using pods provisioned in Fargate.
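As a starting point for such an example, a pod that consumes the efs-claim PVC from above might be sketched as follows. This is untested and rests on assumptions: the pod name, image, command, and mount path are hypothetical, and the namespace must be the one that holds the PVC and is matched by a Fargate profile.

```shell
#!/usr/bin/env bash
# Sketch: render a pod manifest that mounts the "efs-claim" PVC.
# Pod name, image, command, and mount path are illustrative assumptions.

render_test_pod() {
  local namespace="$1"  # must contain the PVC and match a Fargate profile
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: efs-app-test
  namespace: ${namespace}
spec:
  containers:
    - name: app
      image: busybox
      command: ["/bin/sh", "-c", "echo ok > /data/out.txt && sleep 3600"]
      volumeMounts:
        - name: efs-storage
          mountPath: /data
  volumes:
    - name: efs-storage
      persistentVolumeClaim:
        claimName: efs-claim
EOF
}

# Print the manifest for review.
render_test_pod "default"
```

After review, the output could be applied with something like `render_test_pod default | kubectl apply -f -`.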

Finally--and I'm not sure if this is possible but want to check--is there a way that the warnings emitted by the pods could be updated to point to a missing CSIDriver element? Anything that could be added in the driver to help this? Or is this something that would have to be added upstream?
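Until a warning like that exists, the mapping from the two error messages quoted earlier to a missing CSIDriver object could be sketched as a small helper. The function names and hint wording are hypothetical. The manifest is what I believe the driver ships for its CSIDriver object, which is an assumption that should be compared against the driver repository before applying.

```shell
#!/usr/bin/env bash
# Sketch: map the two error messages quoted in this issue to a hint about
# a missing CSIDriver object. Function names and hint text are hypothetical.

csidriver_hint() {
  local message="$1"
  case "$message" in
    *"not found in the list of registered CSI drivers"*|*"timed out waiting for external-attacher"*)
      echo "possible cause: CSIDriver object efs.csi.aws.com is missing; check with: kubectl get csidriver efs.csi.aws.com"
      ;;
    *)
      echo "no hint"
      ;;
  esac
}

# Manifest believed to match the CSIDriver object the driver ships
# (assumption; verify against the driver repository before applying).
render_csidriver() {
  cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  attachRequired: false
EOF
}

# Example: pass in an event message copied from `kubectl describe pod`.
csidriver_hint "timed out waiting for external-attacher of efs.csi.aws.com CSI driver to attach volume"
```

If `kubectl get csidriver efs.csi.aws.com` confirms the object is gone, `render_csidriver | kubectl apply -f -` might restore it without reinstalling the add-on, though I have not verified that path.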

Describe alternatives you've considered

I posted in the eksctl repository asking whether I should open a feature request there. It might be helpful for the eksctl command to warn users that they are deleting an object that may already exist in their cluster. However, eksctl is only one way to hit this problem, as the same mistake can be made with the awscli or the AWS Console. For that reason, I think it would be best to address this in this repository itself.

Additional context

Thanks for your help. I'm definitely grumpy about this but at least I got it working, hoping my suffering can at least be used to help prevent more suffering of others.