kubernetes-sigs / gcp-filestore-csi-driver

The Google Cloud Filestore Container Storage Interface (CSI) Plugin.
Apache License 2.0

Driver provisions volumes regardless of tag issues, leaving PVC in Pending state #942

Open RomanBednar opened 3 months ago

RomanBednar commented 3 months ago

Problem:
When a Filestore CSI StorageClass is configured with resource-tags that do not exist in Google Cloud Platform (GCP), PersistentVolumeClaims (PVCs) using that storage class remain in the Pending state. The driver attempts to create the Filestore instance, fails because the tags do not exist, and logs gRPC errors. Despite the failure, the Filestore instance is still created in GCP. The driver then cannot delete the instance because of the same tag problem, so the orphaned instance has to be removed manually.
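
The failure sequence described above leaves an instance behind once the tag problem surfaces. One possible way to avoid the orphan is to check the tags before creating anything, and to roll the instance back if tagging still fails afterwards. The following is a minimal Go sketch of that ordering under stated assumptions, not the driver's actual code: the `cloud` interface, `fakeCloud`, and `provision` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// cloud is a hypothetical abstraction over the GCP calls involved.
// It is NOT the driver's real interface.
type cloud interface {
	TagsExist(tags []string) error
	CreateInstance(name string) error
	AttachTags(name string, tags []string) error
	DeleteInstance(name string) error
}

// provision validates tags before creating the instance, and deletes the
// instance again if attaching the tags fails after creation.
func provision(c cloud, name string, tags []string) error {
	if err := c.TagsExist(tags); err != nil {
		return fmt.Errorf("tag validation failed, nothing created: %w", err)
	}
	if err := c.CreateInstance(name); err != nil {
		return err
	}
	if err := c.AttachTags(name, tags); err != nil {
		if delErr := c.DeleteInstance(name); delErr != nil {
			return fmt.Errorf("%v; rollback also failed: %v", err, delErr)
		}
		return err
	}
	return nil
}

// fakeCloud is an in-memory stand-in used only to exercise the sketch.
type fakeCloud struct {
	knownTags map[string]bool
	instances map[string]bool
}

func (f *fakeCloud) TagsExist(tags []string) error {
	for _, t := range tags {
		if !f.knownTags[t] {
			return fmt.Errorf("tag %q does not exist", t)
		}
	}
	return nil
}

func (f *fakeCloud) CreateInstance(name string) error {
	f.instances[name] = true
	return nil
}

func (f *fakeCloud) AttachTags(name string, tags []string) error {
	return f.TagsExist(tags)
}

func (f *fakeCloud) DeleteInstance(name string) error {
	delete(f.instances, name)
	return nil
}

func main() {
	// No tags are known, so validation fails before any instance is created.
	c := &fakeCloud{knownTags: map[string]bool{}, instances: map[string]bool{}}
	err := provision(c, "pvc-example", []string{"parent1/tagKey1/tagValue1"})
	fmt.Println("error:", err)
	fmt.Println("instances left behind:", len(c.instances))
}
```

With the fake backend, a request that names an unknown tag returns an error and leaves no instance behind. Whether a pre-check like this is feasible in the real driver depends on which GCP APIs and permissions it has available.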

Additional Note:
This issue can also occur if the tags exist, but the CSI driver lacks sufficient permissions to use them.

Version: v1.6.16

Storage class configured with the resource-tags parameter (the tags do not exist in GCP):

$ oc get sc/filestore-csi -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2024-08-20T13:07:50Z"
  name: filestore-csi
  resourceVersion: "99206"
  uid: 1c2e6395-2dfc-42b8-94fc-2c201dffe380
parameters:
  connect-mode: DIRECT_PEERING
  network: XXXXX-XXXXX-XXXXX
  resource-tags: parent1/tagKey1/tagValue1,parent2/tagKey2/tagValue2
provisioner: filestore.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
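
The `resource-tags` value in the storage class above is a comma-separated list of `parent/key/value` triples. A minimal Go sketch of splitting and shape-checking such a string follows. `parseResourceTags` and the `tag` type are hypothetical names, and the driver's real parsing and validation rules may differ (for example, this sketch assumes none of the three parts contains a `/`).

```go
package main

import (
	"fmt"
	"strings"
)

// tag is a hypothetical holder for one parent/key/value triple.
type tag struct {
	Parent, Key, Value string
}

// parseResourceTags splits a comma-separated list of parent/key/value
// triples and rejects entries that do not have exactly three non-empty parts.
func parseResourceTags(s string) ([]tag, error) {
	var tags []tag
	for _, item := range strings.Split(s, ",") {
		item = strings.TrimSpace(item)
		parts := strings.Split(item, "/")
		if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
			return nil, fmt.Errorf("malformed tag %q, expected parent/key/value", item)
		}
		tags = append(tags, tag{Parent: parts[0], Key: parts[1], Value: parts[2]})
	}
	return tags, nil
}

func main() {
	tags, err := parseResourceTags("parent1/tagKey1/tagValue1,parent2/tagKey2/tagValue2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tags {
		fmt.Printf("parent=%s key=%s value=%s\n", t.Parent, t.Key, t.Value)
	}
}
```

A syntactic check like this only catches malformed input. It cannot tell whether a well-formed tag actually exists in GCP or whether the driver is permitted to use it, which is the case this issue is about.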

PVC is pending:

$ oc describe pvc/pvc-1-tagged
Name:          pvc-1-tagged
Namespace:     default
StorageClass:  filestore-csi
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: filestore.csi.storage.gke.io
               volume.kubernetes.io/storage-provisioner: filestore.csi.storage.gke.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type     Reason                Age                  From                                                                                                   Message
  ----     ------                ----                 ----                                                                                                   -------
  Warning  ProvisioningFailed    32m (x13 over 39m)   filestore.csi.storage.gke.io_rbednar-mycluster-01-sspvn-master-2_90c2218e-5688-4a51-9cd4-2ddc4f27c7fd  failed to provision volume with StorageClass "filestore-csi": rpc error: code = Unavailable desc = [parent1/tagKey1/tagValue1 parent2/tagKey2/tagValue2] tag(s) provided in pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe create request does not exist
  Normal   ExternalProvisioning  66s (x166 over 41m)  persistentvolume-controller                                                                            Waiting for a volume to be created either by the external provisioner 'filestore.csi.storage.gke.io' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
  Normal   Provisioning          57s (x46 over 41m)   filestore.csi.storage.gke.io_rbednar-mycluster-01-sspvn-master-2_90c2218e-5688-4a51-9cd4-2ddc4f27c7fd  External provisioner is provisioning volume for claim "default/pvc-1-tagged"

ID assigned to PV: pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe

Driver log:

E0820 13:09:38.705627       1 file.go:330] Failed to get instance projects/XXXXX-XXXXX-XXXXX/locations/us-central1-c/instances/pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe
I0820 13:11:44.383540       1 cloud.go:121] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/service_account.json
I0820 13:11:44.383562       1 cloud.go:125] Using DefaultTokenSource &google.errWrappingTokenSource{src:(*oauth2.reuseTokenSource)(0xc0005aed50)}
E0820 13:11:44.489152       1 utils.go:59] GRPC error: rpc error: code = Unavailable desc = [parent1/tagKey1/tagValue1 parent2/tagKey2/tagValue2] tag(s) provided in pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe create request does not exist
W0820 13:11:45.490542       1 controller.go:513] required bytes 0.0009765625TiB is less than minimum instance size capacity 1TiB for tier standard, but no upper bound was specified. Rounding up capacity request to 1TiB for tier standard.

Filestore instance is created in GCP regardless:

$ gcloud filestore instances describe pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe --project XXXXX-XXXXX-XXXXX --zone us-central1-c --format json
{
  "createTime": "2024-08-20T13:09:39.277609914Z",
  "fileShares": [
    {
      "capacityGb": "1024",
      "name": "vol1"
    }
  ],
  "labels": {
.
.
.
  },
  "name": "projects/XXXXX-XXXXX-XXXXX/locations/us-central1-c/instances/pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe",
  "networks": [
.
.
.
}

Deleting pending PVC succeeds:

$ oc delete pvc/pvc-1-tagged

Driver fails to delete the instance because the tags do not exist:

E0820 14:00:43.365597       1 utils.go:59] GRPC error: rpc error: code = Unavailable desc = [parent1/tagKey1/tagValue1 parent2/tagKey2/tagValue2] tag(s) provided in pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe create request does not exist
W0820 14:00:54.801423       1 controller.go:513] required bytes 0.0009765625TiB is less than minimum instance size capacity 1TiB for tier standard, but no upper bound was specified. Rounding up capacity request to 1TiB for tier standard.
I0820 14:00:54.831803       1 cloud.go:121] GOOGLE_APPLICATION_CREDENTIALS env var set /etc/cloud-sa/service_account.json

The Filestore instance still exists and remains in GCP until it is removed manually:

$ gcloud filestore instances describe pvc-b997cc10-b8ea-40e1-b999-82c15d5fe8fe --project XXXXX-XXXXX-XXXXX --zone us-central1-c --format json
{
  "createTime": "2024-08-20T13:09:39.277609914Z",
  .
  .
  .

RomanBednar commented 3 months ago

cc @leiyiz @tyuchn

RomanBednar commented 4 weeks ago

/assign @leiyiz @tyuchn @saikat-royc