vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

"Timeout happened for preparing dataupload" during "CSI Snapshot Data Movement" to MinIO S3 #8113

Open Zalfsten opened 1 month ago

Zalfsten commented 1 month ago

What steps did you take and what happened:

I'm trying to create a backup of a sample nginx deployment (including a PVC) in the default namespace. These are my steps:

$ velero backup create test5 --include-namespaces=default --snapshot-volumes
Backup request "test5" submitted successfully.
Run `velero backup describe test5` or `velero backup logs test5` for more details.

$ velero backup describe test5
[...]
Phase:  WaitingForPluginOperations
[...]

Then, after half an hour:

$ velero backup describe test5
[...]
Phase:  PartiallyFailed (run `velero backup logs test5` for more information)
[...]

In the node agent log, I observe the following error message:

time="2024-08-13T13:40:33Z" level=info msg="Timeout happened for preparing dataupload" Dataupload=test5-8klq9 logSource="pkg/controller/data_upload_controller.go:741"

What did you expect to happen:

I expected to see a big file in my S3 storage, containing the backup of the persistent volume (though I do not know what exactly to look out for).

The following information will help us better understand what's going on:

bundle-2024-08-13-15-53-39.tar.gz

Anything else you would like to add:

I'm unsure if I need to add a VolumeSnapshotLocation (using the provider csi). But if I do, I run into another error about velero not finding a matching VolumeSnapshotter.
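
As a side note, I believe the registered plugins (and whether any VolumeSnapshotter is available at all) can be listed with the CLI, assuming it is pointed at this cluster:

$ velero plugin get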

I've installed the Velero server using Helm chart version 7.1.2. This is my values file:

velero:
  image:
    repository: velero/velero
    tag: v1.13.2
    pullPolicy: IfNotPresent

  initContainers:
    - name: velero-plugin-for-csi
      image: velero/velero-plugin-for-csi:v0.7.1
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /target
          name: plugins
    - name: velero-plugin-for-aws
      image: velero/velero-plugin-for-aws:v1.10.0
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /target
          name: plugins

  cleanUpCRDs: false
  deployNodeAgent: true

  configuration:
    features: EnableCSI
    defaultSnapshotMoveData: true
    uploaderType: kopia
    backupStorageLocation:
      - provider: aws
        bucket: velero
        config:
          region: minio
          s3ForcePathStyle: "true"
          s3Url: <S3 URL>
  credentials:
    useSecret: true
    existingSecret: velero-secret
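
For reference, a rough sketch of the corresponding install command (assuming the chart comes from the vmware-tanzu Helm repository and the values above are saved as values.yaml):

$ helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
$ helm install velero vmware-tanzu/velero --version 7.1.2 --namespace velero --create-namespace -f values.yaml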

I suppose the VolumeSnapshotClass is also of interest:

kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
metadata:
  name: truenas-snapshot
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: csi.hpe.com
parameters:
  description: "Snapshot created on TrueNAS"
  csi.storage.k8s.io/snapshotter-secret-name: truenas-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: kube-system
deletionPolicy: Delete
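
As a sanity check, the class Velero should pick up can be listed via its label (assuming this is the only class labelled for Velero):

$ kubectl get volumesnapshotclass -l velero.io/csi-volumesnapshot-class=true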

Environment:

Lyndon-Li commented 1 month ago

From the log below, the timeout happened while creating a volume from the snapshot, which corresponds to the snapshot clone stage:

time="2024-08-13T13:10:17Z" level=info msg="Backup VSC is created from snapcontent-3082c491-ed58-4381-98fb-baeb8745d45b" logSource="pkg/exposer/csi_snapshot.go:170" owner=test5-8klq9 vsc name=test5-8klq9
time="2024-08-13T13:10:17Z" level=warning msg="The snapshot doesn't contain a valid restore size, use source volume's size {{5368709120 0} {<nil>} 5Gi BinarySI}" logSource="pkg/exposer/csi_snapshot.go:177" owner=test5-8klq9 vs name=velero-pvc-test-iscsi-fzn95
time="2024-08-13T13:10:17Z" level=info msg="Backup PVC is created" logSource="pkg/exposer/csi_snapshot.go:185" owner=test5-8klq9 pvc name=test5-8klq9
time="2024-08-13T13:10:17Z" level=info msg="Backup pod is created" logSource="pkg/exposer/csi_snapshot.go:197" owner=test5-8klq9 pod name=test5-8klq9
time="2024-08-13T13:10:17Z" level=info msg="Snapshot is exposed" controller=dataupload dataupload=velero/test5-8klq9
....
time="2024-08-13T13:40:33Z" level=info msg="Timeout happened for preparing dataupload" Dataupload=test5-8klq9 logSource="pkg/controller/data_upload_controller.go:741"
time="2024-08-13T13:40:33Z" level=info msg="Dataupload has been cleaned up" Dataupload=test5-8klq9 logSource="pkg/controller/data_upload_controller.go:770"

It is possible that the CSI driver doesn't support snapshot clone. Please check your CSI driver's spec to see whether it is supported.
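
One rough way to check this outside of Velero, assuming an existing VolumeSnapshot (here called my-snap, a placeholder) in the default namespace and the same storage class, is to create a PVC from the snapshot and see whether it ever binds:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone-test
  namespace: default
spec:
  storageClassName: truenas-delete
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: my-snap
EOF
$ kubectl -n default get pvc clone-test -w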

Zalfsten commented 4 weeks ago

Thanks for the answer, @Lyndon-Li

It's hard to tell whether my CSI driver supports snapshot cloning. At the CSI level there is the capability "CLONE_VOLUME", and according to this list my driver supports clones. But are "snapshot cloning", "CLONE_VOLUME" and "clones" the same thing? Also, the only safe way to be sure would be to query the controller itself, and I do not know how to do that.

But I will try out another CSI driver I'm using: csi-driver-nfs. Maybe this behaves differently...

I've also had a look at the CSI driver logs. To me they look good so far -- although the volume expansion part worries me a bit:

I0814 16:42:52.720351       1 controller.go:1366] provision "velero/test1-rb6tl" class "truenas-delete": started
I0814 16:42:52.722153       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"velero", Name:"test1-rb6tl", UID:"27729f25-3337-4b97-b7bc-c99f413b0fc6", APIVersion:"v1", ResourceVersion:"17082142", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "velero/test1-rb6tl"
I0814 16:42:52.731537       1 controller.go:1150] VolumeSnapshot &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:test1-rb6tl GenerateName: Namespace:velero SelfLink: UID:9f62e55c-a1ae-439a-93ce-cde964158383 ResourceVersion:17082143 Generation:1 CreationTimestamp:2024-08-14 16:42:52 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection] ManagedFields:[{Manager:node-agent-server Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:spec":{".":{},"f:source":{".":{},"f:volumeSnapshotContentName":{}},"f:volumeSnapshotClassName":{}}} Subresource:} {Manager:snapshot-controller Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:finalizers":{".":{},"v:\"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection\"":{}}}} Subresource:} {Manager:snapshot-controller Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:status":{".":{},"f:boundVolumeSnapshotContentName":{},"f:readyToUse":{}}} Subresource:status}]} Spec:{Source:{PersistentVolumeClaimName:<nil> VolumeSnapshotContentName:0xc000638590} VolumeSnapshotClassName:0xc0006385a0} Status:0xc0007b5620}
W0814 16:42:52.734771       1 controller.go:934] Retrying syncing claim "27729f25-3337-4b97-b7bc-c99f413b0fc6", failure 0
E0814 16:42:52.734814       1 controller.go:957] error syncing claim "27729f25-3337-4b97-b7bc-c99f413b0fc6": failed to provision volume with StorageClass "truenas-delete": error getting handle for DataSource Type VolumeSnapshot by Name test1-rb6tl: snapshot test1-rb6tl is not Ready
I0814 16:42:52.734891       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"velero", Name:"test1-rb6tl", UID:"27729f25-3337-4b97-b7bc-c99f413b0fc6", APIVersion:"v1", ResourceVersion:"17082142", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "truenas-delete": error getting handle for DataSource Type VolumeSnapshot by Name test1-rb6tl: snapshot test1-rb6tl is not Ready
I0814 16:42:53.735874       1 controller.go:1366] provision "velero/test1-rb6tl" class "truenas-delete": started
I0814 16:42:53.736046       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"velero", Name:"test1-rb6tl", UID:"27729f25-3337-4b97-b7bc-c99f413b0fc6", APIVersion:"v1", ResourceVersion:"17082142", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "velero/test1-rb6tl"
I0814 16:42:53.739340       1 controller.go:1150] VolumeSnapshot &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:test1-rb6tl GenerateName: Namespace:velero SelfLink: UID:9f62e55c-a1ae-439a-93ce-cde964158383 ResourceVersion:17082153 Generation:1 CreationTimestamp:2024-08-14 16:42:52 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection snapshot.storage.kubernetes.io/volumesnapshot-bound-protection] ManagedFields:[{Manager:node-agent-server Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:spec":{".":{},"f:source":{".":{},"f:volumeSnapshotContentName":{}},"f:volumeSnapshotClassName":{}}} Subresource:} {Manager:snapshot-controller Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:finalizers":{".":{},"v:\"snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection\"":{},"v:\"snapshot.storage.kubernetes.io/volumesnapshot-bound-protection\"":{}}}} Subresource:} {Manager:snapshot-controller Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:status":{".":{},"f:boundVolumeSnapshotContentName":{},"f:creationTime":{},"f:readyToUse":{},"f:restoreSize":{}}} Subresource:status}]} Spec:{Source:{PersistentVolumeClaimName:<nil> VolumeSnapshotContentName:0xc0007ae730} VolumeSnapshotClassName:0xc0007ae740} Status:0xc0000e3dd0}
I0814 16:42:53.743356       1 controller.go:1176] VolumeSnapshotContent &{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:test1-rb6tl GenerateName: Namespace: SelfLink: UID:7b49d6f9-579f-4352-83d7-c9befd117633 ResourceVersion:17082152 Generation:1 CreationTimestamp:2024-08-14 16:42:52 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[snapshot.storage.kubernetes.io/deletion-secret-name:truenas-secret snapshot.storage.kubernetes.io/deletion-secret-namespace:kube-system] OwnerReferences:[] Finalizers:[snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection] ManagedFields:[{Manager:csi-snapshotter Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:status":{".":{},"f:creationTime":{},"f:readyToUse":{},"f:restoreSize":{},"f:snapshotHandle":{}}} Subresource:status} {Manager:node-agent-server Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:annotations":{".":{},"f:snapshot.storage.kubernetes.io/deletion-secret-name":{},"f:snapshot.storage.kubernetes.io/deletion-secret-namespace":{}}},"f:spec":{".":{},"f:deletionPolicy":{},"f:driver":{},"f:source":{".":{},"f:snapshotHandle":{}},"f:volumeSnapshotClassName":{},"f:volumeSnapshotRef":{".":{},"f:name":{},"f:namespace":{},"f:resourceVersion":{},"f:uid":{}}}} Subresource:} {Manager:snapshot-controller Operation:Update APIVersion:snapshot.storage.k8s.io/v1 Time:2024-08-14 16:42:52 +0000 UTC FieldsType:FieldsV1 FieldsV1:{"f:metadata":{"f:finalizers":{".":{},"v:\"snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection\"":{}}}} Subresource:}]} Spec:{VolumeSnapshotRef:{Kind: Namespace:velero Name:test1-rb6tl UID:9f62e55c-a1ae-439a-93ce-cde964158383 APIVersion: ResourceVersion:17082136 FieldPath:} DeletionPolicy:Delete Driver:csi.hpe.com VolumeSnapshotClassName:0xc0007aedc0 Source:{VolumeHandle:<nil> SnapshotHandle:0xc0007aedb0} SourceVolumeMode:<nil>} Status:0xc0005f7290}
I0814 16:42:53.743434       1 controller.go:1187] VolumeContentSource_Snapshot {Snapshot:snapshot_id:"Kubernetes_pvc-76ffe0d8-c7d0-4e6e-aea3-8adeece42916@snapshot-a81f7e24-0f58-4611-98bb-20168d53acb2" }
I0814 16:42:53.743865       1 controller.go:1195] Requested volume size is 5368709120 and snapshot size is 0 for the source snapshot test1-rb6tl
W0814 16:42:53.743899       1 controller.go:1202] requested volume size 5368709120 is greater than the size 0 for the source snapshot test1-rb6tl. Volume plugin needs to handle volume expansion.
I0814 16:42:53.747651       1 connection.go:244] GRPC call: /csi.v1.Controller/CreateVolume
I0814 16:42:53.747670       1 connection.go:245] GRPC request: {"capacity_range":{"required_bytes":5368709120},"name":"pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6","parameters":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","csi.storage.k8s.io/pv/name":"pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6","csi.storage.k8s.io/pvc/name":"test1-rb6tl","csi.storage.k8s.io/pvc/namespace":"velero","root":"Kubernetes"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}}],"volume_content_source":{"Type":{"Snapshot":{"snapshot_id":"Kubernetes_pvc-76ffe0d8-c7d0-4e6e-aea3-8adeece42916@snapshot-a81f7e24-0f58-4611-98bb-20168d53acb2"}}}}
I0814 16:42:56.382780       1 connection.go:251] GRPC response: {"volume":{"capacity_bytes":5368709120,"content_source":{"Type":{"Snapshot":{"snapshot_id":"Kubernetes_pvc-76ffe0d8-c7d0-4e6e-aea3-8adeece42916@snapshot-a81f7e24-0f58-4611-98bb-20168d53acb2"}}},"volume_context":{"allowOverrides":"sparse,compression,deduplication,volblocksize,sync,description","compression":"LZ4","csi.storage.k8s.io/pv/name":"pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6","csi.storage.k8s.io/pvc/name":"test1-rb6tl","csi.storage.k8s.io/pvc/namespace":"velero","deduplication":"OFF","fsType":"xfs","hostEncryption":"false","root":"Kubernetes","sync":"STANDARD","targetScope":"volume","volblocksize":"8K","volumeAccessMode":"mount"},"volume_id":"Kubernetes_pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6"}}
I0814 16:42:56.382800       1 connection.go:252] GRPC error: <nil>
I0814 16:42:56.382813       1 controller.go:858] create volume rep: {CapacityBytes:5368709120 VolumeId:Kubernetes_pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6 VolumeContext:map[allowOverrides:sparse,compression,deduplication,volblocksize,sync,description compression:LZ4 csi.storage.k8s.io/pv/name:pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6 csi.storage.k8s.io/pvc/name:test1-rb6tl csi.storage.k8s.io/pvc/namespace:velero deduplication:OFF fsType:xfs hostEncryption:false root:Kubernetes sync:STANDARD targetScope:volume volblocksize:8K volumeAccessMode:mount] ContentSource:snapshot:<snapshot_id:"Kubernetes_pvc-76ffe0d8-c7d0-4e6e-aea3-8adeece42916@snapshot-a81f7e24-0f58-4611-98bb-20168d53acb2" >  AccessibleTopology:[] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0814 16:42:56.382891       1 controller.go:930] createVolumeOperation: set annotation [volume.kubernetes.io/provisioner-deletion-secret-namespace/volume.kubernetes.io/provisioner-deletion-secret-name] on pv [pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6].
I0814 16:42:56.382921       1 controller.go:960] successfully created PV pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6 for PVC test1-rb6tl and csi volume name Kubernetes_pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6
I0814 16:42:56.382939       1 controller.go:976] successfully created PV {GCEPersistentDisk:nil AWSElasticBlockStore:nil HostPath:nil Glusterfs:nil NFS:nil RBD:nil ISCSI:nil Cinder:nil CephFS:nil FC:nil Flocker:nil FlexVolume:nil AzureFile:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil PortworxVolume:nil ScaleIO:nil Local:nil StorageOS:nil CSI:&CSIPersistentVolumeSource{Driver:csi.hpe.com,VolumeHandle:Kubernetes_pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6,ReadOnly:false,FSType:xfs,VolumeAttributes:map[string]string{allowOverrides: sparse,compression,deduplication,volblocksize,sync,description,compression: LZ4,csi.storage.k8s.io/pv/name: pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6,csi.storage.k8s.io/pvc/name: test1-rb6tl,csi.storage.k8s.io/pvc/namespace: velero,deduplication: OFF,fsType: xfs,hostEncryption: false,root: Kubernetes,storage.kubernetes.io/csiProvisionerIdentity: 1723652614992-6440-csi.hpe.com,sync: STANDARD,targetScope: volume,volblocksize: 8K,volumeAccessMode: mount,},ControllerPublishSecretRef:&SecretReference{Name:truenas-secret,Namespace:kube-system,},NodeStageSecretRef:&SecretReference{Name:truenas-secret,Namespace:kube-system,},NodePublishSecretRef:&SecretReference{Name:truenas-secret,Namespace:kube-system,},ControllerExpandSecretRef:&SecretReference{Name:truenas-secret,Namespace:kube-system,},NodeExpandSecretRef:nil,}}
I0814 16:42:56.383025       1 controller.go:1449] provision "velero/test1-rb6tl" class "truenas-delete": volume "pvc-27729f25-3337-4b97-b7bc-c99f413b0fc6" provisioned
I0814 16:42:56.383038       1 controller.go:1462] provision "velero/test1-rb6tl" class "truenas-delete": succeeded

reasonerjt commented 3 weeks ago

@Zalfsten

During data movement, Velero creates a temporary pod (the "Backup pod" in the log) and mounts a PVC created from the snapshot. Based on the log messages

time="2024-08-13T13:10:17Z" level=info msg="Backup PVC is created" logSource="pkg/exposer/csi_snapshot.go:185" owner=test5-8klq9 pvc name=test5-8klq9
time="2024-08-13T13:10:17Z" level=info msg="Backup pod is created" logSource="pkg/exposer/csi_snapshot.go:197" owner=test5-8klq9 pod name=test5-8klq9
time="2024-08-13T13:10:17Z" level=info msg="Snapshot is exposed" controller=dataupload dataupload=velero/test5-8klq9
time="2024-08-13T13:10:17Z" level=info msg="Reconcile test5-8klq9" controller=dataupload dataupload=velero/test5-8klq9 logSource="pkg/controller/data_upload_controller.go:112"
time="2024-08-13T13:10:33Z" level=info msg="Reconcile test5-8klq9" controller=dataupload dataupload=velero/test5-8klq9 logSource="pkg/controller/data_upload_controller.go:112"
time="2024-08-13T13:11:33Z" level=info msg="Reconcile test5-8klq9" controller=dataupload dataupload=velero/test5-8klq9 logSource="pkg/controller/data_upload_controller.go:112"
....
time="2024-08-13T13:40:33Z" level=info msg="Timeout happened for preparing dataupload" Dataupload=test5-8klq9 logSource="pkg/controller/data_upload_controller.go:741"

The pod did not transition to the Running state within 30 minutes and was eventually deleted, so there is no clear message in the log bundle to help us understand why the pod was stuck in Pending.
Have you tested that you can manually take a snapshot and restore from it using the CSI driver and snapshotter in your environment? If that works, you may need to reproduce the issue, grab the temporary pod and PVC, and check their status to see why they were stuck.
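
For example, while the backup is still in WaitingForPluginOperations, something along these lines (assuming the default velero namespace) should show the DataUpload plus the temporary pod and PVC before they are cleaned up:

$ kubectl -n velero get datauploads,pods,pvc
$ kubectl -n velero describe pod <backup-pod-name>
$ kubectl -n velero describe pvc <backup-pvc-name>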

cc @Lyndon-Li I think we should dump more messages in the node-agent before the pod and PVC are cleaned up in this timeout situation, to help us debug.

Lyndon-Li commented 3 weeks ago

@reasonerjt

I think we should dump more messages in the node-agent before the pod and PVC are cleaned up in this timeout situation, to help us debug.

Yes, I opened an issue for this, see #8125

Zalfsten commented 2 weeks ago

@reasonerjt

Sorry, I've had little time to work on this issue. I can manually restore from a snapshot with my CSI driver (i.e. I create a PVC, take a snapshot of it, and then create a new PVC with the snapshot as the source -- please give me a hint if this is not the correct test). I will check the backup pod and PVC tomorrow and post any new findings in issue #8125.

Zalfsten commented 2 weeks ago

Oops... I really do not know what happened, but the backup now shows "Completed".

The only thing I did is recreate the StorageClasses -- without changing anything, as far as I can tell.

Zalfsten commented 2 weeks ago

Now I have the same issue again, this time using a different CSI driver.

The temporary pod shows these events:

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  10m (x2 over 10m)   default-scheduler  0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling..
  Normal   Scheduled         10m                 default-scheduler  Successfully assigned velero/test3-j8bf6 to elise-01
  Warning  FailedMount       24s (x13 over 10m)  kubelet            MountVolume.MountDevice failed for volume "pvc-52178ff2-b3be-495a-a28e-a98016f6a337" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name nfs.csi.k8s.io not found in the list of registered CSI drivers

In contrast to the error message, I can find the CSI driver:

$ kubectl describe csidrivers.storage.k8s.io nfs.csi.k8s.io
Name:         nfs.csi.k8s.io
Namespace:
Labels:       argocd.argoproj.io/instance=storage
Annotations:  argocd.argoproj.io/tracking-id: storage:storage.k8s.io/CSIDriver:storage/nfs.csi.k8s.io
API Version:  storage.k8s.io/v1
Kind:         CSIDriver
Metadata:
  Creation Timestamp:  2024-08-20T17:20:14Z
  Resource Version:    20159506
  UID:                 1a8ffe86-c836-4084-8825-4cc0cf3550df
Spec:
  Attach Required:     false
  Fs Group Policy:     File
  Pod Info On Mount:   false
  Requires Republish:  false
  Se Linux Mount:      false
  Storage Capacity:    false
  Volume Lifecycle Modes:
    Persistent
Events:  <none>
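
The "not found in the list of registered CSI drivers" message comes from the kubelet, so it may also be worth checking the node-level registration (the CSINode object for the node from the event above) rather than only the cluster-wide CSIDriver object:

$ kubectl get csinode elise-01 -o jsonpath='{.spec.drivers[*].name}'
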
Zalfsten commented 2 weeks ago

Now it works again. Seems to be non-deterministic...