kubernetes-sigs / vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

volumeID not found in QueryVolume #2842

Closed Midaxess closed 5 months ago

Midaxess commented 5 months ago

/kind bug

What happened:

When a pod tries to attach a PVC in RWX mode, I get this error event:

AttachVolume.Attach failed for volume "pv-test" : rpc error: code = Internal desc = volumeID file:1894a9d2-2055-4709-8834-9a5b596b5cb1 not found in QueryVolume

How to reproduce it (as minimally and precisely as possible):

3 masters / 5 workers: v1.26.14+rke2r1

CSI Helm Chart: rancher/rancher-vsphere-csi 103.1.0+up3.1.2-rancher1

PV / PVC deployed statically: example/vanilla-k8s-RWM-filesystem-volumes/example-static-fileshare-provisioning.yaml

CSI Config :

maxPvscsiTargetsPerVm:
  enabled: true
csiController:
  csiResizer:
    enabled: true
csiMigration:
  enabled: true
onlineVolumeExtend:
  enabled: true
storageClass:
  allowVolumeExpansion: true

Anything else we need to know?:

I only have this issue when I have more than one worker.

Mounting the PVC in RWO mode works well.

To work around this issue, I'm currently creating the PV with an NFS spec and copying the file share export path from my vCenter into the PV configuration, but that's not practical and it defeats the purpose of the CSI plugin.

Environment:

Midaxess commented 5 months ago

It was an issue with systemd-resolved on my workers. I've disabled it.

0xdnL commented 2 months ago

@Midaxess could you give some more info on what needed fixing with systemd-resolved?

I have the exact same issue. Pods aren't starting because of

kubelet  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data sc-rules-volume kube-api-access-hprhs config runtime-config tmp]: timed out waiting for the condition

and checking the volumeattachments I find:

kubectl get volumeattachments.storage.k8s.io csi-65....07 -o yaml

..
status:
  ..
  detachError:
    message: 'rpc error: code = Internal desc = volumeID "10a31132-1ce5-4857-9355-af3fc461c502"
      not found in QueryVolume'
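
To check every attachment at once instead of inspecting them one by one, the same status fields can be read from the JSON output. This is only a rough sketch: `find_attachment_errors` and `load_attachments` are hypothetical helper names, and it assumes `kubectl` is on the PATH and pointed at the affected cluster.

```python
import json
import subprocess


def find_attachment_errors(attachment_list):
    """Collect attach/detach errors from a VolumeAttachment list (parsed JSON)."""
    errors = []
    for item in attachment_list.get("items", []):
        name = item.get("metadata", {}).get("name", "")
        spec = item.get("spec", {})
        status = item.get("status", {})
        # status.attachError and status.detachError each carry a "message" field
        for kind in ("attachError", "detachError"):
            err = status.get(kind)
            if err and err.get("message"):
                errors.append({
                    "name": name,
                    "node": spec.get("nodeName", ""),
                    "pv": spec.get("source", {}).get("persistentVolumeName", ""),
                    "kind": kind,
                    "message": err["message"],
                })
    return errors


def load_attachments():
    """Fetch all VolumeAttachment objects via kubectl (assumed to be installed)."""
    result = subprocess.run(
        ["kubectl", "get", "volumeattachments.storage.k8s.io", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Example use against a live cluster:
#   for e in find_attachment_errors(load_attachments()):
#       print(e["name"], e["node"], e["pv"], e["kind"], e["message"])
```

Filtering the result on `not found in QueryVolume` would show whether the error is limited to particular nodes or volumes.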

Midaxess commented 2 months ago

In my test environment, my vCenter domain address is xxxx.local (not a good idea). systemd-resolved doesn't use the external nameserver to resolve the .local domain.

So to force it, I edited /etc/systemd/resolved.conf and set: DNSStubListener=no
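
For anyone who wants to check whether a node is hitting the same problem, here is a minimal sketch that can be run on a worker. It only tests whether the system resolver returns an address for the vCenter hostname and flags names under `.local`, which RFC 6762 reserves for multicast DNS. The function names are hypothetical, and this is not part of the CSI driver.

```python
import socket


def uses_mdns_suffix(hostname):
    """True if the name falls under .local, reserved for mDNS by RFC 6762."""
    labels = hostname.rstrip(".").lower().split(".")
    return labels[-1] == "local"


def resolves(hostname):
    """True if the system resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def diagnose(hostname, resolver=resolves):
    """Classify the hostname as 'ok', 'unresolved-local', or 'unresolved'."""
    if resolver(hostname):
        return "ok"
    if uses_mdns_suffix(hostname):
        # Likely the situation described above: a .local name that the
        # node's resolver does not forward to the external nameserver.
        return "unresolved-local"
    return "unresolved"


# Example use on a worker node (replace with the real vCenter hostname):
#   print(diagnose("xxxx.local"))
```

An `unresolved-local` result on a worker, while the same name resolves when the external nameserver is queried directly, would be consistent with the systemd-resolved behaviour described above.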