kubernetes / cloud-provider-openstack

Apache License 2.0
616 stars 603 forks source link

[cinder-csi-plugin] Unable to attach or mount volumes: timed out waiting for the condition #1628

Closed sylvainOL closed 3 years ago

sylvainOL commented 3 years ago

/kind bug

What happened: Provisioner installed in our Kubernetes v1.20.7 cluster. The provisioner is able to create and attach PVCs to the required node but Pods using the PVCs enter a waiting state with the error Unable to attach or mount volumes: unmounted volumes=[data-volume], unattached volumes=[default-token-4w5lf data-volume]: timed out waiting for the condition

How can I determine what Condition the Pod's waiting for, and what might be happening? I guess I've something wrong with my setup but can't figure out what.

What you expected to happen:

Provisioner creates volumes, attached them to the compute instance and Pod successfully mounts the volume.

How to reproduce it:

Using a simple PVC and Pod: -

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: '1Gi'
---
kind: Pod
apiVersion: v1
metadata:
  name: pysimple
spec:
  containers:
  - name: pysimple
    image: alanbchristie/pysimple:2019.8
    volumeMounts:
    - name: data-volume
      mountPath: /data
  volumes:
  - name: data-volume
    persistentVolumeClaim:
      claimName: test

Anything else we need to know?:

We're using kubespray release-2.16 to deploy this cluster. This is an automated deployment which was OK 3 weeks ago. I've reinstalled everything a second time and same issue appears

Environment:

Logs

looking at #1206, I tried to retrieve / debug by myself but I may not gather the right logs :(

sylvainOL commented 3 years ago

I've also taken a look at changes in kubespray and I don't see anything that might be an issue these previous weeks: https://github.com/kubernetes-sigs/kubespray/commits/release-2.16

Another change is that we've update debian base image to a more recent one (issue with stable change to oldstable) but I'm not sure it changes something :(

sylvainOL commented 3 years ago

oh and on another deployment, (k8s v1.19.9 deployed with kubespray release-2.15), but using the in tree provisioner (AFAIK, I don't see csi pods), it has the same issue and there I can see:

Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.208002   23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.208127   23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-c5mn9" (UniqueName: "kubernetes.io/secret/eae15b76-2dc6-4b45-a8f8-37afef64d215-default-token-c5mn9") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: E0818 10:26:16.208381   23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:16.708289302 +0000 UTC m=+36961.681443996 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.309176   23979 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-c5mn9" (UniqueName: "kubernetes.io/secret/eae15b76-2dc6-4b45-a8f8-37afef64d215-default-token-c5mn9") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/pods/eae15b76-2dc6-4b45-a8f8-37afef64d215/volumes/kubernetes.io~secret/default-token-c5mn9.
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.326420   23979 operation_generator.go:663] MountVolume.SetUp succeeded for volume "default-token-c5mn9" (UniqueName: "kubernetes.io/secret/eae15b76-2dc6-4b45-a8f8-37afef64d215-default-token-c5mn9") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.712822   23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: E0818 10:26:16.713403   23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:17.713179577 +0000 UTC m=+36962.686334383 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:17 compute09-onap-master kubelet[23979]: I0818 10:26:17.720696   23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:17 compute09-onap-master kubelet[23979]: E0818 10:26:17.720928   23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:19.720850651 +0000 UTC m=+36964.694005342 (durationBeforeRetry 2s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:17 compute09-onap-master kubelet[23979]: I0818 10:26:17.867484   23979 kubelet_getters.go:178] "Pod status updated" pod="kube-system/nginx-proxy-compute09-onap-master" status=Running
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: I0818 10:26:19.694219   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: W0818 10:26:19.694320   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: I0818 10:26:19.733278   23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: E0818 10:26:19.733545   23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:23.733452437 +0000 UTC m=+36968.706607187 (durationBeforeRetry 4s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.760904   23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.768255   23979 operation_generator.go:1348] Controller attach succeeded for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215") device path: "/dev/sdf"
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.861547   23979 reconciler.go:269] operationExecutor.MountVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.861768   23979 operation_generator.go:558] MountVolume.WaitForAttach entering for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215") DevicePath "/dev/sdf"
Aug 18 10:26:25 compute09-onap-master kubelet[23979]: W0818 10:26:25.570278   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:25 compute09-onap-master kubelet[23979]: E0818 10:26:25.570353   23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:28 compute09-onap-master kubelet[23979]: W0818 10:26:28.099386   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:28 compute09-onap-master kubelet[23979]: E0818 10:26:28.099425   23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:29 compute09-onap-master kubelet[23979]: I0818 10:26:29.787654   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:29 compute09-onap-master kubelet[23979]: W0818 10:26:29.789542   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:32 compute09-onap-master kubelet[23979]: W0818 10:26:32.621000   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:32 compute09-onap-master kubelet[23979]: E0818 10:26:32.621069   23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:39 compute09-onap-master kubelet[23979]: I0818 10:26:39.886535   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:39 compute09-onap-master kubelet[23979]: W0818 10:26:39.886590   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:41 compute09-onap-master kubelet[23979]: W0818 10:26:41.338150   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:41 compute09-onap-master kubelet[23979]: E0818 10:26:41.338234   23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:49 compute09-onap-master kubelet[23979]: I0818 10:26:49.988059   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:49 compute09-onap-master kubelet[23979]: W0818 10:26:49.988098   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:58 compute09-onap-master kubelet[23979]: W0818 10:26:58.082561   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:58 compute09-onap-master kubelet[23979]: E0818 10:26:58.082631   23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:27:00 compute09-onap-master kubelet[23979]: I0818 10:27:00.068286   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:00 compute09-onap-master kubelet[23979]: W0818 10:27:00.068342   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:10 compute09-onap-master kubelet[23979]: I0818 10:27:10.141330   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:10 compute09-onap-master kubelet[23979]: W0818 10:27:10.141382   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:17 compute09-onap-master kubelet[23979]: I0818 10:27:17.868071   23979 kubelet_getters.go:178] "Pod status updated" pod="kube-system/nginx-proxy-compute09-onap-master" status=Running
Aug 18 10:27:20 compute09-onap-master kubelet[23979]: I0818 10:27:20.248321   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:20 compute09-onap-master kubelet[23979]: W0818 10:27:20.248389   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:25 compute09-onap-master kubelet[23979]: W0818 10:27:25.408096   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "2dcde027-dc08-477a-a269-0764538c163b"
Aug 18 10:27:25 compute09-onap-master kubelet[23979]: E0818 10:27:25.408164   23979 attacher.go:249] Error: could not find attached Cinder disk "2dcde027-dc08-477a-a269-0764538c163b" (path: ""): <nil>
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: I0818 10:27:30.341445   23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: W0818 10:27:30.342138   23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: W0818 10:27:30.590876   23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: E0818 10:27:30.590907   23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:27:34 compute09-onap-master kubelet[23979]: E0818 10:27:34.579596   23979 kubelet.go:1625] Unable to attach or mount volumes for pod "pysimple_default(0ac1ce61-3853-428d-8247-284e992f6ab4)": unmounted volumes=[data-volume default-token-c5mn9], unattached volumes=[data-volume default-token-c5mn9]: timed out waiting for the condition; skipping pod
Aug 18 10:27:34 compute09-onap-master kubelet[23979]: E0818 10:27:34.579679   23979 pod_workers.go:191] Error syncing pod 0ac1ce61-3853-428d-8247-284e992f6ab4 ("pysimple_default(0ac1ce61-3853-428d-8247-284e992f6ab4)"), skipping: unmounted volumes=[data-volume default-token-c5mn9], unattached volumes=[data-volume default-token-c5mn9]: timed out waiting for the condition
sylvainOL commented 3 years ago

continuing to find the issue, I've taken a look at an "old" deployment (using the old debian version but also being deployed via victoria but running now on wallaby) same version as previous one (k8s v1.19.9 deployed with kubespray release-2.15) and here it works!

the difference is on the naming of disk in /dev/disk/by-id/:

but when I look at https://github.com/kubernetes/cloud-provider-openstack/blob/release-1.20/pkg/util/mount/mount.go#L158, it should be good no?

sylvainOL commented 3 years ago

Ok, so first sorry to speak out loud here but I may have found something on the first issue: I'm deploying openstack csi but using kubelet provided openstack cloud adapter...

so I'm able to see the issue in kubelet from node:

Aug 18 12:47:24 compute06-onap-master-service-mesh kubelet[26532]: W0818 12:47:24.047291   26532 openstack_volumes.go:576] Failed to find device for the volumeID: "485f7181-8fa5-458d-8f24-73bde6e698ff"
Aug 18 12:47:24 compute06-onap-master-service-mesh kubelet[26532]: E0818 12:47:24.047352   26532 attacher.go:249] Error: could not find attached Cinder disk "485f7181-8fa5-458d-8f24-73bde6e698ff" (path: ""): <nil>

So I guess I'm not using cinder csi but the "old" way in kubelet.

And with the new version in libvirt, it's broken now.

So I'll close this one, sorry for the noise!

lingxiankong commented 3 years ago

All good @sylvainOL, please let us know if you have any further questions.