Closed sylvainOL closed 3 years ago
I've also taken a look at changes in kubespray and I don't see anything that might be an issue these previous weeks: https://github.com/kubernetes-sigs/kubespray/commits/release-2.16
Another change is that we've update debian base image to a more recent one (issue with stable change to oldstable) but I'm not sure it changes something :(
oh and on another deployment, (k8s v1.19.9
deployed with kubespray release-2.15
), but using the in tree provisioner (AFAIK, I don't see csi pods), it has the same issue
and there I can see:
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.208002 23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.208127 23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-c5mn9" (UniqueName: "kubernetes.io/secret/eae15b76-2dc6-4b45-a8f8-37afef64d215-default-token-c5mn9") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: E0818 10:26:16.208381 23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:16.708289302 +0000 UTC m=+36961.681443996 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.309176 23979 reconciler.go:269] operationExecutor.MountVolume started for volume "default-token-c5mn9" (UniqueName: "kubernetes.io/secret/eae15b76-2dc6-4b45-a8f8-37afef64d215-default-token-c5mn9") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/pods/eae15b76-2dc6-4b45-a8f8-37afef64d215/volumes/kubernetes.io~secret/default-token-c5mn9.
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.326420 23979 operation_generator.go:663] MountVolume.SetUp succeeded for volume "default-token-c5mn9" (UniqueName: "kubernetes.io/secret/eae15b76-2dc6-4b45-a8f8-37afef64d215-default-token-c5mn9") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: I0818 10:26:16.712822 23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:16 compute09-onap-master kubelet[23979]: E0818 10:26:16.713403 23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:17.713179577 +0000 UTC m=+36962.686334383 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:17 compute09-onap-master kubelet[23979]: I0818 10:26:17.720696 23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:17 compute09-onap-master kubelet[23979]: E0818 10:26:17.720928 23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:19.720850651 +0000 UTC m=+36964.694005342 (durationBeforeRetry 2s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:17 compute09-onap-master kubelet[23979]: I0818 10:26:17.867484 23979 kubelet_getters.go:178] "Pod status updated" pod="kube-system/nginx-proxy-compute09-onap-master" status=Running
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: I0818 10:26:19.694219 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: W0818 10:26:19.694320 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: I0818 10:26:19.733278 23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:19 compute09-onap-master kubelet[23979]: E0818 10:26:19.733545 23979 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424 podName: nodeName:}" failed. No retries permitted until 2021-08-18 10:26:23.733452437 +0000 UTC m=+36968.706607187 (durationBeforeRetry 4s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b\" (UniqueName: \"kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424\") pod \"pysimple\" (UID: \"eae15b76-2dc6-4b45-a8f8-37afef64d215\") "
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.760904 23979 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.768255 23979 operation_generator.go:1348] Controller attach succeeded for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215") device path: "/dev/sdf"
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.861547 23979 reconciler.go:269] operationExecutor.MountVolume started for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215")
Aug 18 10:26:23 compute09-onap-master kubelet[23979]: I0818 10:26:23.861768 23979 operation_generator.go:558] MountVolume.WaitForAttach entering for volume "pvc-9a5f3728-71a6-476c-b314-c5597c4b3c7b" (UniqueName: "kubernetes.io/cinder/72ebe3ae-6c00-4938-a846-641d6db41424") pod "pysimple" (UID: "eae15b76-2dc6-4b45-a8f8-37afef64d215") DevicePath "/dev/sdf"
Aug 18 10:26:25 compute09-onap-master kubelet[23979]: W0818 10:26:25.570278 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:25 compute09-onap-master kubelet[23979]: E0818 10:26:25.570353 23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:28 compute09-onap-master kubelet[23979]: W0818 10:26:28.099386 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:28 compute09-onap-master kubelet[23979]: E0818 10:26:28.099425 23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:29 compute09-onap-master kubelet[23979]: I0818 10:26:29.787654 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:29 compute09-onap-master kubelet[23979]: W0818 10:26:29.789542 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:32 compute09-onap-master kubelet[23979]: W0818 10:26:32.621000 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:32 compute09-onap-master kubelet[23979]: E0818 10:26:32.621069 23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:39 compute09-onap-master kubelet[23979]: I0818 10:26:39.886535 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:39 compute09-onap-master kubelet[23979]: W0818 10:26:39.886590 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:41 compute09-onap-master kubelet[23979]: W0818 10:26:41.338150 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:41 compute09-onap-master kubelet[23979]: E0818 10:26:41.338234 23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:26:49 compute09-onap-master kubelet[23979]: I0818 10:26:49.988059 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:26:49 compute09-onap-master kubelet[23979]: W0818 10:26:49.988098 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:26:58 compute09-onap-master kubelet[23979]: W0818 10:26:58.082561 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:26:58 compute09-onap-master kubelet[23979]: E0818 10:26:58.082631 23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:27:00 compute09-onap-master kubelet[23979]: I0818 10:27:00.068286 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:00 compute09-onap-master kubelet[23979]: W0818 10:27:00.068342 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:10 compute09-onap-master kubelet[23979]: I0818 10:27:10.141330 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:10 compute09-onap-master kubelet[23979]: W0818 10:27:10.141382 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:17 compute09-onap-master kubelet[23979]: I0818 10:27:17.868071 23979 kubelet_getters.go:178] "Pod status updated" pod="kube-system/nginx-proxy-compute09-onap-master" status=Running
Aug 18 10:27:20 compute09-onap-master kubelet[23979]: I0818 10:27:20.248321 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:20 compute09-onap-master kubelet[23979]: W0818 10:27:20.248389 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:25 compute09-onap-master kubelet[23979]: W0818 10:27:25.408096 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "2dcde027-dc08-477a-a269-0764538c163b"
Aug 18 10:27:25 compute09-onap-master kubelet[23979]: E0818 10:27:25.408164 23979 attacher.go:249] Error: could not find attached Cinder disk "2dcde027-dc08-477a-a269-0764538c163b" (path: ""): <nil>
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: I0818 10:27:30.341445 23979 setters.go:77] Using node IP: "10.253.0.172"
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: W0818 10:27:30.342138 23979 setters.go:179] adding overridden hostname of compute09-onap-master to cloudprovider-reported addresses
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: W0818 10:27:30.590876 23979 openstack_volumes.go:576] Failed to find device for the volumeID: "72ebe3ae-6c00-4938-a846-641d6db41424"
Aug 18 10:27:30 compute09-onap-master kubelet[23979]: E0818 10:27:30.590907 23979 attacher.go:249] Error: could not find attached Cinder disk "72ebe3ae-6c00-4938-a846-641d6db41424" (path: ""): <nil>
Aug 18 10:27:34 compute09-onap-master kubelet[23979]: E0818 10:27:34.579596 23979 kubelet.go:1625] Unable to attach or mount volumes for pod "pysimple_default(0ac1ce61-3853-428d-8247-284e992f6ab4)": unmounted volumes=[data-volume default-token-c5mn9], unattached volumes=[data-volume default-token-c5mn9]: timed out waiting for the condition; skipping pod
Aug 18 10:27:34 compute09-onap-master kubelet[23979]: E0818 10:27:34.579679 23979 pod_workers.go:191] Error syncing pod 0ac1ce61-3853-428d-8247-284e992f6ab4 ("pysimple_default(0ac1ce61-3853-428d-8247-284e992f6ab4)"), skipping: unmounted volumes=[data-volume default-token-c5mn9], unattached volumes=[data-volume default-token-c5mn9]: timed out waiting for the condition
continuing to find the issue, I've taken a look at an "old" deployment (using the old debian version but also being deployed via victoria but running now on wallaby)
same version as previous one (k8s v1.19.9
deployed with kubespray release-2.15
) and here it works!
the difference is on the naming of disk in /dev/disk/by-id/
:
old one:
ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 9 Aug 18 12:16 virtio-20841276-a7f7-4558-9 -> ../../vdc
lrwxrwxrwx 1 root root 10 Aug 18 12:16 virtio-20841276-a7f7-4558-9-part1 -> ../../vdc1
lrwxrwxrwx 1 root root 9 Aug 18 12:16 virtio-47f49d38-a2a2-4a29-8 -> ../../vdb
lrwxrwxrwx 1 root root 10 Aug 18 12:16 virtio-47f49d38-a2a2-4a29-8-part1 -> ../../vdb1
lrwxrwxrwx 1 root root 9 Aug 18 12:16 virtio-84411c8d-7df1-45af-9 -> ../../vde
lrwxrwxrwx 1 root root 9 Aug 18 12:16 virtio-c9a59f31-bb54-4821-a -> ../../vdd
lrwxrwxrwx 1 root root 10 Aug 18 12:16 virtio-c9a59f31-bb54-4821-a-part1 -> ../../vdd1
new one:
ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 9 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_4443ca08-531f-4719-b5d6-41c293c43d8b -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_4443ca08-531f-4719-b5d6-41c293c43d8b-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 9 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_5630acac-2a21-4afb-b720-006ef7ed5d68 -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_5630acac-2a21-4afb-b720-006ef7ed5d68-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_72ebe3ae-6c00-4938-a846-641d6db41424 -> ../../sdf
lrwxrwxrwx 1 root root 9 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_cda1af67-579d-40ef-8aab-173308c4004d -> ../../sdd
lrwxrwxrwx 1 root root 10 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_cda1af67-579d-40ef-8aab-173308c4004d-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 9 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 11 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part14 -> ../../sda14
lrwxrwxrwx 1 root root 11 Aug 18 12:02 scsi-0QEMU_QEMU_HARDDISK_drive-scsi0-0-0-0-part15 -> ../../sda15
but when I look at https://github.com/kubernetes/cloud-provider-openstack/blob/release-1.20/pkg/util/mount/mount.go#L158, it should be good no?
Ok, so first sorry to speak out loud here but I may have found something on the first issue: I'm deploying openstack csi but using kubelet provided openstack cloud adapter...
so I'm able to see the issue in kubelet from node:
Aug 18 12:47:24 compute06-onap-master-service-mesh kubelet[26532]: W0818 12:47:24.047291 26532 openstack_volumes.go:576] Failed to find device for the volumeID: "485f7181-8fa5-458d-8f24-73bde6e698ff"
Aug 18 12:47:24 compute06-onap-master-service-mesh kubelet[26532]: E0818 12:47:24.047352 26532 attacher.go:249] Error: could not find attached Cinder disk "485f7181-8fa5-458d-8f24-73bde6e698ff" (path: ""): <nil>
So I guess I'm not using cinder csi but the "old" way in kubelet.
And with the new version in libvirt, it's broken now.
So I'll close this one, sorry for the noise!
All good @sylvainOL, please let us know if you have any further questions.
/kind bug
What happened: Provisioner installed in our Kubernetes
v1.20.7
cluster. The provisioner is able to create and attach PVCs to the required node but Pods using the PVCs enter a waiting state with the error Unable to attach or mount volumes: unmounted volumes=[data-volume], unattached volumes=[default-token-4w5lf data-volume]: timed out waiting for the conditionHow can I determine what Condition the Pod's waiting for, and what might be happening? I guess I've something wrong with my setup but can't figure out what.
What you expected to happen:
Provisioner creates volumes, attached them to the compute instance and Pod successfully mounts the volume.
How to reproduce it:
Using a simple PVC and Pod: -
Anything else we need to know?:
We're using kubespray
release-2.16
to deploy this cluster. This is an automated deployment which was OK 3 weeks ago. I've reinstalled everything a second time and same issue appearsEnvironment:
1.2.2@v1.20.0
Wallaby
Logs
looking at #1206, I tried to retrieve / debug by myself but I may not gather the right logs :(
csi-cinder-controllerplugin csi-provisioner:
csi-cinder-controllerplugin csi-attacher:
csi-cinder-controllerplugin cinder-csi-plugin:
csi-cinder-nodeplugin cinder-csi-plugin of used node:
kubelet of controller:
kubeler of used node: