harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] VMs do not start due to mounting errors of root disk #2156

Open pagong opened 2 years ago

pagong commented 2 years ago

Describe the bug I'm trying to install VMs from ISO images. I've tried both openSUSE Leap 15.3 and Talos v1.0.0.

To Reproduce

Expected behavior Guest VMs should start up properly.

Support bundle

Environment:

Additional context Output of "kubectl describe pod virt-launcher-leap-153-b-682gs" ...

Events:
Type     Reason                  Age                  From                     Message
Warning  FailedScheduling        3m11s                default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Warning  FailedScheduling        3m10s                default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal   Scheduled               2m50s                default-scheduler        Successfully assigned default/virt-launcher-leap-153-b-682gs to node21
Normal   SuccessfulAttachVolume  2m40s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a"
Normal   SuccessfulAttachVolume  2m28s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24"
Normal   SuccessfulMountVolume   2m19s                kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24/dev"
Normal   SuccessfulMountVolume   2m19s                kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" volumeMapPath "/var/lib/kubelet/pods/fd9bac50-8c95-4871-9503-a3ff390f019a/volumeDevices/kubernetes.io~csi"
Warning  FailedMount             48s                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata libvirt-runtime disk-1 disk-0 cloudinitdisk-udata private hotplug-disks sockets public ephemeral-disks container-disks]: timed out waiting for the condition
Warning  FailedMapVolume         27s (x9 over 2m35s)  kubelet                  MapVolume.MapPodDevice failed for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" : rpc error: code = Internal desc = Could not mount "/dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a": mount failed: exit status 32
         Mounting command: mount
         Mounting arguments: -o bind /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a
         Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a: special device /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a does not exist.

node21:~ # kubectl get pv -o wide
NAME                                       CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS  CLAIM                                                                                                             STORAGECLASS          REASON  AGE   VOLUMEMODE
pvc-6a47d4cd-c04a-4a67-906c-3016c54e7fae   10Mi      RWO           Delete          Bound   cattle-monitoring-system/rancher-monitoring-grafana                                                               longhorn                      21d   Filesystem
pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24   1Gi       RWX           Delete          Bound   default/leap-153-b-disk-0-mzlyf                                                                                   longhorn-image-g2vtx          3m    Block
pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a   20Gi      RWX           Delete          Bound   default/leap-153-b-disk-1-3gt8w                                                                                   longhorn                      3m2s  Block
pvc-d2a4d8d5-0f19-4dc2-bcd0-f68c2656c762   50Gi      RWO           Delete          Bound   cattle-monitoring-system/prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0  longhorn                      21d   Filesystem

node21:~ # kubectl get pvc -A -o wide
NAMESPACE                 NAME                                                                                     STATUS  VOLUME                                     CAPACITY  ACCESS MODES  STORAGECLASS          AGE    VOLUMEMODE
cattle-monitoring-system  prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0   Bound   pvc-d2a4d8d5-0f19-4dc2-bcd0-f68c2656c762   50Gi      RWO           longhorn              21d    Filesystem
cattle-monitoring-system  rancher-monitoring-grafana                                                               Bound   pvc-6a47d4cd-c04a-4a67-906c-3016c54e7fae   10Mi      RWO           longhorn              21d    Filesystem
default                   leap-153-b-disk-0-mzlyf                                                                  Bound   pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24   1Gi       RWX           longhorn-image-g2vtx  3m46s  Block
default                   leap-153-b-disk-1-3gt8w                                                                  Bound   pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a   20Gi      RWX           longhorn              3m46s  Block

pagong commented 2 years ago

Similar to these issues: https://github.com/harvester/harvester/issues/2085 https://github.com/harvester/harvester/issues/2113

Maybe related to https://github.com/harvester/harvester/issues/2151 ??

pagong commented 2 years ago

node21:~ # kubectl describe pod virt-launcher-leap-153-b-682gs
Name:         virt-launcher-leap-153-b-682gs
Namespace:    default
Priority:     0
Node:         node21/192.168.20.121
Start Time:   Sat, 16 Apr 2022 11:05:24 +0000
Labels:       harvesterhci.io/vmName=leap-153-b
              kubevirt.io=virt-launcher
              kubevirt.io/created-by=9782b91b-a590-4ce6-b7c3-a4f98be9351d
Annotations:  harvesterhci.io/sshNames: []
              kubernetes.io/psp: global-unrestricted-psp
              kubevirt.io/domain: leap-153-b
              post.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--unfreeze", "--name", "leap-153-b", "--namespace", "default"]
              post.hook.backup.velero.io/container: compute
              pre.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--freeze", "--name", "leap-153-b", "--namespace", "default"]
              pre.hook.backup.velero.io/container: compute
              traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:       Pending
IP:
IPs:
Controlled By:  VirtualMachineInstance/leap-153-b
Containers:
  compute:
    Container ID:
    Image:       registry.suse.com/suse/sles/15.3/virt-launcher:0.45.0-8.4.3
    Image ID:
    Port:
    Host Port:
    Command:
      /usr/bin/virt-launcher --qemu-timeout 314s --name leap-153-b --uid 9782b91b-a590-4ce6-b7c3-a4f98be9351d --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                           2
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      memory:                        2336925697
    Requests:
      cpu:                           125m
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      ephemeral-storage:             50M
      memory:                        1620748289
    Environment:
      POD_NAME:  virt-launcher-leap-153-b-682gs (v1:metadata.name)
    Mounts:
      /var/run/kubevirt from public (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-private from private (rw)
      /var/run/kubevirt-private/secret/cloudinitdisk/networkData from cloudinitdisk-ndata (ro,path="networkData")
      /var/run/kubevirt-private/secret/cloudinitdisk/networkdata from cloudinitdisk-ndata (ro,path="networkdata")
      /var/run/kubevirt-private/secret/cloudinitdisk/userData from cloudinitdisk-udata (ro,path="userData")
      /var/run/kubevirt-private/secret/cloudinitdisk/userdata from cloudinitdisk-udata (ro,path="userdata")
      /var/run/kubevirt/container-disks from container-disks (rw)
      /var/run/kubevirt/hotplug-disks from hotplug-disks (rw)
      /var/run/kubevirt/sockets from sockets (rw)
      /var/run/libvirt from libvirt-runtime (rw)
    Devices:
      /dev/disk-0 from disk-0
      /dev/disk-1 from disk-1
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  private:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  public:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  sockets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  disk-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  leap-153-b-disk-1-3gt8w
    ReadOnly:   false
  disk-0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  leap-153-b-disk-0-mzlyf
    ReadOnly:   false
  cloudinitdisk-udata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  leap-153-b-rsi0m
    Optional:    false
  cloudinitdisk-ndata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  leap-153-b-rsi0m
    Optional:    false
  virt-bin-share-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  libvirt-runtime:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  ephemeral-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  container-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  hotplug-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  Warning  FailedScheduling        46m                  default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               45m                  default-scheduler        Successfully assigned default/virt-launcher-leap-153-b-682gs to node21
  Warning  FailedScheduling        46m                  default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   SuccessfulAttachVolume  45m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a"
  Normal   SuccessfulAttachVolume  45m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24"
  Normal   SuccessfulMountVolume   45m                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24/dev"
  Normal   SuccessfulMountVolume   45m                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" volumeMapPath "/var/lib/kubelet/pods/fd9bac50-8c95-4871-9503-a3ff390f019a/volumeDevices/kubernetes.io~csi"
  Warning  FailedMount             43m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata libvirt-runtime disk-1 disk-0 cloudinitdisk-udata private hotplug-disks sockets public ephemeral-disks container-disks]: timed out waiting for the condition
  Warning  FailedMount             41m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[container-disks cloudinitdisk-udata ephemeral-disks disk-1 disk-0 sockets public hotplug-disks cloudinitdisk-ndata private libvirt-runtime]: timed out waiting for the condition
  Warning  FailedMount             39m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[disk-1 disk-0 hotplug-disks libvirt-runtime sockets cloudinitdisk-udata ephemeral-disks container-disks cloudinitdisk-ndata private public]: timed out waiting for the condition
  Warning  FailedMount             37m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata container-disks disk-0 public hotplug-disks libvirt-runtime sockets cloudinitdisk-udata disk-1 ephemeral-disks private]: timed out waiting for the condition
  Warning  FailedMount             34m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata disk-1 private ephemeral-disks container-disks hotplug-disks public libvirt-runtime sockets cloudinitdisk-udata disk-0]: timed out waiting for the condition
  Warning  FailedMount             32m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata public container-disks disk-1 private ephemeral-disks cloudinitdisk-udata hotplug-disks libvirt-runtime sockets disk-0]: timed out waiting for the condition
  Warning  FailedMount             30m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[public container-disks hotplug-disks cloudinitdisk-ndata ephemeral-disks disk-0 private sockets cloudinitdisk-udata libvirt-runtime disk-1]: timed out waiting for the condition
  Warning  FailedMount             27m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[ephemeral-disks libvirt-runtime cloudinitdisk-udata disk-1 sockets public container-disks hotplug-disks cloudinitdisk-ndata disk-0 private]: timed out waiting for the condition
  Warning  FailedMount             25m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[libvirt-runtime sockets cloudinitdisk-udata ephemeral-disks cloudinitdisk-ndata disk-0 public container-disks private hotplug-disks disk-1]: timed out waiting for the condition
  Warning  FailedMapVolume         15m (x23 over 45m)   kubelet                  MapVolume.MapPodDevice failed for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" : rpc error: code = Internal desc = Could not mount "/dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a": mount failed: exit status 32
           Mounting command: mount
           Mounting arguments: -o bind /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a
           Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a: special device /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a does not exist.
  Warning  FailedMount             5m18s (x9 over 23m)  kubelet                  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[public sockets cloudinitdisk-ndata ephemeral-disks libvirt-runtime private container-disks hotplug-disks cloudinitdisk-udata disk-0 disk-1]: timed out waiting for the condition

pagong commented 2 years ago

node21:~ # kubectl describe VirtualMachineInstance/leap-153-b
Name: leap-153-b Namespace: default Labels: harvesterhci.io/vmName=leap-153-b Annotations: harvesterhci.io/sshNames: [] kubevirt.io/latest-observed-api-version: v1 kubevirt.io/storage-observed-api-version: v1alpha3 API Version: kubevirt.io/v1 Kind: VirtualMachineInstance Metadata: Creation Timestamp: 2022-04-16T17:01:40Z Finalizers: kubevirt.io/virtualMachineControllerFinalize foregroundDeleteVirtualMachine wrangler.cattle.io/VMIController.UnsetOwnerOfPVCs Generation: 4 Managed Fields: API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: .: f:harvesterhci.io/sshNames: f:kubevirt.io/latest-observed-api-version: f:kubevirt.io/storage-observed-api-version: f:finalizers: .: v:"kubevirt.io/virtualMachineControllerFinalize": f:labels: .: f:harvesterhci.io/vmName: f:ownerReferences: .: k:{"uid":"f124275d-2f50-4a58-b120-7a382469cb7a"}: .: f:apiVersion: f:blockOwnerDeletion: f:controller: f:kind: f:name: f:uid: f:spec: .: f:domain: .: f:cpu: .: f:cores: f:sockets: f:threads: f:devices: .: f:disks: f:interfaces: f:firmware: .: f:uuid: f:machine: .: f:type: f:memory: .: f:guest: f:resources: .: f:limits: .: f:cpu: f:memory: f:requests: .: f:cpu: f:memory: f:evictionStrategy: f:hostname: f:networks: f:volumes: f:status: .: f:activePods: .: f:7dee95b6-d4d0-411f-b3f6-7b730f8102ba: f:conditions: f:guestOSInfo: f:phase: f:phaseTransitionTimestamps: f:qosClass: f:virtualMachineRevisionName: Manager: Go-http-client Operation: Update Time: 2022-04-16T17:01:40Z API Version: kubevirt.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:finalizers: v:"wrangler.cattle.io/VMIController.UnsetOwnerOfPVCs": Manager: harvester Operation: Update Time: 2022-04-16T17:01:40Z Owner References: API Version: kubevirt.io/v1 Block Owner Deletion: true Controller: true Kind: VirtualMachine Name: leap-153-b UID: f124275d-2f50-4a58-b120-7a382469cb7a Resource Version: 17055968 UID: a230edc8-8a9b-4ccc-85ba-ac4d4e1a1079 Spec: Domain: Cpu: Cores: 2 Sockets: 1 Threads: 1 Devices: Disks: Boot Order: 1 Disk: Bus: virtio Name: disk-1 Boot Order: 2 Cdrom: Bus: sata Readonly: true Tray: closed Name: disk-0 Disk: Bus: virtio Name: cloudinitdisk Interfaces: Masquerade: Model: virtio Name: default Features: Acpi: Enabled: true Firmware: Uuid: 7e1aec8b-52b2-51ee-99e9-69890e3bd924 Machine: Type: q35 Memory: Guest: 1948Mi Resources: Limits: Cpu: 2 Memory: 2Gi Requests: Cpu: 125m Memory: 1365Mi Eviction Strategy: LiveMigrate Hostname: leap-153-b Networks: Name: default Pod: Volumes: Name: disk-1 Persistent Volume Claim: Claim Name: leap-153-b-disk-1-3gt8w Name: disk-0 Persistent Volume Claim: Claim Name: leap-153-b-disk-0-mzlyf Cloud Init No Cloud: Network Data Secret Ref: Name: leap-153-b-rsi0m Secret Ref: Name: leap-153-b-rsi0m Name: cloudinitdisk Status: Active Pods: 7dee95b6-d4d0-411f-b3f6-7b730f8102ba: node21 Conditions: Last Probe Time: 2022-04-16T17:01:40Z Last Transition Time: 2022-04-16T17:01:40Z Message: Guest VM is not reported as running Reason: GuestNotRunning Status: False Type: Ready Guest OS Info: Phase: Scheduling Phase Transition Timestamps: Phase: Pending Phase Transition Timestamp: 2022-04-16T17:01:40Z Phase: Scheduling Phase Transition Timestamp: 2022-04-16T17:01:40Z Qos Class: Burstable Virtual Machine Revision Name: revision-start-vm-f124275d-2f50-4a58-b120-7a382469cb7a-3 Events: Type Reason Age From Message


Normal  SuccessfulCreate  17s  disruptionbudget-controller  Created PodDisruptionBudget kubevirt-disruption-budget-tqkn8
Normal  SuccessfulCreate  17s  virtualmachine-controller    Created virtual machine pod virt-launcher-leap-153-b-xccgk

noahgildersleeve commented 2 years ago

@pagong Could you attach a support bundle to the ticket when you get a chance? You can access it by clicking the support link in the bottom left of the UI, then clicking generate support bundle.

pagong commented 2 years ago

You're welcome. Here it is:

supportbundle_a4af19a6-0e0a-4840-974b-6af7160f7b63_2022-04-16T11-34-36Z.zip

kylechase commented 2 years ago

I am having the same issue. Please let me know if you need further information.

pagong commented 2 years ago

Another attempt at starting a Talos VM:

node23:~ # kubectl get pod
NAME                              READY   STATUS              RESTARTS   AGE
virt-launcher-talos-c1-m2-mpcj2   0/1     ContainerCreating   0          4m59s

node23:~ # kubectl describe pod virt-launcher-talos-c1-m2-mpcj2
Name:         virt-launcher-talos-c1-m2-mpcj2
Namespace:    default
Priority:     0
Node:         node22/192.168.20.122
Start Time:   Sat, 23 Apr 2022 16:16:13 +0000
Labels:       harvesterhci.io/vmName=talos-c1-m2
              kubevirt.io=virt-launcher
              kubevirt.io/created-by=00bb9550-43d6-436c-8493-ce10052851b4
Annotations:  harvesterhci.io/sshNames: []
              kubernetes.io/psp: global-unrestricted-psp
              kubevirt.io/domain: talos-c1-m2
              post.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--unfreeze", "--name", "talos-c1-m2", "--namespace", "default"]
              post.hook.backup.velero.io/container: compute
              pre.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--freeze", "--name", "talos-c1-m2", "--namespace", "default"]
              pre.hook.backup.velero.io/container: compute
              traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:       Pending
IP:
IPs:
Controlled By:  VirtualMachineInstance/talos-c1-m2
Containers:
  compute:
    Container ID:
    Image:       registry.suse.com/suse/sles/15.3/virt-launcher:0.45.0-8.4.3
    Image ID:
    Port:
    Host Port:
    Command:
      /usr/bin/virt-launcher --qemu-timeout 340s --name talos-c1-m2 --uid 00bb9550-43d6-436c-8493-ce10052851b4 --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                           2
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      memory:                        2336925697
    Requests:
      cpu:                           125m
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      ephemeral-storage:             50M
      memory:                        1620748289
    Environment:
      POD_NAME:  virt-launcher-talos-c1-m2-mpcj2 (v1:metadata.name)
    Mounts:
      /var/run/kubevirt from public (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-private from private (rw)
      /var/run/kubevirt-private/secret/cloudinitdisk/networkData from cloudinitdisk-ndata (ro,path="networkData")
      /var/run/kubevirt-private/secret/cloudinitdisk/networkdata from cloudinitdisk-ndata (ro,path="networkdata")
      /var/run/kubevirt-private/secret/cloudinitdisk/userData from cloudinitdisk-udata (ro,path="userData")
      /var/run/kubevirt-private/secret/cloudinitdisk/userdata from cloudinitdisk-udata (ro,path="userdata")
      /var/run/kubevirt/container-disks from container-disks (rw)
      /var/run/kubevirt/hotplug-disks from hotplug-disks (rw)
      /var/run/kubevirt/sockets from sockets (rw)
      /var/run/libvirt from libvirt-runtime (rw)
    Devices:
      /dev/disk-0 from disk-0
      /dev/disk-1 from disk-1
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  private:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  public:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  sockets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  disk-0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  talos-c1-m2-disk-0-sfpnk
    ReadOnly:   false
  disk-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  talos-c1-m2-disk-1-ab3p6
    ReadOnly:   false
  cloudinitdisk-udata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  talos-c1-m2-qph87
    Optional:    false
  cloudinitdisk-ndata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  talos-c1-m2-qph87
    Optional:    false
  virt-bin-share-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  libvirt-runtime:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  ephemeral-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  container-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  hotplug-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  Warning  FailedMount             6m5s                   kubelet                  MountVolume.SetUp failed for volume "cloudinitdisk-ndata" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount             6m5s                   kubelet                  MountVolume.SetUp failed for volume "cloudinitdisk-udata" : failed to sync secret cache: timed out waiting for the condition
  Normal   SuccessfulMountVolume   5m51s                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a/dev"
  Normal   SuccessfulMountVolume   5m51s                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a" volumeMapPath "/var/lib/kubelet/pods/48ec7f65-3a36-402a-99d9-d35c41a4449c/volumeDevices/kubernetes.io~csi"
  Normal   Scheduled               5m21s                  default-scheduler        Successfully assigned default/virt-launcher-talos-c1-m2-mpcj2 to node22
  Warning  FailedAttachVolume      5m15s (x4 over 5m21s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a" : rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [message=EOF, code=Server Error, detail=] from [http://longhorn-backend:9500/v1/volumes/pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a?action=attach]
  Warning  FailedAttachVolume      5m11s (x5 over 5m21s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-878ba48e-e39f-4050-aa9b-d12de40363ff" : rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [code=Server Error, detail=, message=EOF] from [http://longhorn-backend:9500/v1/volumes/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff?action=attach]
  Normal   SuccessfulAttachVolume  5m11s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a"
  Normal   SuccessfulAttachVolume  5m2s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-878ba48e-e39f-4050-aa9b-d12de40363ff"
  Warning  FailedMount             4m3s                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata disk-0 hotplug-disks cloudinitdisk-udata container-disks public ephemeral-disks libvirt-runtime sockets private disk-1]: timed out waiting for the condition
  Warning  FailedMount             106s                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[sockets cloudinitdisk-ndata ephemeral-disks libvirt-runtime container-disks cloudinitdisk-udata public hotplug-disks disk-0 disk-1 private]: timed out waiting for the condition
  Warning  FailedMapVolume         84s (x10 over 5m35s)   kubelet                  MapVolume.MapPodDevice failed for volume "pvc-878ba48e-e39f-4050-aa9b-d12de40363ff" : rpc error: code = Internal desc = Could not mount "/dev/longhorn/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff/48ec7f65-3a36-402a-99d9-d35c41a4449c": mount failed: exit status 32
           Mounting command: mount
           Mounting arguments: -o bind /dev/longhorn/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff/48ec7f65-3a36-402a-99d9-d35c41a4449c
           Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff/48ec7f65-3a36-402a-99d9-d35c41a4449c: special device /dev/longhorn/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff does not exist.

node23:~ # ls -l /dev/longhorn/
ls: cannot access '/dev/longhorn/': No such file or directory

node22:~ # ls -l /dev/longhorn/
total 0
brw-rw---- 1 root root 8, 48 Apr 23 16:16 pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a
brw-rw---- 1 root root 8, 32 Apr 23 15:53 pvc-d2a4d8d5-0f19-4dc2-bcd0-f68c2656c762

node21:~ # ls -l /dev/longhorn/
total 0
brw-rw---- 1 root root 8, 32 Apr 23 15:33 pvc-6a47d4cd-c04a-4a67-906c-3016c54e7fae
brw-rw---- 1 root root 8, 48 Apr 23 16:17 pvc-878ba48e-e39f-4050-aa9b-d12de40363ff

pagong commented 2 years ago

So the pod for the VM got scheduled to node22, and the volume for disk-0 (cdrom) was also attached to node22. But the volume for disk-1 (rootfs) was attached to node21!

That's the bug! I think the VM launcher should attach all of the VM's volumes to the same node as the pod.

Is it possible to force Longhorn to move the disk-1 device from node21 to node22?
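For reference, one way to see where Longhorn currently has each volume attached is to look at the Longhorn Volume CRs in the longhorn-system namespace. This is a sketch, assuming kubectl access from a management node; the status field names follow Longhorn's volumes.longhorn.io CRD and may vary slightly between Longhorn versions:

# Show which node each Longhorn volume is currently attached to
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,NODE:.status.currentNodeID,STATE:.status.state,ROBUSTNESS:.status.robustness

# Inspect the problem volume (the Talos disk-1 backing volume) in detail
kubectl -n longhorn-system get volumes.longhorn.io pvc-878ba48e-e39f-4050-aa9b-d12de40363ff -o yaml

Forcibly detaching a volume that a pod still claims generally isn't something Longhorn will do on its own; stopping the VM so the launcher pod goes away should let the volume detach, after which it should attach to whichever node the new pod lands on.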

PhanLe1010 commented 2 years ago

The volume for disk-0 (cdrom) was also assigned to node22.

@pagong Can you provide the YAML file for the pod as well as the PVC of the VM?

pagong commented 2 years ago

The volume for disk-0 (cdrom) was also assigned to node22.

@pagong Can you provide the YAML file for the pod as well as the PVC of the VM?

@PhanLe1010 I used the Harvester GUI to create the VM. Can you tell me where to look for that YAML file?
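(For reference, there is no YAML file on disk to look for; the objects can be dumped straight from the cluster with kubectl. A rough sketch, using the object names from the output above and assuming kubectl access to the Harvester cluster:)

# Pod and PVC definitions for the leap-153-b VM
kubectl get pod virt-launcher-leap-153-b-682gs -n default -o yaml
kubectl get pvc leap-153-b-disk-0-mzlyf leap-153-b-disk-1-3gt8w -n default -o yaml

# Storage classes referenced by those PVCs
kubectl get storageclass longhorn longhorn-image-g2vtx -o yaml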

PhanLe1010 commented 2 years ago

Thanks @pagong. No worries, we got it from the attached support bundle.

In Harvester, I remember there is a special volume design specifically for Harvester VMs: the storage class parameter migratable: "true". I am guessing all PVCs used by Harvester VMs should use a storage class that has this parameter. However, looking at the YAML in the ticket, there is one PVC using the regular longhorn storage class, which doesn't have this parameter. That PVC is leap-153-b-disk-1-3gt8w, with corresponding volume pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a. This is exactly the volume that is having the mounting problem. I am wondering if something went wrong and set the wrong storage class (longhorn) for this PVC. Please see the objects from the attached YAML below; a sketch of the expected storage class parameters follows after them.

volume pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24

# PVC
    name: leap-153-b-disk-0-mzlyf
    namespace: default
    resourceVersion: "16832466"
    uid: 8bd3ddab-cb45-4d76-b13b-a0f417b56c24
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 1Gi
    storageClassName: longhorn-image-g2vtx
    volumeMode: Block
    volumeName: pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24

# STORAGE CLASS
    name: longhorn-image-g2vtx
    resourceVersion: "16829496"
    uid: eb135c36-de88-403d-b1f5-837dc3dfff96
  parameters:
    backingImage: default-image-g2vtx
    migratable: "true"
    numberOfReplicas: "3"
    staleReplicaTimeout: "30"

pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a

# PVC: leap-153-b-disk-1-3gt8w
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 20Gi
    storageClassName: longhorn
    volumeMode: Block
    volumeName: pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a

# STORAGE CLASS
    name: longhorn
    resourceVersion: "11755436"
    uid: 62796ee7-c08f-43dc-933e-0b89b75e6421
  parameters:
    fromBackup: "null"
    fsType: ext4
    numberOfReplicas: "3"
    staleReplicaTimeout: "30"
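
For comparison, a minimal sketch of a Longhorn storage class carrying the migratable parameter, modelled on the working longhorn-image-g2vtx class above but without the backing image. The name is illustrative only and not taken from the support bundle; Harvester normally creates the appropriate class itself.

# Illustrative only -- not an object from the support bundle.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-migratable          # hypothetical name, for illustration
provisioner: driver.longhorn.io      # Longhorn CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  migratable: "true"                 # the parameter missing from the plain "longhorn" class above
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"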