harvester / harvester

Open source hyperconverged infrastructure (HCI) software
https://harvesterhci.io/
Apache License 2.0

[BUG] VMs do not start due to mounting errors of root disk #2156

Open pagong opened 2 years ago

pagong commented 2 years ago

Describe the bug I'm trying to install VMs from ISO images. I've tried both openSUSE Leap 15.3 and Talos v1.0.0.

To Reproduce

Expected behavior Guest VMs should start up properly.

Support bundle

Environment:

Additional context Output of "kubectl describe pod virt-launcher-leap-153-b-682gs" ...

Events:
Type     Reason                  Age                  From                     Message
Warning  FailedScheduling        3m11s                default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Warning  FailedScheduling        3m10s                default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal   Scheduled               2m50s                default-scheduler        Successfully assigned default/virt-launcher-leap-153-b-682gs to node21
Normal   SuccessfulAttachVolume  2m40s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a"
Normal   SuccessfulAttachVolume  2m28s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24"
Normal   SuccessfulMountVolume   2m19s                kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24/dev"
Normal   SuccessfulMountVolume   2m19s                kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" volumeMapPath "/var/lib/kubelet/pods/fd9bac50-8c95-4871-9503-a3ff390f019a/volumeDevices/kubernetes.io~csi"
Warning  FailedMount             48s                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata libvirt-runtime disk-1 disk-0 cloudinitdisk-udata private hotplug-disks sockets public ephemeral-disks container-disks]: timed out waiting for the condition
Warning  FailedMapVolume         27s (x9 over 2m35s)  kubelet                  MapVolume.MapPodDevice failed for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" : rpc error: code = Internal desc = Could not mount "/dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a": mount failed: exit status 32
         Mounting command: mount
         Mounting arguments: -o bind /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a
         Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a: special device /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a does not exist.

node21:~ # kubectl get pv -o wide
NAME                                       CAPACITY  ACCESS MODES  RECLAIM POLICY  STATUS  CLAIM                                                                                                             STORAGECLASS          REASON  AGE   VOLUMEMODE
pvc-6a47d4cd-c04a-4a67-906c-3016c54e7fae   10Mi      RWO           Delete          Bound   cattle-monitoring-system/rancher-monitoring-grafana                                                               longhorn                      21d   Filesystem
pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24   1Gi       RWX           Delete          Bound   default/leap-153-b-disk-0-mzlyf                                                                                   longhorn-image-g2vtx          3m    Block
pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a   20Gi      RWX           Delete          Bound   default/leap-153-b-disk-1-3gt8w                                                                                   longhorn                      3m2s  Block
pvc-d2a4d8d5-0f19-4dc2-bcd0-f68c2656c762   50Gi      RWO           Delete          Bound   cattle-monitoring-system/prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0  longhorn                      21d   Filesystem

node21:~ # kubectl get pvc -A -o wide
NAMESPACE                 NAME                                                                                     STATUS  VOLUME                                     CAPACITY  ACCESS MODES  STORAGECLASS          AGE    VOLUMEMODE
cattle-monitoring-system  prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0   Bound   pvc-d2a4d8d5-0f19-4dc2-bcd0-f68c2656c762   50Gi      RWO           longhorn              21d    Filesystem
cattle-monitoring-system  rancher-monitoring-grafana                                                               Bound   pvc-6a47d4cd-c04a-4a67-906c-3016c54e7fae   10Mi      RWO           longhorn              21d    Filesystem
default                   leap-153-b-disk-0-mzlyf                                                                  Bound   pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24   1Gi       RWX           longhorn-image-g2vtx  3m46s  Block
default                   leap-153-b-disk-1-3gt8w                                                                  Bound   pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a   20Gi      RWX           longhorn              3m46s  Block

pagong commented 2 years ago

Similar to these issues: https://github.com/harvester/harvester/issues/2085 https://github.com/harvester/harvester/issues/2113

Maybe related to https://github.com/harvester/harvester/issues/2151 ??

pagong commented 2 years ago

node21:~ # kubectl describe pod virt-launcher-leap-153-b-682gs
Name:         virt-launcher-leap-153-b-682gs
Namespace:    default
Priority:     0
Node:         node21/192.168.20.121
Start Time:   Sat, 16 Apr 2022 11:05:24 +0000
Labels:       harvesterhci.io/vmName=leap-153-b
              kubevirt.io=virt-launcher
              kubevirt.io/created-by=9782b91b-a590-4ce6-b7c3-a4f98be9351d
Annotations:  harvesterhci.io/sshNames: []
              kubernetes.io/psp: global-unrestricted-psp
              kubevirt.io/domain: leap-153-b
              post.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--unfreeze", "--name", "leap-153-b", "--namespace", "default"]
              post.hook.backup.velero.io/container: compute
              pre.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--freeze", "--name", "leap-153-b", "--namespace", "default"]
              pre.hook.backup.velero.io/container: compute
              traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:       Pending
IP:
IPs:
Controlled By:  VirtualMachineInstance/leap-153-b
Containers:
  compute:
    Container ID:
    Image:       registry.suse.com/suse/sles/15.3/virt-launcher:0.45.0-8.4.3
    Image ID:
    Port:
    Host Port:
    Command:
      /usr/bin/virt-launcher --qemu-timeout 314s --name leap-153-b --uid 9782b91b-a590-4ce6-b7c3-a4f98be9351d --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                           2
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      memory:                        2336925697
    Requests:
      cpu:                           125m
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      ephemeral-storage:             50M
      memory:                        1620748289
    Environment:
      POD_NAME:  virt-launcher-leap-153-b-682gs (v1:metadata.name)
    Mounts:
      /var/run/kubevirt from public (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-private from private (rw)
      /var/run/kubevirt-private/secret/cloudinitdisk/networkData from cloudinitdisk-ndata (ro,path="networkData")
      /var/run/kubevirt-private/secret/cloudinitdisk/networkdata from cloudinitdisk-ndata (ro,path="networkdata")
      /var/run/kubevirt-private/secret/cloudinitdisk/userData from cloudinitdisk-udata (ro,path="userData")
      /var/run/kubevirt-private/secret/cloudinitdisk/userdata from cloudinitdisk-udata (ro,path="userdata")
      /var/run/kubevirt/container-disks from container-disks (rw)
      /var/run/kubevirt/hotplug-disks from hotplug-disks (rw)
      /var/run/kubevirt/sockets from sockets (rw)
      /var/run/libvirt from libvirt-runtime (rw)
    Devices:
      /dev/disk-0 from disk-0
      /dev/disk-1 from disk-1
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  private:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  public:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  sockets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  disk-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  leap-153-b-disk-1-3gt8w
    ReadOnly:   false
  disk-0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  leap-153-b-disk-0-mzlyf
    ReadOnly:   false
  cloudinitdisk-udata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  leap-153-b-rsi0m
    Optional:    false
  cloudinitdisk-ndata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  leap-153-b-rsi0m
    Optional:    false
  virt-bin-share-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  libvirt-runtime:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  ephemeral-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  container-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  hotplug-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  Warning  FailedScheduling        46m                  default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               45m                  default-scheduler        Successfully assigned default/virt-launcher-leap-153-b-682gs to node21
  Warning  FailedScheduling        46m                  default-scheduler        0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
  Normal   SuccessfulAttachVolume  45m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a"
  Normal   SuccessfulAttachVolume  45m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24"
  Normal   SuccessfulMountVolume   45m                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24/dev"
  Normal   SuccessfulMountVolume   45m                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24" volumeMapPath "/var/lib/kubelet/pods/fd9bac50-8c95-4871-9503-a3ff390f019a/volumeDevices/kubernetes.io~csi"
  Warning  FailedMount             43m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata libvirt-runtime disk-1 disk-0 cloudinitdisk-udata private hotplug-disks sockets public ephemeral-disks container-disks]: timed out waiting for the condition
  Warning  FailedMount             41m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[container-disks cloudinitdisk-udata ephemeral-disks disk-1 disk-0 sockets public hotplug-disks cloudinitdisk-ndata private libvirt-runtime]: timed out waiting for the condition
  Warning  FailedMount             39m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[disk-1 disk-0 hotplug-disks libvirt-runtime sockets cloudinitdisk-udata ephemeral-disks container-disks cloudinitdisk-ndata private public]: timed out waiting for the condition
  Warning  FailedMount             37m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata container-disks disk-0 public hotplug-disks libvirt-runtime sockets cloudinitdisk-udata disk-1 ephemeral-disks private]: timed out waiting for the condition
  Warning  FailedMount             34m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata disk-1 private ephemeral-disks container-disks hotplug-disks public libvirt-runtime sockets cloudinitdisk-udata disk-0]: timed out waiting for the condition
  Warning  FailedMount             32m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata public container-disks disk-1 private ephemeral-disks cloudinitdisk-udata hotplug-disks libvirt-runtime sockets disk-0]: timed out waiting for the condition
  Warning  FailedMount             30m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[public container-disks hotplug-disks cloudinitdisk-ndata ephemeral-disks disk-0 private sockets cloudinitdisk-udata libvirt-runtime disk-1]: timed out waiting for the condition
  Warning  FailedMount             27m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[ephemeral-disks libvirt-runtime cloudinitdisk-udata disk-1 sockets public container-disks hotplug-disks cloudinitdisk-ndata disk-0 private]: timed out waiting for the condition
  Warning  FailedMount             25m                  kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[libvirt-runtime sockets cloudinitdisk-udata ephemeral-disks cloudinitdisk-ndata disk-0 public container-disks private hotplug-disks disk-1]: timed out waiting for the condition
  Warning  FailedMapVolume         15m (x23 over 45m)   kubelet                  MapVolume.MapPodDevice failed for volume "pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" : rpc error: code = Internal desc = Could not mount "/dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a": mount failed: exit status 32
           Mounting command: mount
           Mounting arguments: -o bind /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a
           Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a/fd9bac50-8c95-4871-9503-a3ff390f019a: special device /dev/longhorn/pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a does not exist.
  Warning  FailedMount             5m18s (x9 over 23m)  kubelet                  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[public sockets cloudinitdisk-ndata ephemeral-disks libvirt-runtime private container-disks hotplug-disks cloudinitdisk-udata disk-0 disk-1]: timed out waiting for the condition

pagong commented 2 years ago

node21:~ # kubectl describe VirtualMachineInstance/leap-153-b
Name: leap-153-b Namespace: default Labels: harvesterhci.io/vmName=leap-153-b Annotations: harvesterhci.io/sshNames: [] kubevirt.io/latest-observed-api-version: v1 kubevirt.io/storage-observed-api-version: v1alpha3 API Version: kubevirt.io/v1 Kind: VirtualMachineInstance Metadata: Creation Timestamp: 2022-04-16T17:01:40Z Finalizers: kubevirt.io/virtualMachineControllerFinalize foregroundDeleteVirtualMachine wrangler.cattle.io/VMIController.UnsetOwnerOfPVCs Generation: 4 Managed Fields: API Version: kubevirt.io/v1alpha3 Fields Type: FieldsV1 fieldsV1: f:metadata: f:annotations: .: f:harvesterhci.io/sshNames: f:kubevirt.io/latest-observed-api-version: f:kubevirt.io/storage-observed-api-version: f:finalizers: .: v:"kubevirt.io/virtualMachineControllerFinalize": f:labels: .: f:harvesterhci.io/vmName: f:ownerReferences: .: k:{"uid":"f124275d-2f50-4a58-b120-7a382469cb7a"}: .: f:apiVersion: f:blockOwnerDeletion: f:controller: f:kind: f:name: f:uid: f:spec: .: f:domain: .: f:cpu: .: f:cores: f:sockets: f:threads: f:devices: .: f:disks: f:interfaces: f:firmware: .: f:uuid: f:machine: .: f:type: f:memory: .: f:guest: f:resources: .: f:limits: .: f:cpu: f:memory: f:requests: .: f:cpu: f:memory: f:evictionStrategy: f:hostname: f:networks: f:volumes: f:status: .: f:activePods: .: f:7dee95b6-d4d0-411f-b3f6-7b730f8102ba: f:conditions: f:guestOSInfo: f:phase: f:phaseTransitionTimestamps: f:qosClass: f:virtualMachineRevisionName: Manager: Go-http-client Operation: Update Time: 2022-04-16T17:01:40Z API Version: kubevirt.io/v1 Fields Type: FieldsV1 fieldsV1: f:metadata: f:finalizers: v:"wrangler.cattle.io/VMIController.UnsetOwnerOfPVCs": Manager: harvester Operation: Update Time: 2022-04-16T17:01:40Z Owner References: API Version: kubevirt.io/v1 Block Owner Deletion: true Controller: true Kind: VirtualMachine Name: leap-153-b UID: f124275d-2f50-4a58-b120-7a382469cb7a Resource Version: 17055968 UID: a230edc8-8a9b-4ccc-85ba-ac4d4e1a1079 Spec: Domain: Cpu: Cores: 2 Sockets: 1 Threads: 1 Devices: Disks: Boot Order: 1 Disk: Bus: virtio Name: disk-1 Boot Order: 2 Cdrom: Bus: sata Readonly: true Tray: closed Name: disk-0 Disk: Bus: virtio Name: cloudinitdisk Interfaces: Masquerade: Model: virtio Name: default Features: Acpi: Enabled: true Firmware: Uuid: 7e1aec8b-52b2-51ee-99e9-69890e3bd924 Machine: Type: q35 Memory: Guest: 1948Mi Resources: Limits: Cpu: 2 Memory: 2Gi Requests: Cpu: 125m Memory: 1365Mi Eviction Strategy: LiveMigrate Hostname: leap-153-b Networks: Name: default Pod: Volumes: Name: disk-1 Persistent Volume Claim: Claim Name: leap-153-b-disk-1-3gt8w Name: disk-0 Persistent Volume Claim: Claim Name: leap-153-b-disk-0-mzlyf Cloud Init No Cloud: Network Data Secret Ref: Name: leap-153-b-rsi0m Secret Ref: Name: leap-153-b-rsi0m Name: cloudinitdisk Status: Active Pods: 7dee95b6-d4d0-411f-b3f6-7b730f8102ba: node21 Conditions: Last Probe Time: 2022-04-16T17:01:40Z Last Transition Time: 2022-04-16T17:01:40Z Message: Guest VM is not reported as running Reason: GuestNotRunning Status: False Type: Ready Guest OS Info: Phase: Scheduling Phase Transition Timestamps: Phase: Pending Phase Transition Timestamp: 2022-04-16T17:01:40Z Phase: Scheduling Phase Transition Timestamp: 2022-04-16T17:01:40Z Qos Class: Burstable Virtual Machine Revision Name: revision-start-vm-f124275d-2f50-4a58-b120-7a382469cb7a-3 Events: Type Reason Age From Message


Normal  SuccessfulCreate  17s  disruptionbudget-controller  Created PodDisruptionBudget kubevirt-disruption-budget-tqkn8
Normal  SuccessfulCreate  17s  virtualmachine-controller    Created virtual machine pod virt-launcher-leap-153-b-xccgk

noahgildersleeve commented 2 years ago

@pagong Could you attach a support bundle to the ticket when you get a chance? You can access it by clicking the support link in the bottom left of the UI, then clicking generate support bundle.

pagong commented 2 years ago

You're welcome. Here it is:

supportbundle_a4af19a6-0e0a-4840-974b-6af7160f7b63_2022-04-16T11-34-36Z.zip

kylechase commented 2 years ago

I am having the same issue. Please let me know if you need further information.

pagong commented 2 years ago

Another attempt at starting a Talos VM:

node23:~ # kubectl get pod
NAME                              READY   STATUS              RESTARTS   AGE
virt-launcher-talos-c1-m2-mpcj2   0/1     ContainerCreating   0          4m59s

node23:~ # kubectl describe pod virt-launcher-talos-c1-m2-mpcj2
Name:         virt-launcher-talos-c1-m2-mpcj2
Namespace:    default
Priority:     0
Node:         node22/192.168.20.122
Start Time:   Sat, 23 Apr 2022 16:16:13 +0000
Labels:       harvesterhci.io/vmName=talos-c1-m2
              kubevirt.io=virt-launcher
              kubevirt.io/created-by=00bb9550-43d6-436c-8493-ce10052851b4
Annotations:  harvesterhci.io/sshNames: []
              kubernetes.io/psp: global-unrestricted-psp
              kubevirt.io/domain: talos-c1-m2
              post.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--unfreeze", "--name", "talos-c1-m2", "--namespace", "default"]
              post.hook.backup.velero.io/container: compute
              pre.hook.backup.velero.io/command: ["/usr/bin/virt-freezer", "--freeze", "--name", "talos-c1-m2", "--namespace", "default"]
              pre.hook.backup.velero.io/container: compute
              traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:       Pending
IP:
IPs:
Controlled By:  VirtualMachineInstance/talos-c1-m2
Containers:
  compute:
    Container ID:
    Image:       registry.suse.com/suse/sles/15.3/virt-launcher:0.45.0-8.4.3
    Image ID:
    Port:
    Host Port:
    Command:
      /usr/bin/virt-launcher --qemu-timeout 340s --name talos-c1-m2 --uid 00bb9550-43d6-436c-8493-ce10052851b4 --namespace default --kubevirt-share-dir /var/run/kubevirt --ephemeral-disk-dir /var/run/kubevirt-ephemeral-disks --container-disk-dir /var/run/kubevirt/container-disks --grace-period-seconds 45 --hook-sidecars 0 --ovmf-path /usr/share/OVMF
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                           2
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      memory:                        2336925697
    Requests:
      cpu:                           125m
      devices.kubevirt.io/kvm:       1
      devices.kubevirt.io/tun:       1
      devices.kubevirt.io/vhost-net: 1
      ephemeral-storage:             50M
      memory:                        1620748289
    Environment:
      POD_NAME:  virt-launcher-talos-c1-m2-mpcj2 (v1:metadata.name)
    Mounts:
      /var/run/kubevirt from public (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-private from private (rw)
      /var/run/kubevirt-private/secret/cloudinitdisk/networkData from cloudinitdisk-ndata (ro,path="networkData")
      /var/run/kubevirt-private/secret/cloudinitdisk/networkdata from cloudinitdisk-ndata (ro,path="networkdata")
      /var/run/kubevirt-private/secret/cloudinitdisk/userData from cloudinitdisk-udata (ro,path="userData")
      /var/run/kubevirt-private/secret/cloudinitdisk/userdata from cloudinitdisk-udata (ro,path="userdata")
      /var/run/kubevirt/container-disks from container-disks (rw)
      /var/run/kubevirt/hotplug-disks from hotplug-disks (rw)
      /var/run/kubevirt/sockets from sockets (rw)
      /var/run/libvirt from libvirt-runtime (rw)
    Devices:
      /dev/disk-0 from disk-0
      /dev/disk-1 from disk-1
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  private:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  public:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  sockets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  disk-0:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  talos-c1-m2-disk-0-sfpnk
    ReadOnly:   false
  disk-1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  talos-c1-m2-disk-1-ab3p6
    ReadOnly:   false
  cloudinitdisk-udata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  talos-c1-m2-qph87
    Optional:    false
  cloudinitdisk-ndata:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  talos-c1-m2-qph87
    Optional:    false
  virt-bin-share-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  libvirt-runtime:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  ephemeral-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  container-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
  hotplug-disks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  Warning  FailedMount             6m5s                   kubelet                  MountVolume.SetUp failed for volume "cloudinitdisk-ndata" : failed to sync secret cache: timed out waiting for the condition
  Warning  FailedMount             6m5s                   kubelet                  MountVolume.SetUp failed for volume "cloudinitdisk-udata" : failed to sync secret cache: timed out waiting for the condition
  Normal   SuccessfulMountVolume   5m51s                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a" globalMapPath "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a/dev"
  Normal   SuccessfulMountVolume   5m51s                  kubelet                  MapVolume.MapPodDevice succeeded for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a" volumeMapPath "/var/lib/kubelet/pods/48ec7f65-3a36-402a-99d9-d35c41a4449c/volumeDevices/kubernetes.io~csi"
  Normal   Scheduled               5m21s                  default-scheduler        Successfully assigned default/virt-launcher-talos-c1-m2-mpcj2 to node22
  Warning  FailedAttachVolume      5m15s (x4 over 5m21s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a" : rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [message=EOF, code=Server Error, detail=] from [http://longhorn-backend:9500/v1/volumes/pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a?action=attach]
  Warning  FailedAttachVolume      5m11s (x5 over 5m21s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-878ba48e-e39f-4050-aa9b-d12de40363ff" : rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [code=Server Error, detail=, message=EOF] from [http://longhorn-backend:9500/v1/volumes/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff?action=attach]
  Normal   SuccessfulAttachVolume  5m11s                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a"
  Normal   SuccessfulAttachVolume  5m2s                   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-878ba48e-e39f-4050-aa9b-d12de40363ff"
  Warning  FailedMount             4m3s                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[cloudinitdisk-ndata disk-0 hotplug-disks cloudinitdisk-udata container-disks public ephemeral-disks libvirt-runtime sockets private disk-1]: timed out waiting for the condition
  Warning  FailedMount             106s                   kubelet                  Unable to attach or mount volumes: unmounted volumes=[disk-1], unattached volumes=[sockets cloudinitdisk-ndata ephemeral-disks libvirt-runtime container-disks cloudinitdisk-udata public hotplug-disks disk-0 disk-1 private]: timed out waiting for the condition
  Warning  FailedMapVolume         84s (x10 over 5m35s)   kubelet                  MapVolume.MapPodDevice failed for volume "pvc-878ba48e-e39f-4050-aa9b-d12de40363ff" : rpc error: code = Internal desc = Could not mount "/dev/longhorn/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff" at "/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff/48ec7f65-3a36-402a-99d9-d35c41a4449c": mount failed: exit status 32
           Mounting command: mount
           Mounting arguments: -o bind /dev/longhorn/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff/48ec7f65-3a36-402a-99d9-d35c41a4449c
           Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff/48ec7f65-3a36-402a-99d9-d35c41a4449c: special device /dev/longhorn/pvc-878ba48e-e39f-4050-aa9b-d12de40363ff does not exist.

node23:~ # ls -l /dev/longhorn/
ls: cannot access '/dev/longhorn/': No such file or directory

node22:~ # ls -l /dev/longhorn/
total 0
brw-rw---- 1 root root 8, 48 Apr 23 16:16 pvc-b47271ea-4dc1-4e42-9430-b2fd8e30864a
brw-rw---- 1 root root 8, 32 Apr 23 15:53 pvc-d2a4d8d5-0f19-4dc2-bcd0-f68c2656c762

node21:~ # ls -l /dev/longhorn/
total 0
brw-rw---- 1 root root 8, 32 Apr 23 15:33 pvc-6a47d4cd-c04a-4a67-906c-3016c54e7fae
brw-rw---- 1 root root 8, 48 Apr 23 16:17 pvc-878ba48e-e39f-4050-aa9b-d12de40363ff

pagong commented 2 years ago

So the pod for the VM got scheduled to node22, and the volume for disk-0 (cdrom) was also attached to node22. But the volume for disk-1 (rootfs) was attached to node21!

That's the bug! I think the VM launcher should attach all of the VM's volumes to the same node as the pod.

Is it possible to force Longhorn to move the disk-1 device from node21 to node22?
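For reference, one way to see where Longhorn currently has each volume attached is to look at the Longhorn Volume CRs in the longhorn-system namespace. This is a sketch, assuming kubectl access from a management node; the status field names follow Longhorn's volumes.longhorn.io CRD and may vary slightly between Longhorn versions:

# Show which node each Longhorn volume is currently attached to
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,NODE:.status.currentNodeID,STATE:.status.state,ROBUSTNESS:.status.robustness

# Inspect the problem volume (the Talos disk-1 backing volume) in detail
kubectl -n longhorn-system get volumes.longhorn.io pvc-878ba48e-e39f-4050-aa9b-d12de40363ff -o yaml

Forcibly detaching a volume that a pod still claims generally isn't something Longhorn will do on its own; stopping the VM so the launcher pod goes away should let the volume detach, after which it should attach to whichever node the new pod lands on.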

PhanLe1010 commented 2 years ago

The volume for disk-0 (cdrom) was also assigned to node22.

@pagong Can you provide the YAML file for the pod as well as the PVC of the VM?

pagong commented 2 years ago

The volume for disk-0 (cdrom) was also assigned to node22.

@pagong Can you provide the YAML file for the pod as well as the PVC of the VM?

@PhanLe1010 I used the Harvester GUI to create the VM. Can you tell me where to look for that YAML file?
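(For reference, there is no YAML file on disk to look for; the objects can be dumped straight from the cluster with kubectl. A rough sketch, using the object names from the output above and assuming kubectl access to the Harvester cluster:)

# Pod and PVC definitions for the leap-153-b VM
kubectl get pod virt-launcher-leap-153-b-682gs -n default -o yaml
kubectl get pvc leap-153-b-disk-0-mzlyf leap-153-b-disk-1-3gt8w -n default -o yaml

# Storage classes referenced by those PVCs
kubectl get storageclass longhorn longhorn-image-g2vtx -o yaml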

PhanLe1010 commented 2 years ago

Thanks @pagong. No worries, we got it from the attached support bundle.

In Harvester, I remember there is a special volume design specifically for Harvester VMs: the storage class parameter migratable: "true". I am guessing all PVCs used by Harvester VMs should use a storage class that has this parameter. However, looking at the YAML in the ticket, there is one PVC using the regular longhorn storage class, which doesn't have this parameter. That PVC is leap-153-b-disk-1-3gt8w, with corresponding volume pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a. This is exactly the volume that is having the mounting problem. I am wondering if something went wrong and set the wrong storage class (longhorn) for this PVC. Please see the objects from the attached YAML below; a sketch of the expected storage class parameters follows after them.

volume pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24

# PVC
    name: leap-153-b-disk-0-mzlyf
    namespace: default
    resourceVersion: "16832466"
    uid: 8bd3ddab-cb45-4d76-b13b-a0f417b56c24
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 1Gi
    storageClassName: longhorn-image-g2vtx
    volumeMode: Block
    volumeName: pvc-8bd3ddab-cb45-4d76-b13b-a0f417b56c24

# STORAGE CLASS
    name: longhorn-image-g2vtx
    resourceVersion: "16829496"
    uid: eb135c36-de88-403d-b1f5-837dc3dfff96
  parameters:
    backingImage: default-image-g2vtx
    migratable: "true"
    numberOfReplicas: "3"
    staleReplicaTimeout: "30"

pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a

# PVC: leap-153-b-disk-1-3gt8w
  spec:
    accessModes:
    - ReadWriteMany
    resources:
      requests:
        storage: 20Gi
    storageClassName: longhorn
    volumeMode: Block
    volumeName: pvc-b5af1813-3bf1-4e36-b9fc-3118e6c0897a

# STORAGE CLASS
    name: longhorn
    resourceVersion: "11755436"
    uid: 62796ee7-c08f-43dc-933e-0b89b75e6421
  parameters:
    fromBackup: "null"
    fsType: ext4
    numberOfReplicas: "3"
    staleReplicaTimeout: "30"
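
For comparison, a minimal sketch of a Longhorn storage class carrying the migratable parameter, modelled on the working longhorn-image-g2vtx class above but without the backing image. The name is illustrative only and not taken from the support bundle; Harvester normally creates the appropriate class itself.

# Illustrative only -- not an object from the support bundle.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-migratable          # hypothetical name, for illustration
provisioner: driver.longhorn.io      # Longhorn CSI driver
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  migratable: "true"                 # the parameter missing from the plain "longhorn" class above
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"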