vmware / cloud-director-named-disk-csi-driver

Container Storage Interface (CSI) driver for VMware Cloud Director

Attaching a disk uses nodeID to find the VM, which fails if hostname in cluster differs from VM name in VMware #255

Open erSitzt opened 10 months ago

erSitzt commented 10 months ago

Describe the bug

The hostname of the k8s nodes is an FQDN, while the VM name in VMware is the short name. Attaching the volume fails because the VM to attach the volume to is not found.

AttachVolume.Attach failed for volume "pvc-7bbe77fc-bcbb-4413-8b0e-443acb33bb4a" : rpc error: code = Unknown desc = unable to find VM for node [exp-k8s-prod-worker-0001.mydomain.com]: [unable to find vm [exp-k8s-prod-worker-0001.mydomain.com] in vApp [k8s_prod]: [[ENF] entity not found]]

The VM name is just exp-k8s-prod-worker-0001

The VM lookup seems to just use the NodeID

nodeID := req.GetNodeId()

vdcManager.FindVMByName(cs.VAppName, nodeID)

Reproduction steps

Use different k8s node name and vm name

Expected behavior

something else :)

Additional context

No response

arunmk commented 10 months ago

@erSitzt this is a proper concern. There is no easy way to find a VM and the only way seems to be to iterate through all VMs and then get the right guest OS name.
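The iteration described above could look roughly like the following. This is only a sketch: `vmInfo`, its `GuestHostname` field, and `findVMByGuestHostname` are hypothetical names for illustration and are not types or functions from the driver or the VCD SDK. It assumes the list of VMs in the vApp, together with the hostname reported by each guest, has already been fetched.

```go
package main

import (
	"fmt"
	"strings"
)

// vmInfo is a hypothetical, simplified view of a VM in the vApp.
// GuestHostname stands in for whatever hostname the guest reports.
type vmInfo struct {
	Name          string
	GuestHostname string
}

// findVMByGuestHostname walks all VMs and returns the first one whose
// guest hostname matches the Kubernetes node ID (case-insensitive).
func findVMByGuestHostname(vms []vmInfo, nodeID string) (vmInfo, bool) {
	for _, vm := range vms {
		if strings.EqualFold(vm.GuestHostname, nodeID) {
			return vm, true
		}
	}
	return vmInfo{}, false
}

func main() {
	vms := []vmInfo{
		{Name: "exp-k8s-prod-worker-0001", GuestHostname: "exp-k8s-prod-worker-0001.mydomain.com"},
	}
	if vm, ok := findVMByGuestHostname(vms, "exp-k8s-prod-worker-0001.mydomain.com"); ok {
		fmt.Println(vm.Name)
	}
}
```

A linear scan like this costs one pass over the vApp per attach, so caching the mapping may be worth considering in larger clusters.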

How are you creating VMs whose names differ from the node names? Do you use CAPVCD or CSE? Could you explain your cluster creation process?

erSitzt commented 10 months ago

maybe @Vivida1 can elaborate...

But at least in my environment I can get the VM's UUID in the guest with something like dmidecode:

ubuntu@rke2-zentrale-infra-agent-1:~$ sudo dmidecode | grep VMware
        Manufacturer: VMware, Inc.
        Product Name: VMware Virtual Platform
        Serial Number: VMware-42 21 de 28 b7 d6 76 25-ea ad 43 3c 7c e4 45 ec
        Description: VMware SVGA II

And in vSphere (using PowerCLI to test it):

get-vm rke2-zentrale-infra-agent-1 |
Select Name,
@{N='UUID';E={$_.ExtensionData.Config.Uuid}}

Name                        UUID                                
----                        ----                                
rke2-zentrale-infra-agent-1 4221de28-b7d6-7625-eaad-433c7ce445ec

Not sure if that works everywhere, but it looks to be a better match than hoping for identical names.

P.S. I had to edit this: the dmidecode UUID does not match completely, but the VMware Serial Number does.
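For illustration, the serial number shown by dmidecode can be turned into the UUID form shown by PowerCLI with a small helper. This is a minimal sketch; `serialToUUID` is a hypothetical name, and it assumes the serial number always has the `VMware-` prefix followed by 16 hex bytes, as in the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// serialToUUID converts a VMware SMBIOS serial number such as
// "VMware-42 21 de 28 b7 d6 76 25-ea ad 43 3c 7c e4 45 ec"
// into the 8-4-4-4-12 UUID layout.
func serialToUUID(serial string) (string, error) {
	s := strings.TrimSpace(serial)
	s = strings.TrimPrefix(s, "VMware-")
	// Drop the spaces and the dash that separate the hex bytes.
	s = strings.NewReplacer(" ", "", "-", "").Replace(s)
	s = strings.ToLower(s)
	if len(s) != 32 {
		return "", fmt.Errorf("unexpected serial number length %d in %q", len(s), serial)
	}
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return "", fmt.Errorf("non-hex character %q in serial %q", c, serial)
		}
	}
	return fmt.Sprintf("%s-%s-%s-%s-%s", s[0:8], s[8:12], s[12:16], s[16:20], s[20:32]), nil
}

func main() {
	uuid, err := serialToUUID("VMware-42 21 de 28 b7 d6 76 25-ea ad 43 3c 7c e4 45 ec")
	if err != nil {
		panic(err)
	}
	fmt.Println(uuid) // 4221de28-b7d6-7625-eaad-433c7ce445ec
}
```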

erSitzt commented 10 months ago

And even if this is very old...

https://github.com/kubernetes/kubernetes/pull/59519

It looks like the UUID / VMware Serial Number is used there when provisioning VMs. I did not check whether it's still the same logic, but if it is... This is the way :)

Vivida1 commented 10 months ago

How are you creating VMs whose names differ from the node names? Do you use CAPVCD or CSE? Could you explain your cluster creation process?

A combination of Terraform and Ansible (+RKE2). The VM names are defined in Terraform and the K8s node names are defined inside the Ansible playbook. In our case the VM names equal their hostnames, and the K8s node names equal their FQDNs.

As a workaround, we changed the K8s node names to the VM hostname instead of its FQDN.

But of course it would be much nicer if the CSI driver could derive the VM name instead of just assuming it to be the K8s node name. If it cannot derive the VM name, it can still fall back to using the node ID.

arunmk commented 9 months ago

Thanks @Vivida1. I will look into how to get the hostname set within the guest by querying the VM properties.

If there is no clear way to do so, will you be okay with setting some guestinfo parameters that can be retrieved from the VM properties?