bug: machines fails to reconcile due to wrong name

mcbenjemaa commented 8 months ago

What steps did you take and what happened: [A clear and concise description of what the bug is.]

 failureMessage: expected VM name to match "capmox-e2e-p0jq9a-control-plane-b4t76"     but it was "capmox-e2e-buedhi-control-plane-cskrb"

What did you expect to happen:

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

Cluster-api-provider-proxmox version:
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):

65278 commented 8 months ago

Recreation: deploy cluster, delete cluster, deploy cluster triggers this reliably (as this is about VMID reuse).

mcbenjemaa commented 8 months ago

I couldn't reproduced it the whole time

mcbenjemaa commented 8 months ago

The issue happens here, If the VM is not found, it will trigger the UpdateVMLocation. which will check the vmid in the other nodes. But since at the cloning time, Proxmox initializes the VM with its previous name. Therefore when a reconciliation happens in this case. it will throw this error state.

https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/a052fb1d6601017359701c0954cc8c9aead7d9ba/internal/service/vmservice/vm.go#L100-L112

I improved this before by checking if the ProxmoxMachine has a task, so skipping:

https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/a052fb1d6601017359701c0954cc8c9aead7d9ba/internal/service/vmservice/find.go#L62

The potential fix, can be to return just an error in this case, so the next reconciliation can succeed:

https://github.com/ionos-cloud/cluster-api-provider-proxmox/blob/a052fb1d6601017359701c0954cc8c9aead7d9ba/internal/service/vmservice/find.go#L93-L97

I don't know if it's okay, with this. CC @lubedacht

mcbenjemaa commented 8 months ago

I found a workaround, Make a new call to fetch the updated VM config cause the cluster resources Call is lazy.

https://pve.proxmox.com/pve-docs/api-viewer/#/nodes/{node}/qemu/{vmid}/config

ionos-cloud / cluster-api-provider-proxmox

bug: machines fails to reconcile due to wrong name #56