kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0
MachinePool ready state leading to not processing providerIDs in CAPI #4982

Open mweibel opened 2 months ago

mweibel commented 2 months ago

/kind bug

What steps did you take and what happened: The following code determines the ready state of an AzureMachinePool: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/90797931a191d5baf48bd6fa70c78f2207ad117f/azure/scope/machinepool.go#L571-L603

The following CAPI code does not run if the AzureMachinePool is not ready: https://github.com/kubernetes-sigs/cluster-api/blob/8d639f1fad564eecf5bda0a2ee03c8a38896a184/exp/internal/controllers/machinepool_controller_phases.go#L290-L319

If I'm right, these two pieces of logic together have the following effect:

This is a bug that can leave the cluster's view of its machines out of date. For example, cluster-autoscaler with the clusterapi provider doesn't know about certain machines.
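To make the suspected interaction concrete, here is a minimal sketch in Go. It is not the code from either repository. `infraReady` and `syncProviderIDs` are hypothetical helpers, the provider IDs are made up, and the readiness rule (provisioning succeeded and observed replicas equal desired replicas) is an assumption about what the linked CAPZ code does.

```go
package main

import "fmt"

// infraReady is a hypothetical stand-in for the provider-side readiness check.
// Assumption: the pool counts as ready only when provisioning succeeded and
// the observed replica count matches the desired one.
func infraReady(provisioningSucceeded bool, desired, observed int) bool {
	return provisioningSucceeded && desired == observed
}

// syncProviderIDs is a hypothetical stand-in for the CAPI-side step that
// copies the provider ID list from the infrastructure object. It mirrors the
// behaviour described in this issue: the list is only refreshed when the
// infrastructure object is ready.
func syncProviderIDs(ready bool, known, reported []string) []string {
	if !ready {
		// Not ready: keep whatever was known before, even if it is stale.
		return known
	}
	return reported
}

func main() {
	// Made-up provider IDs, for illustration only.
	known := []string{"azure://vm-0", "azure://vm-1"}
	reported := []string{"azure://vm-0", "azure://vm-1", "azure://vm-2"}

	// Scaling from 3 to 4 replicas: observed (3) != desired (4), so not ready.
	ready := infraReady(true, 4, 3)
	fmt.Println("ready:", ready)
	fmt.Println("provider IDs seen by CAPI:", syncProviderIDs(ready, known, reported))
}
```

Under these assumptions, `vm-2` already exists but never reaches the known list while the pool is not ready, which would match the cluster-autoscaler symptom described above.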

I'm not sure whether the bug is in CAPZ or in CAPI:

What did you expect to happen: Scaling up and down works without issues, and a single VM doesn't affect the functioning of the whole VMSS.

Anything else you would like to add: I guess this is initially more of a discussion point because there could be multiple facets of this issue.

Environment:

willie-yao commented 1 month ago

/priority backlog