kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0

VM provisioning keeps failing with CAPI (error points to a failed CAPZ.Windows.Bootstrapping extension) #5038

Open Apoorva2405 opened 1 month ago

Apoorva2405 commented 1 month ago

/kind bug

What steps did you take and what happened: VM provisioning keeps failing with CAPI.

We have an Azure Managed Kubernetes (AKS) cluster, and on top of it we run Cluster API (CAPI) to create a workload cluster. VM provisioning on Azure fails intermittently, but quite often, with the following error:

"properties": {
  "statusMessage": "{\"status\":\"Failed\",\"error\":{\"code\":\"ResourceOperationFailure\",\"message\":\"The resource operation completed with terminal provisioning state 'Failed'.\",\"details\":[{\"code\":\"OSProvisioningClientError\",\"message\":\"OS Provisioning for VM 'cxp-workl-xqtkv' did not finish in the allotted time. However, the VM guest agent was detected running. This suggests the guest OS has not been properly prepared to be used as a VM image (with CreateOption=FromImage). To resolve this issue, either use the VHD as is with CreateOption=Attach or prepare it properly for use as an image:\r\n Instructions for Windows: https://azure.microsoft.com/documentation/articles/virtual-machines-windows-upload-image/ \r\n Instructions for Linux: https://azure.microsoft.com/documentation/articles/virtual-machines-linux-capture-image/ \"}]}}",
  "eventCategory": "Administrative",
  "entity": "/subscriptions/xxe/resourcegroups/cxp-rg-dev-eastus2-workload/providers/Microsoft.Compute/virtualMachines/cxp-workl-xqtkv/extensions/CAPZ.Windows.Bootstrapping",
  "message": "Microsoft.Compute/virtualMachines/extensions/write",
  "hierarchy": "xx/DMe/DMe-NONPRD/xx"
},
"relatedEvents": []
}
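The useful part of that event is the nested `statusMessage`, which is itself JSON serialized into a string, so its quotes show up as `\"`. A quick text-level sketch for pulling out every error code, assuming the event (or even a partial paste like the one above, which is not valid JSON on its own) was saved to a file; the function name `extract_error_codes` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every error "code" found in an Azure
# activity-log event read from stdin, one per line, in order.
# statusMessage is JSON encoded inside a JSON string, so its quotes are
# escaped as \" and a plain grep works even on a partial, non-JSON paste.
extract_error_codes() {
  grep -o '\\"code\\":\\"[A-Za-z]*\\"' |
    sed 's/^\\"code\\":\\"\([A-Za-z]*\)\\"$/\1/'
}
```

For example, `extract_error_codes < activity-log-event.txt` (hypothetical file name) on the event above would list `ResourceOperationFailure` followed by `OSProvisioningClientError`; the second code is the one that describes the actual failure. If the full event is exported as JSON with the same `properties.statusMessage` layout, `jq -r '.properties.statusMessage | fromjson | .error.details[].code'` is the more robust equivalent.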

1) We logged a support ticket with Azure; based on the error, the failure appears to be caused by the CAPZ.Windows.Bootstrapping extension.
2) We checked the Boot Diagnostics logs and the serial log in the Azure portal. However, the VM is deleted as soon as provisioning fails, so the logs do not contain much information.

What did you expect to happen: VM provisioning should succeed.

Environment: All environments, including Dev, Stage, and Prod.

jsturtevant commented 1 month ago

The bootstrapping extension looks like a red herring. You can test this by manually creating a VM from the same image outside of CAPZ:

az image create -n testvmimage -g cluster-api-images --os-type <Windows/Linux> --source <storage url for vhd file>
az vm create -n testvm --image testvmimage -g cluster-api-images
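To repeat that test against different VHDs, the two commands can be wrapped in a small script. This is a sketch only: the function `create_test_vm_from_vhd` and the `DRY_RUN` switch are hypothetical additions, not part of CAPZ or the az CLI, and the `az` flags are exactly the ones used above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical switch: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# create_test_vm_from_vhd RESOURCE_GROUP OS_TYPE VHD_URL [IMAGE_NAME] [VM_NAME]
# Hypothetical helper: create a managed image from a VHD, then boot a VM from
# it without CAPZ (so no CAPZ.Windows.Bootstrapping extension is involved).
create_test_vm_from_vhd() {
  local rg="$1" os_type="$2" vhd_url="$3"
  local image="${4:-testvmimage}" vm="${5:-testvm}"
  run az image create -n "$image" -g "$rg" --os-type "$os_type" --source "$vhd_url"
  # Windows images may also need --admin-username/--admin-password here.
  run az vm create -n "$vm" --image "$image" -g "$rg"
}
```

`DRY_RUN=1 create_test_vm_from_vhd cluster-api-images Windows '<storage url for vhd file>'` only prints the commands; without `DRY_RUN` it runs them. If `az vm create` fails here with a similar OS provisioning error, the problem lies in the image rather than in the CAPZ bootstrapping extension.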

It looks like the issue is this part of the error: "This suggests the guest OS has not been properly prepared to be used as a VM image (with CreateOption=FromImage). To resolve this issue, either use the VHD as is with CreateOption=Attach or prepare it properly for use as an image:\r\n * Instructions for Windows: https://azure.microsoft.com/documentation/articles/virtual-machines-windows-upload-image". How was the image prepared? Did you sysprep it? Did you follow those instructions?
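For Windows, preparing an image means generalizing the guest OS with Sysprep inside the VM (for example `sysprep.exe /generalize /oobe /shutdown`) before capturing it. Assuming that has been done on a build VM, the Azure-side capture can be sketched as below. The function `capture_generalized_image`, the `DRY_RUN` switch, and the build-VM workflow itself are assumptions for illustration; this captures from a VM rather than from a VHD URL.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical switch: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# capture_generalized_image RESOURCE_GROUP VM_NAME IMAGE_NAME
# Hypothetical helper. Run it only after Sysprep has generalized the guest OS
# and shut the VM down: "az vm generalize" only marks the VM as generalized
# in Azure, it does not prepare the OS itself.
capture_generalized_image() {
  local rg="$1" vm="$2" image="$3"
  run az vm deallocate -g "$rg" -n "$vm"                   # release compute
  run az vm generalize -g "$rg" -n "$vm"                   # mark as generalized
  run az image create -g "$rg" -n "$image" --source "$vm"  # managed image from the VM
}
```

Skipping the Sysprep step and running only these commands would likely reproduce the same "not properly prepared" error, because Azure would still boot an OS that was never generalized.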