kubernetes-sigs / cluster-api-provider-ibmcloud

Cluster API Provider for IBM Cloud
https://cluster-api-ibmcloud.sigs.k8s.io
Apache License 2.0

Revisit MHC for remediating machines #1862

Open Amulyam24 opened 3 months ago

Amulyam24 commented 3 months ago

/kind bug
/area provider/ibmcloud

What steps did you take and what happened: While testing the cloud-provider template to be used in PowerVS CI, we noticed that MachineHealthCheck (MHC) remediation misbehaves: machines are marked unhealthy and deleted once they are up and running. The MHC definitions need to be revisited and added to the templates accordingly.

I0628 12:19:56.515707       1 recorder.go:104] "Machine default/amulya-capi-test-mhc-md/amulya-capi-test-md-0-b5c58-l6rcg/amulya-capi-test-md-0-b5c58-l6rcg has been marked as unhealthy" logger="events" type="Normal" object={"kind":"Machine","namespace":"default","name":"amulya-capi-test-md-0-b5c58-l6rcg","uid":"10225910-5706-47e0-ad62-1184e8d60b6b","apiVersion":"cluster.x-k8s.io/v1beta1","resourceVersion":"369728"} reason="MachineMarkedUnhealthy"
I0628 12:19:56.530647       1 machinehealthcheck_controller.go:435] "Target has failed health check, marking for remediation" controller="machinehealthcheck" controllerGroup="cluster.x-k8s.io" controllerKind="MachineHealthCheck" MachineHealthCheck="default/amulya-capi-test-mhc-md" namespace="default" name="amulya-capi-test-mhc-md" reconcileID="09ba737c-beca-4564-be32-4df509188210" Cluster="default/amulya-capi-test" target="default/amulya-capi-test-mhc-md/amulya-capi-test-md-0-b5c58-l6rcg/amulya-capi-test-md-0-b5c58-l6rcg" reason="UnhealthyNode" message="Condition Ready on node is reporting status False for more than 1m0s"

What did you expect to happen: Machines that are up and running should pass the health check and not be deleted.

Anything else you would like to add:

Environment: