vmware-tanzu / cluster-api-provider-bringyourownhost

Kubernetes Cluster API Provider BYOH for already-provisioned hosts running Linux.
Apache License 2.0

Cannot re-purpose/upgrade a ByoHost; agent log shows "UninstallationScript not found in Byohost" #855

Open nitendra-thakur opened 8 months ago

nitendra-thakur commented 8 months ago

What steps did you take and what happened: I set up a worker cluster (version 1.25.11) as per getting_started. All cluster-api/byoh resources were created on the management cluster in a dedicated namespace, devops-test. The byoh agent on each node was started with this command:

nohup /opt/capi-byoh/byoh-hostagent-linux-amd64 --bootstrap-kubeconfig /opt/capi-byoh/bootstrap-kubeconfig.conf --namespace devops-test --label cluster=devops-test > /var/log/byoh-agent.log 2>&1 &

So far so good; everything works fine.

Today I tried upgrading the worker cluster to 1.26.6, starting with the KubeadmControlPlane. Unfortunately the rollout is stuck and cannot proceed; the byoh agent log shows this error:

E1102 01:19:27.661027 2676397 controller.go:317] controller/byohost "msg"="Reconciler error" "error"="UninstallationScript not found in Byohost node01" "name"="node01" "namespace"="devops-test" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="ByoHost"

The events for byohost node01 show this:

Events:
  Type    Reason                   Age                 From                   Message
  ----    ------                   ----                ----                   -------
  Normal  ByoHostReleaseSucceeded  51m                 byomachine-controller  ByoHost Released by test-k8s-control-plane-nfws8
  Normal  ResetK8sNodeSucceeded    13m (x20 over 51m)  hostagent-controller   k8s Node Reset completed

The ByoHost also still has its machine ref set, even though the ByoMachine it points to is already gone:

Machine Ref:
    API Version:  infrastructure.cluster.x-k8s.io/v1beta1
    Kind:         ByoMachine
    Name:         test-k8s-control-plane-nfws8
    Namespace:    devops-test
    UID:          6383b3ff-5507-4a53-9a3d-3e9930b5c905
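
A stale reference like the one above can be confirmed from the management cluster. The following is a minimal sketch, not something taken from the provider docs: the function name `check_machine_ref` is made up for illustration, and it assumes the reference is stored at `.status.machineRef` on the ByoHost and that the ByoMachine lives in the same namespace.

```shell
# Report whether the ByoMachine referenced by a ByoHost still exists.
# Assumption: the reference is stored at .status.machineRef on the ByoHost.
# check_machine_ref is a hypothetical helper name, not part of BYOH.
check_machine_ref() {
  ns="$1"
  host="$2"
  # jsonpath prints an empty string when the field is not set
  ref="$(kubectl get byohost "$host" -n "$ns" -o jsonpath='{.status.machineRef.name}')"
  if [ -z "$ref" ]; then
    echo "no machine ref"
  elif kubectl get byomachine "$ref" -n "$ns" >/dev/null 2>&1; then
    echo "machine ref ok: $ref"
  else
    # The ByoHost points at a ByoMachine that no longer exists
    echo "stale machine ref: $ref"
  fi
}
```

Running `check_machine_ref devops-test node01` against the state described here would be expected to report the reference as stale, since the ByoMachine has already been deleted.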

What did you expect to happen: All nodes should be upgraded to the new version and be in Ready state

Anything else you would like to add: I think this is a generic issue that I face every time I try to re-purpose a node. The ByoHost keeps its reference to the old ByoMachine and cannot be attached to new ByoMachines.

Environment:

nitendra-thakur commented 7 months ago

The issue was that UninstallationScript was missing from the byohosts. I'm not sure why, though; maybe a past version of the provider component did not use it? Anyway, I was able to proceed with the upgrade by manually adding that field on all byohosts.
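
For anyone hitting the same error, a rough way to find the affected hosts before an upgrade is to list the byohosts whose field is empty. This is only a sketch: the function name `list_hosts_missing_uninstall_script` is made up for illustration, and it assumes the field is exposed as `.spec.uninstallationScript` on the ByoHost resource.

```shell
# Print the names of ByoHosts in a namespace that have no uninstallation script.
# Assumption: the field lives at .spec.uninstallationScript on the ByoHost.
# list_hosts_missing_uninstall_script is a hypothetical helper name.
list_hosts_missing_uninstall_script() {
  ns="$1"
  for name in $(kubectl get byohosts -n "$ns" -o jsonpath='{.items[*].metadata.name}'); do
    # jsonpath prints an empty string when the field is not set
    script="$(kubectl get byohost "$name" -n "$ns" -o jsonpath='{.spec.uninstallationScript}')"
    if [ -z "$script" ]; then
      echo "$name"
    fi
  done
}
```

Calling `list_hosts_missing_uninstall_script devops-test` prints one host name per line. Each listed host can then be fixed by hand, for example with `kubectl edit byohost <name> -n devops-test`.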