rancher-sandbox / cluster-api-provider-harvester

A Cluster API Infrastructure Provider for Harvester
Apache License 2.0

Bug: Waiting for ProviderID to be set #38

Open PatrickLaabs opened 3 months ago

PatrickLaabs commented 3 months ago

What happened: When provisioning a cluster to Harvester, following the guide, the nodes (1x ControlPlane, 2x Worker) get created, and so does the LoadBalancer instance.

The CAPHV controller keeps logging:

2024-06-07T08:34:33Z    INFO    Reconciling HarvesterMachine ...    {"controller": "harvestermachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "HarvesterMachine", "HarvesterMachine": {"name":"test-rk-workers-jhfpk-hcsbw","namespace":"example-rk"}, "namespace": "example-rk", "name": "test-rk-workers-jhfpk-hcsbw", "reconcileID": "722c0c53-b6c4-4ce2-9d9e-928a83056ddd"}

The ControlPlane has become ready, but the workers have not. The ControlPlane does have a ProviderID when viewing the harvestercluster resource.
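For reference, this is roughly how I compare the ProviderID fields (a sketch; the namespace `example-rk` is taken from the logs above, and `workload.kubeconfig` is a placeholder for the workload cluster's kubeconfig):

```shell
# List HarvesterMachines and whether spec.providerID is set (empty column = not set):
kubectl get harvestermachines -n example-rk \
  -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID'

# The same field on the workload cluster's Nodes, via its own kubeconfig:
kubectl --kubeconfig workload.kubeconfig get nodes \
  -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID'
```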

What did you expect to happen:

I expected a ProviderID to be set on the workers, so that my cluster becomes ready.

How to reproduce it:

Anything else you would like to add:

I also tried restarting the controller pods on my local kind cluster, but that didn't help.

I wonder if this still comes from the Tigera-Operator deployment. Even though the pod

tigera-operator     tigera-operator-7975bb4546-jqm6f                          ●      1/1       Running

is running and healthy, it seems to log some "errors":

{"level":"info","ts":1717749713.219541,"logger":"controller_apiserver","msg":"Reconciling APIServer","Request.Namespace":"","Request.Name":"default"}

{"level":"info","ts":1717749713.2196012,"logger":"controller_apiserver","msg":"APIServer config not found","Request.Namespace":"","Request.Name":"default"}

Environment:

PatrickLaabs commented 3 months ago

OK, the problem was version v0.3.0 of the RKE2 bootstrap and control plane providers.

Running a fresh bootstrap process with this:

clusterctl init --infrastructure harvester:v0.1.2 --control-plane rke2:v0.2.2 --bootstrap rke2:v0.2.2

sets the ProviderID on the harvestermachines.
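To confirm which provider versions actually ended up on the management cluster after the fresh init, clusterctl's inventory objects can be listed (a sketch; run against the kind management cluster's kubeconfig):

```shell
# clusterctl records each installed provider in a Provider inventory CR
# (providers.clusterctl.cluster.x-k8s.io); NAME and VERSION columns show
# e.g. infrastructure-harvester v0.1.2 and bootstrap/control-plane rke2 v0.2.2.
kubectl get providers -A
```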

ekarlso commented 2 months ago

Is your cloud provider pod coming up or healthy at all? The ProviderID needs to be set on the Node(s), I think, for CAPHV to pick it up.
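A quick way to check, sketched under the assumption that the harvester-cloud-provider runs in `kube-system` on the workload cluster (namespace and name pattern may differ in your deployment):

```shell
# Find the cloud provider pod (grep rather than a label selector, since the
# exact labels depend on how the chart was installed):
kubectl --kubeconfig workload.kubeconfig get pods -A | grep cloud-provider

# Then tail its logs to see whether it is failing before setting providerID,
# substituting the pod name found above:
kubectl --kubeconfig workload.kubeconfig -n kube-system logs <pod-name> --tail=50
```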

PatrickLaabs commented 2 months ago

Hey, I guess it was up and running and also healthy. But I'll double-check later.

PatrickLaabs commented 2 months ago

I can definitely tell that the harvester-cloud-provider is not the cause of the issue, because everything worked as expected as soon as I downgraded from v0.3.0 to v0.2.7 (RKE2 control plane and bootstrap).

PatrickLaabs commented 2 months ago

https://github.com/rancher-sandbox/cluster-api-provider-rke2/issues/345#issuecomment-2160281114

Haven't tried this one out yet. Does someone have a Harvester cluster at hand for testing? I am currently doing some heavy testing with mine and don't want to redo all the deployment work 😄

belgaied2 commented 2 months ago

@PatrickLaabs I can provide you with some Terraform stuff to get a running Harvester cluster on Equinix!