rancher / cluster-api-provider-rke2

RKE2 bootstrap and control-plane Cluster API providers.
https://rancher.github.io/cluster-api-provider-rke2/
Apache License 2.0

Cluster Deployment stuck in WaitingForAvailableMachines #158

Closed by localleon 10 months ago

localleon commented 1 year ago

What happened:

I'm trying to provision a workload cluster with the Hetzner infrastructure provider from Syself, using RKE2 for bootstrap and control plane.

After applying my YAML manifests, the machines get provisioned successfully and the cluster gets installed correctly. However, "clusterctl describe" shows the workers stuck with the status "WaitingForAvailableMachines":

$ clusterctl describe cluster hetzner-capi-rke2-demo
NAME                                                                                    READY  SEVERITY  REASON                       SINCE  MESSAGE
Cluster/hetzner-capi-rke2-demo                                                          True                                          45m
├─ClusterInfrastructure - HetznerCluster/hetzner-capi-rke2-demo
├─ControlPlane - RKE2ControlPlane/hetzner-capi-rke2-demo-control-plane                  True                                          45m
│ └─Machine/hetzner-capi-rke2-demo-control-plane-7k6pj                                  True                                          45m
│   └─MachineInfrastructure - HCloudMachine/hetzner-capi-rke2-demo-control-plane-tk2mw
└─Workers
  └─MachineDeployment/hetzner-capi-rke2-demo-agent                                      False  Warning   WaitingForAvailableMachines  44m    Minimum availability requires 2 replicas, current 0 available
    └─2 Machines...                                                                     True                                          44m    See hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-5hmjs, hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-7gcjk

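The Machines and MachineSets look good, and the control plane is reachable. Note that the NODENAME column is empty for all Machines and the MachineSet reports no READY or AVAILABLE replicas: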

~/kube/hetzner-clusterapi-rke2 on  main! ⌚ 13:04:25
$ k get machine
NAME                                                  CLUSTER                  NODENAME   PROVIDERID          PHASE         AGE   VERSION
hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-5hmjs   hetzner-capi-rke2-demo              hcloud://35102086   Provisioned   64m   v1.24.6
hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-7gcjk   hetzner-capi-rke2-demo              hcloud://35102084   Provisioned   64m   v1.24.6
hetzner-capi-rke2-demo-control-plane-7k6pj            hetzner-capi-rke2-demo              hcloud://35102060   Provisioned   65m   v1.24.6

~/kube/hetzner-clusterapi-rke2 on  main! ⌚ 13:23:49
$ k get machineset
NAME                                            CLUSTER                  REPLICAS   READY   AVAILABLE   AGE   VERSION
hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf   hetzner-capi-rke2-demo   2                              64m   v1.24.6

~/kube/hetzner-clusterapi-rke2 on  main! ⌚ 13:23:51
$ k get rke2controlplanes.controlplane.cluster.x-k8s.io
NAME                                   AGE
hetzner-capi-rke2-demo-control-plane   65m

Retrieving the kubeconfig via clusterctl works, and I can successfully access the cluster with kubectl:

$ k get nodes
NAME                                         STATUS   ROLES                       AGE   VERSION
hetzner-capi-rke2-demo-control-plane-tk2mw   Ready    control-plane,etcd,master   63m   v1.24.6+rke2r1
hetzner-capi-rke2-demo-md-0-gf2zj            Ready    <none>                      62m   v1.24.6+rke2r1
hetzner-capi-rke2-demo-md-0-m69mx            Ready    <none>                      62m   v1.24.6+rke2r1

The nodes are shown as Ready on the RKE2 cluster and there are no obvious errors in the logs. Something seems to be wrong with how the workload-cluster nodes are matched to the Machines in the MachineSet:
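One thing that could be checked here, given the empty NODENAME column: Cluster API links a Machine to a workload-cluster Node by comparing `spec.providerID` on both objects. The sketch below is a minimal, hedged illustration of that comparison, assuming the JSON comes from `kubectl get machines -o json` on the management cluster and `kubectl get nodes -o json` on the workload cluster. The function name `unmatched_machines` is hypothetical, and the Node providerIDs in the sample are invented for illustration, since the issue does not show them.

```python
from typing import Any


def unmatched_machines(machines: dict[str, Any], nodes: dict[str, Any]) -> list[str]:
    """Return names of Machines whose spec.providerID matches no Node."""
    # Collect every providerID that is actually set on a Node.
    node_ids = set()
    for node in nodes.get("items", []):
        pid = node.get("spec", {}).get("providerID")
        if pid:
            node_ids.add(pid)

    missing = []
    for machine in machines.get("items", []):
        pid = machine.get("spec", {}).get("providerID")
        # A Machine without a providerID, or with one that no Node carries,
        # cannot be matched to a Node this way.
        if not pid or pid not in node_ids:
            missing.append(machine["metadata"]["name"])
    return missing


if __name__ == "__main__":
    # Machine names and providerIDs taken from the `k get machine` output above.
    machines = {"items": [
        {"metadata": {"name": "hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-5hmjs"},
         "spec": {"providerID": "hcloud://35102086"}},
        {"metadata": {"name": "hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-7gcjk"},
         "spec": {"providerID": "hcloud://35102084"}},
    ]}
    # HYPOTHETICAL Node providerIDs: the issue does not show these values.
    nodes = {"items": [
        {"metadata": {"name": "hetzner-capi-rke2-demo-md-0-gf2zj"},
         "spec": {"providerID": "hcloud://35102086"}},
        {"metadata": {"name": "hetzner-capi-rke2-demo-md-0-m69mx"},
         "spec": {"providerID": "example://not-a-real-id"}},
    ]}
    for name in unmatched_machines(machines, nodes):
        print("no Node with matching providerID for Machine:", name)
```

If every worker Machine shows up in the result, the Node providerIDs would be the first place to look; this is only a diagnostic idea and not a confirmed cause of the issue.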

~/kube/hetzner-clusterapi-rke2 on  main! ⌚ 13:29:46
$ k get machinesets.cluster.x-k8s.io -o yaml
apiVersion: v1
items:
- apiVersion: cluster.x-k8s.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machinedeployment.clusters.x-k8s.io/desired-replicas: "2"
      machinedeployment.clusters.x-k8s.io/max-replicas: "3"
      machinedeployment.clusters.x-k8s.io/revision: "1"
    creationTimestamp: "2023-07-21T10:19:26Z"
    generation: 2
    labels:
      cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
      cluster.x-k8s.io/deployment-name: hetzner-capi-rke2-demo-agent
      machine-template-hash: 1260995260-rqn89
      nodepool: hetzner-capi-rke2-demo-agent
    name: hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf
    namespace: default
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: MachineDeployment
      name: hetzner-capi-rke2-demo-agent
      uid: 4585a5d7-8f86-458f-a825-5d8e8e6cc6bb
    resourceVersion: "43465"
    uid: 2f207220-49f5-4b70-adf4-ec72d33e2c66
  spec:
    clusterName: hetzner-capi-rke2-demo
    deletePolicy: Random
    replicas: 2
    selector:
      matchLabels:
        cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
        cluster.x-k8s.io/deployment-name: hetzner-capi-rke2-demo-agent
        machine-template-hash: 1260995260-rqn89
        nodepool: hetzner-capi-rke2-demo-agent
    template:
      metadata:
        labels:
          cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
          cluster.x-k8s.io/deployment-name: hetzner-capi-rke2-demo-agent
          machine-template-hash: 1260995260-rqn89
          nodepool: hetzner-capi-rke2-demo-agent
      spec:
        bootstrap:
          configRef:
            apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
            kind: RKE2ConfigTemplate
            name: hetzner-capi-rke2-demo-agent
            namespace: default
        clusterName: hetzner-capi-rke2-demo
        failureDomain: nbg1
        infrastructureRef:
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
          kind: HCloudMachineTemplate
          name: hetzner-capi-rke2-demo-md-0
        version: v1.24.6
  status:
    conditions:
    - lastTransitionTime: "2023-07-21T10:19:26Z"
      message: Scaling up MachineSet to 2 replicas (actual 0)
      reason: ScalingUp
      severity: Warning
      status: "False"
      type: Ready
    - lastTransitionTime: "2023-07-21T10:19:26Z"
      status: "True"
      type: MachinesCreated
    - lastTransitionTime: "2023-07-21T10:19:49Z"
      status: "True"
      type: MachinesReady
    - lastTransitionTime: "2023-07-21T10:19:26Z"
      message: Scaling up MachineSet to 2 replicas (actual 0)
      reason: ScalingUp
      severity: Warning
      status: "False"
      type: Resized
    fullyLabeledReplicas: 2
    observedGeneration: 2
    replicas: 2
    selector: cluster.x-k8s.io/cluster-name=hetzner-capi-rke2-demo,cluster.x-k8s.io/deployment-name=hetzner-capi-rke2-demo-agent,machine-template-hash=1260995260-rqn89,nodepool=hetzner-capi-rke2-demo-agent
kind: List
metadata:
  resourceVersion: ""
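The status above lists `replicas: 2` and `fullyLabeledReplicas: 2` but contains no `readyReplicas` or `availableReplicas` fields, which lines up with the "current 0 available" message from clusterctl. As a rough sketch, the replica counters and the non-True conditions can be pulled out of a single MachineSet object (one entry of `items` from the `-o yaml` or `-o json` output) like this. The helper names are hypothetical, and treating an absent counter as zero is an assumption of this sketch.

```python
from typing import Any


def replica_summary(machineset: dict[str, Any]) -> dict[str, Any]:
    """Summarize desired vs. observed replica counters of a MachineSet."""
    spec = machineset.get("spec", {})
    status = machineset.get("status", {})
    return {
        "desired": spec.get("replicas"),
        "created": status.get("replicas", 0),
        # Assumption: a counter missing from status is treated as zero.
        "ready": status.get("readyReplicas", 0),
        "available": status.get("availableReplicas", 0),
    }


def failing_conditions(machineset: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (type, reason, message) for every condition that is not "True"."""
    conditions = machineset.get("status", {}).get("conditions", [])
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


if __name__ == "__main__":
    # Values copied from the MachineSet dump above (unrelated fields omitted).
    machineset = {
        "spec": {"replicas": 2},
        "status": {
            "replicas": 2,
            "fullyLabeledReplicas": 2,
            "conditions": [
                {"type": "Ready", "status": "False", "reason": "ScalingUp",
                 "message": "Scaling up MachineSet to 2 replicas (actual 0)"},
                {"type": "MachinesCreated", "status": "True"},
                {"type": "MachinesReady", "status": "True"},
                {"type": "Resized", "status": "False", "reason": "ScalingUp",
                 "message": "Scaling up MachineSet to 2 replicas (actual 0)"},
            ],
        },
    }
    print(replica_summary(machineset))
    for cond in failing_conditions(machineset):
        print(cond)
```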

What did you expect to happen:

After the RKE2 cluster has successfully installed itself and is reachable, the status of the MachineDeployment shown by clusterctl should change to "Ready".

How to reproduce it:

Install the Hetzner-CAPI-Infrastructure Provider and create a Cloud-Account. Follow the setup guide in https://github.com/syself/cluster-api-provider-hetzner/blob/main/docs/topics/preparation.md

Apply the following manifest to your cluster: https://github.com/localleon/hetzner-clusterapi-rke2/blob/main/hetzner-capi-rke2-demo.yaml

Environment:

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity.

localleon commented 10 months ago

Since this issue is not getting any response from the maintainers, I'm going to go ahead and close it. If I try a newer version of the provider, I may create a new issue or reopen this one.