syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner, fully declarative, Kubernetes-native and with self-healing capabilities
https://caph.syself.com
Apache License 2.0

MachineDeployment stuck in Provisioned Phase (WaitingForAvailableMachines) #857

Closed localleon closed 6 months ago

localleon commented 1 year ago

/kind bug

What happened:

I'm trying to provision a workload cluster with the Hetzner infrastructure provider from Syself and the RKE2 control plane provider. The same issue happens with the Kubeadm provider.

After applying my YAML manifests, the machines get provisioned successfully and the cluster gets installed correctly. However, `clusterctl describe` shows the workers stuck with the status "WaitingForAvailableMachines":

$ clusterctl describe cluster hetzner-capi-rke2-demo
NAME                                                                                    READY  SEVERITY  REASON                       SINCE  MESSAGE
Cluster/hetzner-capi-rke2-demo                                                          True                                          45m
├─ClusterInfrastructure - HetznerCluster/hetzner-capi-rke2-demo
├─ControlPlane - RKE2ControlPlane/hetzner-capi-rke2-demo-control-plane                  True                                          45m
│ └─Machine/hetzner-capi-rke2-demo-control-plane-7k6pj                                  True                                          45m
│   └─MachineInfrastructure - HCloudMachine/hetzner-capi-rke2-demo-control-plane-tk2mw
└─Workers
  └─MachineDeployment/hetzner-capi-rke2-demo-agent                                      False  Warning   WaitingForAvailableMachines  44m    Minimum availability requires 2 replicas, current 0 available
    └─2 Machines...                                                                    True                                          44m    See hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-5hmjs, hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-7gcjk

Machines and MachineSets look good, and the control plane is reachable:
$ k get machine
NAME                                                  CLUSTER                  NODENAME   PROVIDERID          PHASE         AGE   VERSION
hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-5hmjs   hetzner-capi-rke2-demo              hcloud://35102086   Provisioned   64m   v1.24.6
hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf-7gcjk   hetzner-capi-rke2-demo              hcloud://35102084   Provisioned   64m   v1.24.6
hetzner-capi-rke2-demo-control-plane-7k6pj            hetzner-capi-rke2-demo              hcloud://35102060   Provisioned   65m   v1.24.6

$ k get machineset
NAME                                            CLUSTER                  REPLICAS   READY   AVAILABLE   AGE   VERSION
hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf   hetzner-capi-rke2-demo   2                              64m   v1.24.6

$ k get rke2controlplanes.controlplane.cluster.x-k8s.io
NAME                                   AGE
hetzner-capi-rke2-demo-control-plane   65m

Accessing the kubeconfig via clusterctl works, and I can successfully access the cluster with kubectl:

$ k get nodes
NAME                                         STATUS   ROLES                       AGE   VERSION
hetzner-capi-rke2-demo-control-plane-tk2mw   Ready    control-plane,etcd,master   63m   v1.24.6+rke2r1
hetzner-capi-rke2-demo-md-0-gf2zj            Ready    <none>                      62m   v1.24.6+rke2r1
hetzner-capi-rke2-demo-md-0-m69mx            Ready    <none>                      62m   v1.24.6+rke2r1

Nodes are shown as Ready in the RKE2 cluster, and there are no obvious errors in the logs. Something seems to be wrong with the RKE2 provider picking up the correct machines in the MachineSet:

$ k get machinesets.cluster.x-k8s.io -o yaml
apiVersion: v1
items:
- apiVersion: cluster.x-k8s.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machinedeployment.clusters.x-k8s.io/desired-replicas: "2"
      machinedeployment.clusters.x-k8s.io/max-replicas: "3"
      machinedeployment.clusters.x-k8s.io/revision: "1"
    creationTimestamp: "2023-07-21T10:19:26Z"
    generation: 2
    labels:
      cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
      cluster.x-k8s.io/deployment-name: hetzner-capi-rke2-demo-agent
      machine-template-hash: 1260995260-rqn89
      nodepool: hetzner-capi-rke2-demo-agent
    name: hetzner-capi-rke2-demo-agent-56b4ff96b4xjhddf
    namespace: default
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: MachineDeployment
      name: hetzner-capi-rke2-demo-agent
      uid: 4585a5d7-8f86-458f-a825-5d8e8e6cc6bb
    resourceVersion: "43465"
    uid: 2f207220-49f5-4b70-adf4-ec72d33e2c66
  spec:
    clusterName: hetzner-capi-rke2-demo
    deletePolicy: Random
    replicas: 2
    selector:
      matchLabels:
        cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
        cluster.x-k8s.io/deployment-name: hetzner-capi-rke2-demo-agent
        machine-template-hash: 1260995260-rqn89
        nodepool: hetzner-capi-rke2-demo-agent
    template:
      metadata:
        labels:
          cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
          cluster.x-k8s.io/deployment-name: hetzner-capi-rke2-demo-agent
          machine-template-hash: 1260995260-rqn89
          nodepool: hetzner-capi-rke2-demo-agent
      spec:
        bootstrap:
          configRef:
            apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
            kind: RKE2ConfigTemplate
            name: hetzner-capi-rke2-demo-agent
            namespace: default
        clusterName: hetzner-capi-rke2-demo
        failureDomain: nbg1
        infrastructureRef:
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
          kind: HCloudMachineTemplate
          name: hetzner-capi-rke2-demo-md-0
        version: v1.24.6
  status:
    conditions:
    - lastTransitionTime: "2023-07-21T10:19:26Z"
      message: Scaling up MachineSet to 2 replicas (actual 0)
      reason: ScalingUp
      severity: Warning
      status: "False"
      type: Ready
    - lastTransitionTime: "2023-07-21T10:19:26Z"
      status: "True"
      type: MachinesCreated
    - lastTransitionTime: "2023-07-21T10:19:49Z"
      status: "True"
      type: MachinesReady
    - lastTransitionTime: "2023-07-21T10:19:26Z"
      message: Scaling up MachineSet to 2 replicas (actual 0)
      reason: ScalingUp
      severity: Warning
      status: "False"
      type: Resized
    fullyLabeledReplicas: 2
    observedGeneration: 2
    replicas: 2
    selector: cluster.x-k8s.io/cluster-name=hetzner-capi-rke2-demo,cluster.x-k8s.io/deployment-name=hetzner-capi-rke2-demo-agent,machine-template-hash=1260995260-rqn89,nodepool=hetzner-capi-rke2-demo-agent
kind: List
metadata:
  resourceVersion: ""

What did you expect to happen:

After the RKE2 cluster has successfully installed itself and is reachable, the status of the Kubernetes resources should change to "Ready" in clusterctl.

How to reproduce it:

Follow the quick-start guide of this repo and generate a Hetzner Cloud Kubeadm cluster, or apply this manifest: https://github.com/localleon/hetzner-clusterapi-rke2/blob/main/hetzner-capi-rke2-demo.yaml

Install the Hetzner CAPI infrastructure provider and create a cloud account. Follow the setup guide at https://github.com/syself/cluster-api-provider-hetzner/blob/main/docs/topics/preparation.md

Apply the following manifest to your cluster: https://github.com/localleon/hetzner-clusterapi-rke2/blob/main/hetzner-capi-rke2-demo.yaml
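For reference, the setup steps above boil down to roughly the following (a sketch, assuming a management cluster already exists and `HCLOUD_TOKEN` holds your Hetzner Cloud API token as described in the preparation guide):

```shell
# Install the Hetzner infrastructure provider into the management cluster
clusterctl init --infrastructure hetzner

# Create the token secret the provider expects (name/key per the preparation guide)
kubectl create secret generic hetzner --from-literal=hcloud="$HCLOUD_TOKEN"

# Apply the demo manifest linked above
kubectl apply -f hetzner-capi-rke2-demo.yaml
```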

Environment:

batistein commented 1 year ago

Just to be sure, do you have the cloud controller manager installed?

localleon commented 1 year ago

I don't think I installed it on this cluster. Is it required in the installation process?

batistein commented 1 year ago

Yes, the nodes cannot be initialized without the CCM. If a node is not initialized, the node state is not Ready, and therefore Cluster API will not start the second control plane.

When a node is initialized by the CCM, the node object gets updated and the taint node.cloudprovider.kubernetes.io/uninitialized is removed.

Here you can see that the kubelet sets this taint when you specify the "external" cloud provider: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cloud-provider/api/well_known_taints.go
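A quick way to check whether the CCM has initialized the nodes is to list each node's taints in the workload cluster (a sketch; any node still carrying `node.cloudprovider.kubernetes.io/uninitialized` has not been initialized yet):

```shell
# Print node names alongside the keys of any taints still set on them
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
```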

localleon commented 1 year ago

Thanks for your input!

This actually resolved the issue i was having with my KubeAdm Installation. However, i'm still facing Issues with my RKE2 Setup. Do you have any idea here by chance? From my understanding of ClusterAPI. The issue still lies with the MachineDeployment and therefor the "infrastructure-provider"? This could be out of scope for this issue, so please let me know!

I rebuilt my setup from Scratch for RKE2 -> Till the "ScalingUp" State of the MachineDeployment and then deployed the Hetzner Cloud-Controller-Manager

kube-system       hcloud-cloud-controller-manager-74cd978f54-hqgv6                     1/1     Running     0          7m26s

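(For anyone reproducing this: the Hetzner CCM can be deployed via its Helm chart, a sketch assuming the `hcloud` token secret from the preparation guide already exists in `kube-system`.)

```shell
# Add the Hetzner Cloud chart repository and install the CCM
helm repo add hcloud https://charts.hetzner.cloud
helm repo update
helm install hccm hcloud/hcloud-cloud-controller-manager -n kube-system
```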
Which leaves me with this state:

$ clusterctl describe cluster hetzner-capi-rke2-demo
NAME                                                                                    READY  SEVERITY  REASON                       SINCE  MESSAGE
Cluster/hetzner-capi-rke2-demo                                                          True                                          31m
├─ClusterInfrastructure - HetznerCluster/hetzner-capi-rke2-demo
├─ControlPlane - RKE2ControlPlane/hetzner-capi-rke2-demo-control-plane                  True                                          31m
│ └─Machine/hetzner-capi-rke2-demo-control-plane-66l7h                                  True                                          31m
│   └─MachineInfrastructure - HCloudMachine/hetzner-capi-rke2-demo-control-plane-6klj8
└─Workers
  └─MachineDeployment/hetzner-capi-rke2-demo-agent                                      False  Warning   WaitingForAvailableMachines  32m    Minimum availability requires 2 replicas, current 0 available
    └─2 Machines...                                                                    True                                          31m    See hetzner-capi-rke2-demo-agent-56b4ff96b4xbwlcd-cjs9t, hetzner-capi-rke2-demo-agent-56b4ff96b4xbwlcd-txfqm

The control plane doesn't have the taint you mentioned:

$ k get nodes hetzner-capi-rke2-demo-control-plane-6klj8 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    etcd.rke2.cattle.io/node-address: 167.235.72.234
    etcd.rke2.cattle.io/node-name: hetzner-capi-rke2-demo-control-plane-6klj8-32b29d71
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 167.235.72.234/32
    projectcalico.org/IPv4VXLANTunnelAddr: 10.45.78.192
    rke2.io/encryption-config-hash: start-98fd297c8538e92b5f349bb28b8286d4988d9b5332917ba3daf0650471774fa3
    rke2.io/node-args: '["server","--cluster-cidr","10.45.0.0/16","--cni","calico","--disable-cloud-controller","true","--service-cidr","10.46.0.0/16","--tls-san","167.235.104.228","--token","********"]'
    rke2.io/node-config-hash: G2BV74OYYOAAJH6RH3PE5WQBAPB5GU5VLHDWJHGNK2K4UEURBBUQ====
    rke2.io/node-env: '{}'
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2023-07-27T17:19:12Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    egress.rke2.io/cluster: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: hetzner-capi-rke2-demo-control-plane-6klj8
    kubernetes.io/os: linux
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
  name: hetzner-capi-rke2-demo-control-plane-6klj8
  resourceVersion: "6553"
  uid: 6990e2fe-60b9-4a1e-af25-52bd25ed708f
spec:
  podCIDR: 10.45.0.0/24
  podCIDRs:
  - 10.45.0.0/24
status:
  addresses:
  - address: 167.235.72.234
    type: InternalIP
  - address: hetzner-capi-rke2-demo-control-plane-6klj8
    type: Hostname
  allocatable:
    cpu: "4"
    ephemeral-storage: "152921731772"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7945716Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 157197504Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7945716Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2023-07-27T17:20:13Z"
    lastTransitionTime: "2023-07-27T17:20:13Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2023-07-27T17:42:29Z"
    lastTransitionTime: "2023-07-27T17:19:12Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2023-07-27T17:42:29Z"
    lastTransitionTime: "2023-07-27T17:19:12Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2023-07-27T17:42:29Z"
    lastTransitionTime: "2023-07-27T17:19:12Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2023-07-27T17:42:29Z"
    lastTransitionTime: "2023-07-27T17:20:13Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/nginx-ingress-controller@sha256:4ca60e6a08b47ea52befdb3ff4c34aff8bb332ee228f8ed8066198cf1ca7eb77
    - docker.io/rancher/nginx-ingress-controller:nginx-1.2.1-hardened7
    sizeBytes: 228803432
  - names:
    - docker.io/rancher/hardened-kubernetes@sha256:aed133714db6f827570a0eaf377dddb9778d6f12d1cb3b72b2ba3b1690e579da
    - docker.io/rancher/hardened-kubernetes:v1.24.6-rke2r1-build20220921
    sizeBytes: 228143718
  - names:
    - docker.io/rancher/mirrored-calico-cni@sha256:ac7fc592df1db6a752b4594fa6f0fb2b67ce4977879c1577d37dbbe52c96240d
    - docker.io/rancher/mirrored-calico-cni:v3.24.1
    sizeBytes: 87382151
  - names:
    - docker.io/rancher/klipper-helm@sha256:6a8e819402e3fdd5ff9ec576174b6c0013870b9c0627a05fa0ab17374b5cf189
    - docker.io/rancher/klipper-helm:v0.7.3-build20220613
    sizeBytes: 82983731
  - names:
    - docker.io/rancher/mirrored-calico-node@sha256:d30a70114c8df718b957db1ffd07fbc53ba44e860ce2be19fcac95accd8026a8
    - docker.io/rancher/mirrored-calico-node:v3.24.1
    sizeBytes: 80180549
  - names:
    - docker.io/rancher/hardened-etcd@sha256:146775f007a4b322485a7f1705424c2c799a4fccc1451484edecbdba0d07847f
    - docker.io/rancher/hardened-etcd:v3.5.4-k3s1-build20220504
    sizeBytes: 49072950
  - names:
    - docker.io/rancher/mirrored-calico-pod2daemon-flexvol@sha256:340032e77211a80b9219a811fa7e325928b22ab1c00dead4a99d57549fab2f2f
    - docker.io/rancher/mirrored-calico-pod2daemon-flexvol:v3.24.1
    sizeBytes: 7059436
  - names:
    - docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
    - docker.io/rancher/pause:3.6
    sizeBytes: 299396
  nodeInfo:
    architecture: amd64
    bootID: 8330b561-c57c-4377-8a73-27c222eecd7d
    containerRuntimeVersion: containerd://1.6.8-k3s1
    kernelVersion: 5.15.0-73-generic
    kubeProxyVersion: v1.24.6+rke2r1
    kubeletVersion: v1.24.6+rke2r1
    machineID: f0b9b008e4c840b0aae50d62c23ed6d8
    operatingSystem: linux
    osImage: Ubuntu 22.04.2 LTS
    systemUUID: f0b9b008-e4c8-40b0-aae5-0d62c23ed6d8

One of the unready nodes looks like this:

$ k get nodes hetzner-capi-rke2-demo-md-0-chxv2 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 5.75.133.151/32
    projectcalico.org/IPv4VXLANTunnelAddr: 10.45.19.192
    rke2.io/node-args: '["agent","--server","https://167.235.72.234:9345","--token","********"]'
    rke2.io/node-config-hash: XEGEY345ZC3AFQU5K6V73Z6OPLAWOVYQICXU36KT3PQDZW7LLODQ====
    rke2.io/node-env: '{}'
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2023-07-27T17:19:29Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    egress.rke2.io/cluster: "true"
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: hetzner-capi-rke2-demo-md-0-chxv2
    kubernetes.io/os: linux
  name: hetzner-capi-rke2-demo-md-0-chxv2
  resourceVersion: "7614"
  uid: 5d78e616-c5ea-4d99-9289-9baa1aa766c1
spec:
  podCIDR: 10.45.1.0/24
  podCIDRs:
  - 10.45.1.0/24
status:
  addresses:
  - address: 5.75.133.151
    type: InternalIP
  - address: hetzner-capi-rke2-demo-md-0-chxv2
    type: Hostname
  allocatable:
    cpu: "4"
    ephemeral-storage: "152921731772"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7945716Ki
    pods: "110"
  capacity:
    cpu: "4"
    ephemeral-storage: 157197504Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7945716Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2023-07-27T17:20:14Z"
    lastTransitionTime: "2023-07-27T17:20:14Z"
    message: Calico is running on this node
    reason: CalicoIsUp
    status: "False"
    type: NetworkUnavailable
  - lastHeartbeatTime: "2023-07-27T17:47:24Z"
    lastTransitionTime: "2023-07-27T17:19:28Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2023-07-27T17:47:24Z"
    lastTransitionTime: "2023-07-27T17:19:28Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2023-07-27T17:47:24Z"
    lastTransitionTime: "2023-07-27T17:19:28Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2023-07-27T17:47:24Z"
    lastTransitionTime: "2023-07-27T17:20:09Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/nginx-ingress-controller@sha256:4ca60e6a08b47ea52befdb3ff4c34aff8bb332ee228f8ed8066198cf1ca7eb77
    - docker.io/rancher/nginx-ingress-controller:nginx-1.2.1-hardened7
    sizeBytes: 228803432
  - names:
    - docker.io/rancher/hardened-kubernetes@sha256:aed133714db6f827570a0eaf377dddb9778d6f12d1cb3b72b2ba3b1690e579da
    - docker.io/rancher/hardened-kubernetes:v1.24.6-rke2r1-build20220921
    sizeBytes: 228143718
  - names:
    - docker.io/rancher/mirrored-calico-cni@sha256:ac7fc592df1db6a752b4594fa6f0fb2b67ce4977879c1577d37dbbe52c96240d
    - docker.io/rancher/mirrored-calico-cni:v3.24.1
    sizeBytes: 87382151
  - names:
    - docker.io/rancher/klipper-helm@sha256:6a8e819402e3fdd5ff9ec576174b6c0013870b9c0627a05fa0ab17374b5cf189
    - docker.io/rancher/klipper-helm:v0.7.3-build20220613
    sizeBytes: 82983731
  - names:
    - docker.io/rancher/mirrored-calico-node@sha256:d30a70114c8df718b957db1ffd07fbc53ba44e860ce2be19fcac95accd8026a8
    - docker.io/rancher/mirrored-calico-node:v3.24.1
    sizeBytes: 80180549
  - names:
    - docker.io/rancher/hardened-coredns@sha256:0ff19d19385acdfad644792531ac4c108c73cf021c755b08b7c6dfbef053a043
    - docker.io/rancher/hardened-coredns:v1.9.3-build20220613
    sizeBytes: 48334303
  - names:
    - docker.io/rancher/hardened-cluster-autoscaler@sha256:7cc3ec1030240a8b69d1185611c1f89cf357cddae642e8cc082e1a49ebc3611d
    - docker.io/rancher/hardened-cluster-autoscaler:v1.8.5-build20211119
    sizeBytes: 43568033
  - names:
    - docker.io/rancher/mirrored-calico-kube-controllers@sha256:045647bf84a4e9d4849f8a1d11152b9e16db4127441e8777a77a9b19d6e88759
    - docker.io/rancher/mirrored-calico-kube-controllers:v3.24.1
    sizeBytes: 31125617
  - names:
    - docker.io/rancher/mirrored-calico-typha@sha256:046208344794a6b653bb575a78daab1723f8b86e069e39f42643b17f4b36b72e
    - docker.io/rancher/mirrored-calico-typha:v3.24.1
    sizeBytes: 28357374
  - names:
    - docker.io/rancher/mirrored-calico-operator@sha256:8ab75a778d2add33f6dc93c9d7c229df4d9c66c42444976b68c78474948a1365
    - docker.io/rancher/mirrored-calico-operator:v1.28.1
    sizeBytes: 18841480
  - names:
    - docker.io/rancher/mirrored-calico-pod2daemon-flexvol@sha256:340032e77211a80b9219a811fa7e325928b22ab1c00dead4a99d57549fab2f2f
    - docker.io/rancher/mirrored-calico-pod2daemon-flexvol:v3.24.1
    sizeBytes: 7059436
  - names:
    - docker.io/rancher/pause@sha256:036d575e82945c112ef84e4585caff3648322a2f9ed4c3a6ce409dd10abc4f34
    - docker.io/rancher/pause:3.6
    sizeBytes: 299396
  nodeInfo:
    architecture: amd64
    bootID: f2ccf96d-b04d-4de7-9571-48c020a500f2
    containerRuntimeVersion: containerd://1.6.8-k3s1
    kernelVersion: 5.15.0-73-generic
    kubeProxyVersion: v1.24.6+rke2r1
    kubeletVersion: v1.24.6+rke2r1
    machineID: 7d0a5fd4b1b24e63967adbce1f1426b2
    operatingSystem: linux
    osImage: Ubuntu 22.04.2 LTS
    systemUUID: 7d0a5fd4-b1b2-4e63-967a-dbce1f1426b2

So this doesn't seem to solve the issue, at least not for the RKE2 provider. The CAPH controller logs look like this:

{"level":"INFO","time":"2023-07-27T17:16:10.126Z","file":"controller/controller.go:194","message":"Starting Controller","controller":"hetznerbaremetalmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerBareMetalMachine"}
{"level":"INFO","time":"2023-07-27T17:16:10.226Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hcloudmachinetemplate","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachineTemplate","worker count":1}
{"level":"INFO","time":"2023-07-27T17:16:10.227Z","file":"controllers/hcloudmachinetemplate_controller.go:66","message":"HCloudMachineTemplate is missing cluster label or cluster does not exist default/hetzner-capi-rke2-demo-template-control-plane","controller":"hcloudmachinetemplate","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachineTemplate","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-template-control-plane","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-template-control-plane","reconcileID":"340273e0-3b56-43c1-b073-71d9dbf64c38","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-template-control-plane","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:16:10.244Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hetznercluster","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerCluster","worker count":1}
{"level":"INFO","time":"2023-07-27T17:16:10.244Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hetznerbaremetalhost","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerBareMetalHost","worker count":1}
{"level":"INFO","time":"2023-07-27T17:16:10.244Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hcloudremediation","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudRemediation","worker count":1}
{"level":"INFO","time":"2023-07-27T17:16:10.244Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hetznerbaremetalremediation","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerBareMetalRemediation","worker count":1}
{"level":"INFO","time":"2023-07-27T17:16:10.244Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","worker count":1}
{"level":"INFO","time":"2023-07-27T17:16:10.244Z","file":"controller/controller.go:228","message":"Starting workers","controller":"hetznerbaremetalmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerBareMetalMachine","worker count":1}
{"level":"INFO","time":"2023-07-27T17:18:05.289Z","file":"controllers/hetznercluster_controller.go:110","message":"Cluster Controller has not yet set OwnerRef","controller":"hetznercluster","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerCluster","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo","reconcileID":"a67c21b0-0c44-4119-8756-b5a4f93a2e4e","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"Cluster":{"name":""}}
{"level":"INFO","time":"2023-07-27T17:18:05.303Z","file":"controllers/hetznercluster_controller.go:110","message":"Cluster Controller has not yet set OwnerRef","controller":"hetznercluster","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerCluster","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo","reconcileID":"f6ccfa00-4a16-44fb-8b65-b82d98e9dc62","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"Cluster":{"name":""}}
{"level":"INFO","time":"2023-07-27T17:18:05.313Z","file":"controllers/hetznercluster_controller.go:110","message":"Cluster Controller has not yet set OwnerRef","controller":"hetznercluster","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerCluster","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo","reconcileID":"f9a8bd0d-70ab-4123-a96f-737f1a5635da","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"Cluster":{"name":""}}
{"level":"INFO","time":"2023-07-27T17:18:05.319Z","file":"controllers/hetznercluster_controller.go:110","message":"Cluster Controller has not yet set OwnerRef","controller":"hetznercluster","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HetznerCluster","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo","reconcileID":"a1274771-b58b-4e3e-bdfa-2cfb06a6e877","HetznerCluster":{"name":"hetzner-capi-rke2-demo","namespace":"default"},"Cluster":{"name":""}}
{"level":"INFO","time":"2023-07-27T17:18:05.358Z","file":"controllers/hcloudmachinetemplate_controller.go:66","message":"HCloudMachineTemplate is missing cluster label or cluster does not exist default/hetzner-capi-rke2-demo-control-plane","controller":"hcloudmachinetemplate","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachineTemplate","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-control-plane","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-control-plane","reconcileID":"b64c31d6-ced8-4bc0-a5a9-471880ec8d71","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-control-plane","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.366Z","file":"controllers/hcloudmachinetemplate_controller.go:66","message":"HCloudMachineTemplate is missing cluster label or cluster does not exist default/hetzner-capi-rke2-demo-md-0","controller":"hcloudmachinetemplate","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachineTemplate","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-md-0","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0","reconcileID":"eac05dda-29e9-41b8-901d-90746ba35d89","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-md-0","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.384Z","file":"controllers/hcloudmachinetemplate_controller.go:66","message":"HCloudMachineTemplate is missing cluster label or cluster does not exist default/hetzner-capi-rke2-demo-md-0","controller":"hcloudmachinetemplate","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachineTemplate","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-md-0","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0","reconcileID":"d07b12e4-7d96-4b50-8c2f-855677628814","HCloudMachineTemplate":{"name":"hetzner-capi-rke2-demo-md-0","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.439Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-ftlbg","reconcileID":"9ac45254-f24d-4adc-80de-7b9d29bb3914","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.458Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-ftlbg","reconcileID":"fb4f9497-ef5a-4bc8-bec5-6f201d729ac1","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.467Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-ftlbg","reconcileID":"75ab8290-12fa-4ad3-893d-c98662a09acd","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.472Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-ftlbg","reconcileID":"ec5d6090-51e0-4d3c-b524-c6ed6d5752c4","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.475Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-chxv2","reconcileID":"2e3ce63e-be49-40e3-af47-d2dac3ab8441","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.545Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-chxv2","reconcileID":"c309b414-0ef7-4e19-8dd7-09ff35614fc4","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.545Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-chxv2","reconcileID":"92d7f601-7db1-45bc-956a-ed164539b134","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"}}
2023/07/27 17:18:05 http: TLS handshake error from 10.244.0.1:59201: EOF
{"level":"INFO","time":"2023-07-27T17:18:05.549Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-ftlbg","reconcileID":"df9d250a-402e-4268-922a-674785635e47","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-ftlbg","namespace":"default"}}
{"level":"INFO","time":"2023-07-27T17:18:05.549Z","file":"controllers/hcloudmachine_controller.go:83","message":"Machine Controller has not yet set OwnerRef","controller":"hcloudmachine","controllerGroup":"infrastructure.cluster.x-k8s.io","controllerKind":"HCloudMachine","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"},"namespace":"default","name":"hetzner-capi-rke2-demo-md-0-chxv2","reconcileID":"ce6bc462-a8bf-4389-9437-3ca989fd8526","HCloudMachine":{"name":"hetzner-capi-rke2-demo-md-0-chxv2","namespace":"default"}}
2023/07/27 17:18:05 http: TLS handshake error from 10.244.0.1:11812: EOF
2023/07/27 17:18:05 http: TLS handshake error from 10.244.0.1:59281: EOF

I'm also seeing some TLS error messages in the log that I cannot explain.

guettli commented 6 months ago

@localleon we improved the usability by creating more Kubernetes events, and we now run more pre-flight checks to detect broken configurations. The current version is beta32. I'm closing this issue; please open a new one if you still run into problems. I hope that is OK for you.