syself / hetzner-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud & Hetzner Robot. Enables the usage of Hetzner Dedicated Servers and Hetzner Cloud Servers
Apache License 2.0

Missing node.cluster.x-k8s.io/uninitialized taint in CCM Deployment #12

Closed prometherion closed 1 year ago

prometherion commented 1 year ago

I created a Kubernetes cluster with the Syself Cluster API infrastructure provider, with the Kubernetes version v1.25.2.

$: kubectl get providers -A
NAMESPACE                           NAME                     AGE    TYPE                     PROVIDER      VERSION
caph-system                         infrastructure-hetzner   3d2h   InfrastructureProvider   hetzner       v1.0.0-beta.17
capi-kubeadm-bootstrap-system       bootstrap-kubeadm        3d2h   BootstrapProvider        kubeadm       v1.4.4
capi-kubeadm-control-plane-system   control-plane-kubeadm    3d2h   ControlPlaneProvider     kubeadm       v1.4.4
capi-system                         cluster-api              3d2h   CoreProvider             cluster-api   v1.4.4
kamaji-system                       control-plane-kamaji     154m   ControlPlaneProvider     kamaji        v0.2.0

The resulting nodes of the seed cluster are unschedulable, as expected:

NAME                  STATUS     ROLES    AGE   VERSION
workload-md-0-6ql8k   NotReady   <none>   12m   v1.25.2
workload-md-0-dnx9c   NotReady   <none>   12m   v1.25.2

Each node has the following taints:

  taints:
  - effect: NoSchedule
    key: node.cluster.x-k8s.io/uninitialized
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
  - effect: NoSchedule
    key: node.kubernetes.io/unreachable
    timeAdded: "2023-07-12T15:39:12Z"

Once the CCM is installed using Helm

helm upgrade --install ccm syself/ccm-hetzner --version 1.1.4 \
--namespace kube-system \
--set privateNetwork.enabled=false

the resulting Deployment cannot be scheduled on the worker nodes, because its tolerations do not cover every taint on those nodes.

      tolerations:
      - effect: NoSchedule
        key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoSchedule
        key: node.kubernetes.io/not-ready

Once a toleration for the node.cluster.x-k8s.io/uninitialized taint is added, the CCM is successfully deployed.
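To make the mismatch concrete, here is a minimal Python sketch of the toleration-matching rule Kubernetes applies, run against the taints and tolerations quoted above. The function names and the `EXTRA_TOLERATION` entry are illustrative assumptions of mine; they are not taken from the Helm chart or from any PR.

```python
# Sketch: which node taints are left untolerated by a pod's tolerations?
# Data is copied from the issue; helper names are hypothetical.

TAINTS = [
    {"key": "node.cluster.x-k8s.io/uninitialized", "effect": "NoSchedule"},
    {"key": "node.cloudprovider.kubernetes.io/uninitialized",
     "value": "true", "effect": "NoSchedule"},
    {"key": "node.kubernetes.io/unreachable", "effect": "NoSchedule"},
]

TOLERATIONS = [
    {"key": "node.cloudprovider.kubernetes.io/uninitialized",
     "value": "true", "effect": "NoSchedule"},
    {"key": "CriticalAddonsOnly", "operator": "Exists"},
    {"key": "node-role.kubernetes.io/master",
     "operator": "Exists", "effect": "NoSchedule"},
    {"key": "node-role.kubernetes.io/control-plane",
     "operator": "Exists", "effect": "NoSchedule"},
    {"key": "node.kubernetes.io/not-ready", "effect": "NoSchedule"},
]

# Assumption: one possible toleration for the Cluster API taint.
EXTRA_TOLERATION = {
    "key": "node.cluster.x-k8s.io/uninitialized",
    "operator": "Exists",
    "effect": "NoSchedule",
}


def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect or key on the toleration acts as a wildcard.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    if toleration.get("key") and toleration["key"] != taint.get("key"):
        return False
    operator = toleration.get("operator", "Equal")  # "Equal" is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


def untolerated(taints: list, tolerations: list) -> list:
    """Return the keys of all taints that no toleration matches."""
    return [
        taint["key"]
        for taint in taints
        if not any(tolerates(t, taint) for t in tolerations)
    ]


if __name__ == "__main__":
    print(untolerated(TAINTS, TOLERATIONS))
    print(untolerated(TAINTS, TOLERATIONS + [EXTRA_TOLERATION]))
```

With the data as quoted, the first call reports node.cluster.x-k8s.io/uninitialized (along with node.kubernetes.io/unreachable, for which the list only has a not-ready neighbour). Adding the hypothetical `EXTRA_TOLERATION` removes the Cluster API taint from the result.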

@guettli

guettli commented 1 year ago

@prometherion do you have any idea why this happens in your environment but not in ours?

In other words: why does the taint node.cluster.x-k8s.io/uninitialized exist on your node?

guettli commented 1 year ago

I guess you are using a newer CAPI version.

There was a breaking change: https://github.com/kubernetes-sigs/cluster-api/pull/7993

Docs: https://cluster-api.sigs.k8s.io/developer/providers/bootstrap.html#taint-nodes-at-creation

guettli commented 1 year ago

Strange, I got this email, but I don't see a comment on GitHub.

What about this PR: https://github.com/syself/hetzner-cloud-controller-manager/pull/13/files

Is this needed, from your point of view?

Regards, Thomas

On Wednesday, July 12, 2023, 17:59 CEST, Dario Tranchitella @.***> wrote:

I think this issue can be ignored: when the API Server and the Controller Manager run with the CLI argument --cloud-provider=external, the issue does not occur. If you agree, we can close this bug report, since it's not a bug.

prometherion commented 1 year ago

Ignore that, my bad: I was misled because I had added the toleration manually.

The mentioned PR is still valid.

guettli commented 1 year ago

@prometherion this CCM is mostly a fork of the hcloud ccm. We try to follow the upstream hcloud ccm where possible. If you get the PR into the hcloud ccm, it will end up in our code base as well, since we merge upstream changes from time to time.