kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Hetzner instances are not joining the cluster #14308

Closed casperakos closed 2 years ago

casperakos commented 2 years ago

/kind bug

1. What kops version are you running? The command kops version will display this information.

Client version: 1.25.0

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

v1.25.1

3. What cloud provider are you using?

Hetzner

4. What commands did you run? What is the simplest way to reproduce this issue?

kops create cluster --name=my-cluster.example.k8s.local \
  --ssh-public-key=~/.ssh/id_rsa.pub --cloud=hetzner --zones=nbg1 \
  --image=ubuntu-20.04 --networking=cilium --network-cidr=10.10.0.0/16

5. What happened after the commands executed?

The cluster was created, but 2 out of 5 nodes are not joining it. The cloud controller manager reports this error:

error running controllers: failed to parse cidr value:"" with error:invalid CIDR address:

6. What did you expect to happen?

All nodes join the cluster

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

kind: Cluster
metadata:
  creationTimestamp: "2022-09-20T17:05:09Z"
  name: my-cluster.example.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: hetzner
  configBase: s3://xxx/my-cluster.example.k8s.local
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-nbg1
      name: etcd-1
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-nbg1
      name: etcd-1
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.25.1
  masterPublicName: api.my-cluster.example.k8s.local
  networkCIDR: 10.10.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - name: nbg1
    type: Public
    zone: nbg1
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-09-20T17:05:09Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: my-cluster.example.k8s.local
  name: master-nbg1
spec:
  image: ubuntu-20.04
  machineType: cx21
  maxSize: 3
  minSize: 3
  role: Master
  subnets:
  - nbg1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-09-20T17:05:09Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: my-cluster.example.k8s.local
  name: nodes-nbg1
spec:
  image: ubuntu-20.04
  machineType: cx21
  maxSize: 2
  minSize: 2
  role: Node
  subnets:
  - nbg1

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

hakman commented 2 years ago

Thanks for reporting this, @casperakos. The issue was caused by a last-minute update to the Hetzner CCM that was missing some newly mandatory args. You should be able to fix things by manually changing to the older v1.12.1 CCM image, or you can wait for the new kOps version to be released with the #14309 fix.
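
The empty string in the reported error (failed to parse cidr value:"") is consistent with a mandatory CIDR argument never reaching the CCM. As a rough illustration only, the sketch below uses Python's standard ipaddress module as a stand-in for the CCM's own parsing, which is not written in Python. The helper parse_cluster_cidr is hypothetical and is not part of kOps or the Hetzner CCM.

```python
import ipaddress


def parse_cluster_cidr(value):
    """Hypothetical helper: parse a cluster CIDR, rejecting empty or malformed input."""
    try:
        return ipaddress.ip_network(value)
    except ValueError as err:
        # Mirror the shape of the reported message so the empty value is visible.
        raise ValueError(f'failed to parse cidr value:"{value}": {err}') from err


# The nonMasqueradeCIDR from the manifest in this report parses cleanly.
print(parse_cluster_cidr("100.64.0.0/10"))

# An unset argument arrives as an empty string and is rejected.
try:
    parse_cluster_cidr("")
except ValueError as err:
    print(err)
```

The point of the sketch is that the failure comes from the value being absent, not from the CIDR chosen in the cluster spec.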

hakman commented 2 years ago

Workaround:

kubectl -n kube-system patch deployments.apps hcloud-cloud-controller-manager -p \
'{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "hcloud-cloud-controller-manager",
            "command": [
              "/bin/hcloud-cloud-controller-manager",
              "--allocate-node-cidrs=true",
              "--allow-untagged-cloud=true",
              "--cloud-provider=hcloud",
              "--cluster-cidr=100.64.0.0/10",
              "--configure-cloud-routes=false",
              "--leader-elect=false",
              "--v=2",
              "--use-service-account-credentials=true"
            ]
          }
        ]
      }
    }
  }
}'
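
The patch above hard-codes the cluster CIDR, which matches nonMasqueradeCIDR in the manifest from this report. For a cluster with a different range, a small script could generate the same patch body. This is only a sketch: build_ccm_patch is a hypothetical helper, and the flag list is copied verbatim from the workaround rather than taken from CCM documentation.

```python
import ipaddress
import json

# Deployment and container name, as used in the workaround in this thread.
CCM_NAME = "hcloud-cloud-controller-manager"


def build_ccm_patch(cluster_cidr):
    """Hypothetical helper: return the JSON patch body for the given cluster CIDR."""
    # Fail early on an empty or malformed CIDR instead of patching a bad value in.
    ipaddress.ip_network(cluster_cidr)
    # Flags copied from the workaround; only --cluster-cidr varies.
    command = [
        "/bin/" + CCM_NAME,
        "--allocate-node-cidrs=true",
        "--allow-untagged-cloud=true",
        "--cloud-provider=hcloud",
        f"--cluster-cidr={cluster_cidr}",
        "--configure-cloud-routes=false",
        "--leader-elect=false",
        "--v=2",
        "--use-service-account-credentials=true",
    ]
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": CCM_NAME, "command": command}
    ]}}}}
    return json.dumps(patch, indent=2)


if __name__ == "__main__":
    # Assumption: this is the nonMasqueradeCIDR from the manifest in this report.
    print(build_ccm_patch("100.64.0.0/10"))
```

The printed JSON can be passed as the -p argument of the kubectl patch command shown above.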