kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Hetzner's Rocky 8 Image Doesn't Include tar, Causes kops-configuration.service to Fail #16509

Open rehashedsalt opened 7 months ago

rehashedsalt commented 7 months ago

/kind bug

1. What kops version are you running? The command kops version will display this information.

Client version: 1.28.4 (git-v1.28.4)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

kubernetesVersion: 1.28.6

But it doesn't matter because the nodes never actually unpack k8s.

3. What cloud provider are you using?

Hetzner

4. What commands did you run? What is the simplest way to reproduce this issue?

kops create cluster \
    --name=example.k8s.local \
    --ssh-public-key=~/.ssh/id_rsa.pub \
    --cloud=hetzner \
    --zones=ash \
    --image=rocky-8 \
    --networking=calico \
    --network-cidr=10.10.0.0/16 \
    --node-size=cpx11 \
    --control-plane-size=cpx11
kops update cluster example.k8s.local --yes
kops export kubeconfig example.k8s.local --admin
kops validate cluster --wait 10m
# Observe as resources are created and then the cluster never comes up
# Then ssh into the control plane (or a node, I guess) and see issues
ssh root@control-plane
journalctl -u kops-configuration
which tar
# Confusion from here on

5. What happened after the commands executed?

Nodes were spun up, but on the control plane, we get this:

May 04 01:03:21 control-plane-ash-799518db3544ab1d nodeup[1610]: W0504 01:03:21.778261    1610 main.go:133] got error running nodeup (will retry in 30s): error adding asset "f3a841324845ca6bf0d4091b4fc7f97e18a623172158b72fc3fdcdb9d42d2d37@https://storage.googleapis.com/k8s-artifacts-cni/release/v1.2.0/cni-plugins-linux-amd64-v1.2.0.tgz": error expanding asset file "/var/cache/nodeup/sha256:f3a841324845ca6bf0d4091b4fc7f97e18a623172158b72fc3fdcdb9d42d2d37_cni-plugins-linux-amd64-v1_2_0_tgz" exec: "tar": executable file not found in $PATH:

And indeed:

[root@control-plane-ash-799518db3544ab1d ~]# which tar
/usr/bin/which: no tar in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)

6. What did you expect to happen?

The control plane to unpack the file and set itself up correctly.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "[REDACTED]"
  name: [REDACTED]
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: hetzner
  configBase: s3://[REDACTED]
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-ash
      name: h
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-ash
      name: h
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.28.6
  networkCIDR: 10.10.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - name: ash
    type: Public
    zone: ash
  topology:
    dns:
      type: None

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "[REDACTED]"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: control-plane-ash
spec:
  image: rocky-8
  machineType: cpx11
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - ash

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "[REDACTED]"
  labels:
    kops.k8s.io/cluster: [REDACTED]
  name: nodes-ash
spec:
  image: rocky-8
  machineType: cpx11
  maxSize: 3
  minSize: 3
  role: Node
  subnets:
  - ash

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

As this is a bug with cloud-init setup scripts (presumably), I've not included any output of a kops command here. The issue is dependencies not being installed correctly once the machines are given agency to set themselves up.

9. Anything else we need to know?

I am so very, very confused as to why Hetzner's image doesn't include tar.

hakman commented 7 months ago

Thanks for reporting this @rehashedsalt. Could you try using the packages config option to install tar (not sure if the untar part runs first or not)? https://kops.sigs.k8s.io/instance_groups/#packages

rehashedsalt commented 7 months ago

No dice; the packages option didn't help. additionalUserData with a cloud-init spec to install the package should work, though, since cloud-init installs kops-configuration.service as its last job.
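
A minimal sketch of that workaround, assuming the control-plane-ash instance group from the manifest above. The entry name install-tar.txt is purely illustrative, and this fragment has not been verified against Hetzner's rocky-8 image. It relies on the kOps additionalUserData field and cloud-init's packages module:

```yaml
# Fragment to merge into the InstanceGroup spec (repeat for nodes-ash).
spec:
  image: rocky-8
  additionalUserData:
  - name: install-tar.txt        # illustrative name, not required by kOps
    type: text/cloud-config
    content: |
      #cloud-config
      # Ask cloud-init to install tar before nodeup needs it.
      packages:
      - tar
```

Something like kops edit ig control-plane-ash followed by kops update cluster --yes would be the usual way to apply it; existing servers would likely need to be replaced to pick up the new user data.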

hakman commented 7 months ago

Yes, additionalUserData will do it. I can't think of a better workaround for now. I will look into moving the logic to pure Go, instead of calling the tar executable.

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

hakman commented 4 months ago

/remove-lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle stale

k8s-triage-robot commented 16 hours ago

/lifecycle rotten