kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

ig nodeLabels not passed to kubernetes nodes in Hetzner #16159

Open lukasredev opened 7 months ago

lukasredev commented 7 months ago

/kind bug

1. What kops version are you running? The command kops version, will display this information. v1.28.1

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. v1.27.8

3. What cloud provider are you using? Hetzner

4. What commands did you run? What is the simplest way to reproduce this issue? Create the cluster

kops create cluster --name=my-cluster.lukasre.k8s.local \
  --ssh-public-key=path-to-pub --cloud=hetzner --zones=fsn1 \
  --image=ubuntu-20.04 --networking=calico --network-cidr=10.10.0.0/16

Add a new instance group with different node labels

kops create ig nodes-immich-fsn1 --subnet fsn1

Edit the instance group with the following config:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-12-06T20:46:51Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: my-cluster.lukasre.k8s.local
  name: nodes-immich-fsn1
spec:
  image: ubuntu-22.04
  kubelet:
    anonymousAuth: false
    nodeLabels:
      lukasre.ch/instancetype: immich
      node-role.kubernetes.io/node: ""
  machineType: cx21
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-immich-fsn1
    lukasre.ch/instancetype: immich
  role: Node
  subnets:
  - fsn1

Update the cluster (including forcing a rolling update) with

kops update cluster --yes
kops rolling-update cluster --yes --force

5. What happened after the commands executed? Commands are successful, but node labels are not added.

The YAML representation of the newly created node is the following (metadata only):

apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 10.10.0.7
    csi.volume.kubernetes.io/nodeid: '{"csi.hetzner.cloud":"40260390"}'
    node.alpha.kubernetes.io/ttl: "0"
    projectcalico.org/IPv4Address: 10.10.0.7/32
    projectcalico.org/IPv4IPIPTunnelAddr: x.x.x.x
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2023-12-07T08:52:42Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/instance-type: cx21
    beta.kubernetes.io/os: linux
    csi.hetzner.cloud/location: fsn1
    failure-domain.beta.kubernetes.io/region: fsn1
    failure-domain.beta.kubernetes.io/zone: fsn1-dc14
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: nodes-immich-fsn1-67573cda4d994baa
    kubernetes.io/os: linux
    node-role.kubernetes.io/node: ""
    node.kubernetes.io/instance-type: cx21
    topology.kubernetes.io/region: fsn1
    topology.kubernetes.io/zone: fsn1-dc14
  name: nodes-immich-fsn1-67573cda4d994baa
  resourceVersion: "21804205"
  uid: f730e308-5bb9-49f3-b530-91f7a74b698c

6. What did you expect to happen? I expected the node labels specified in the instance group,

kops.k8s.io/instancegroup: nodes-immich-fsn1
lukasre.ch/instancetype: immich

to be added to the nodes, but they were not.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-09-19T18:44:55Z"
  generation: 2
  name: my-cluster.lukasre.k8s.local
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: hetzner
  configBase: <configBase>
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-fsn1
      name: etcd-1
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-fsn1
      name: etcd-1
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.8
  networkCIDR: 10.10.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - name: fsn1
    type: Public
    zone: fsn1
  topology:
    dns:
      type: None

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-19T18:44:55Z"
  labels:
    kops.k8s.io/cluster: my-cluster.lukasre.k8s.local
  name: control-plane-fsn1
spec:
  image: ubuntu-20.04
  machineType: cx21
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - fsn1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-09-19T18:44:55Z"
  generation: 2
  labels:
    kops.k8s.io/cluster: my-cluster.lukasre.k8s.local
  name: nodes-fsn1
spec:
  image: ubuntu-20.04
  machineType: cx21
  maxSize: 2
  minSize: 2
  role: Node
  subnets:
  - fsn1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2023-12-06T20:46:51Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: my-cluster.lukasre.k8s.local
  name: nodes-immich-fsn1
spec:
  image: ubuntu-22.04
  kubelet:
    anonymousAuth: false
    nodeLabels:
      lukasre.ch/instancetype: immich
      node-role.kubernetes.io/node: ""
  machineType: cx21
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-immich-fsn1
    lukasre.ch/instancetype: immich
  role: Node
  subnets:
  - fsn1

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know? I looked at some existing issues and found #15090, which seems to be a similar issue: if you compare how labels are generated for OpenStack here and for Hetzner here, it appears that the labels are not passed to the nodeIdentity.Info object.

lukasredev commented 6 months ago

I would be happy to help with a fix, but would require some guidance :)

lukasredev commented 5 months ago

Anyone? :)

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 weeks ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kops/issues/16159#issuecomment-2197098539):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
MTRNord commented 1 week ago

This still seems to be relevant :(

rifelpet commented 1 week ago

/reopen

Can you post logs from the kops-controller pods in kube-system? That is the component responsible for applying labels from instance groups to nodes.

For anyone looking into this bug, [this](https://github.com/kubernetes/kops/blob/master/cmd/kops-controller/controllers/node_controller.go) is the controller that handles label updates, initialized [here](https://github.com/kubernetes/kops/blob/b7f5ffd1de38c51776115a2bbd6babda8b67815e/cmd/kops-controller/main.go#L318).

k8s-ci-robot commented 1 week ago

@rifelpet: Reopened this issue.

In response to [this](https://github.com/kubernetes/kops/issues/16159#issuecomment-2212589976):

> /reopen
>
> Can you post logs from the kops-controller pods in kube-system? That is the component responsible for applying labels from instance groups to nodes.
>
> For anyone looking into this bug, [this](https://github.com/kubernetes/kops/blob/master/cmd/kops-controller/controllers/node_controller.go) is the controller that handles label updates, initialized [here](https://github.com/kubernetes/kops/blob/b7f5ffd1de38c51776115a2bbd6babda8b67815e/cmd/kops-controller/main.go#L318).

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
MTRNord commented 1 week ago

> /reopen
>
> Can you post logs from the kops-controller pods in kube-system? That is the component responsible for applying labels from instance groups to nodes.
>
> For anyone looking into this bug, this is the controller that handles label updates, initialized here.

The only logs that cover the creation of the node in question are these:

❯ kubectl logs -n kube-system kops-controller-w9jkr
I0707 19:26:35.841870       1 main.go:241] "msg"="starting manager" "logger"="setup"
I0707 19:26:35.842276       1 server.go:185] "msg"="Starting metrics server" "logger"="controller-runtime.metrics"
I0707 19:26:35.844427       1 server.go:139] kops-controller listening on :3988
I0707 19:26:35.844717       1 server.go:224] "msg"="Serving metrics server" "bindAddress"=":0" "logger"="controller-runtime.metrics" "secure"=false
I0707 19:26:35.844977       1 leaderelection.go:250] attempting to acquire leader lease kube-system/kops-controller-leader...
I0707 19:34:47.495953       1 server.go:220] performed successful callback challenge with 10.10.0.11:3987; identified as minecraft-58d2077a7bd90d8
I0707 19:34:47.495985       1 node_config.go:29] getting node config for &{APIVersion:bootstrap.kops.k8s.io/v1alpha1 Certs:map[] KeypairIDs:map[] IncludeNodeConfig:true Challenge:0x40007680f0}
I0707 19:34:47.497375       1 s3context.go:94] Found S3_ENDPOINT="https://s3.nl-ams.scw.cloud", using as non-AWS S3 backend
I0707 19:34:47.651300       1 server.go:259] bootstrap 10.10.0.2:28728 minecraft-58d2077a7bd90d8 success
I0707 19:41:16.781699       1 server.go:220] performed successful callback challenge with 10.10.0.7:3987; identified as nodes-hel1-7e8746841cf8f905
I0707 19:41:16.792090       1 server.go:259] bootstrap 10.10.0.2:64068 nodes-hel1-7e8746841cf8f905 success
I0707 19:49:05.298120       1 server.go:220] performed successful callback challenge with 10.10.0.5:3987; identified as nodes-hel1-d91dc6bfd5aab64
I0707 19:49:05.298167       1 node_config.go:29] getting node config for &{APIVersion:bootstrap.kops.k8s.io/v1alpha1 Certs:map[] KeypairIDs:map[] IncludeNodeConfig:true Challenge:0x400088a5f0}
I0707 19:49:05.440257       1 server.go:259] bootstrap 10.10.0.2:2440 nodes-hel1-d91dc6bfd5aab64 success
I0707 20:05:50.277408       1 server.go:220] performed successful callback challenge with 10.10.0.11:3987; identified as minecraft-6c3b8cb63d629438
I0707 20:05:50.288817       1 server.go:259] bootstrap 10.10.0.2:20488 minecraft-6c3b8cb63d629438 success

with this instancegroup config:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-07-07T18:44:57Z"
  generation: 5
  labels:
    kops.k8s.io/cluster: midnightthoughts.k8s.local
  name: minecraft
spec:
  image: ubuntu-22.04
  kubelet:
    anonymousAuth: false
    nodeLabels:
      node-role.kubernetes.io/node: minecraft
  machineType: cax31
  manager: CloudGroup
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: minecraft
    role: minecraft
  role: Node
  subnets:
  - hel1
  taints:
  - app=minecraft:NoSchedule

using hetzner for the VMs and scaleway for the S3