nutanix-cloud-native / cluster-api-runtime-extensions-nutanix

https://nutanix-cloud-native.github.io/cluster-api-runtime-extensions-nutanix/
Apache License 2.0
7 stars 4 forks source link

fix: Handle long cluster names #845

Closed jimmidyson closed 4 weeks ago

jimmidyson commented 1 month ago

This commit starts fixing a bug that means addons fail to be fully deployed if the cluster name is longer than 44 characters. This is caused by the name of the HCP being over 63 characters. This name is then used in HRP labels which have a maximum length of 63 characters, so the HRPs are rejected by the API server when CAAPH applies them.

The fix is to add a UUID annotation to all Cluster resources, enforce this as immutable, and use that UUID in HCP names.

dkoshkin commented 1 month ago

I think this change introduced a bug with the cluster-autoscaler. The labels are now more generic and don't contain the cluster name, so the deployment would now select Pods from multiple clusters.

apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: "2024-08-08T16:13:34Z"
    generateName: cluster-autoscaler-dkoshkin-super-long-cluster-name-that-should-57d568bc87-
    labels:
      app.kubernetes.io/instance: cluster-autoscaler
      app.kubernetes.io/name: clusterapi-cluster-autoscaler
      pod-template-hash: 57d568bc87
    name: cluster-autoscaler-dkoshkin-super-long-cluster-name-that-sh99px
    namespace: default
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: cluster-autoscaler-dkoshkin-super-long-cluster-name-that-should-57d568bc87
      uid: b98473af-39e4-412c-932a-b7d54ee3893f
    resourceVersion: "965"
    uid: 1d12f4de-60f8-4683-bf8d-94806b035a8e
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: cluster-autoscaler
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-08-08T16:13:24Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: cluster-autoscaler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: clusterapi-cluster-autoscaler
    helm.sh/chart: cluster-autoscaler-9.37.0
  name: cluster-autoscaler-dkoshkin-super-long-cluster-name-that-should
  namespace: default
  resourceVersion: "468"
  uid: d16fdc42-7fec-4d80-b376-2d140d630741
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: cluster-autoscaler
      app.kubernetes.io/name: clusterapi-cluster-autoscaler

See example from main

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-08-08T16:03:08Z"
  generateName: cluster-autoscaler-dkoshkin-super-long-cluster-name-9794bc897-
  labels:
    app.kubernetes.io/instance: cluster-autoscaler-dkoshkin-super-long-cluster-name
    app.kubernetes.io/name: clusterapi-cluster-autoscaler
    pod-template-hash: 9794bc897
  name: cluster-autoscaler-dkoshkin-super-long-cluster-name-9794bcs55bf
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: cluster-autoscaler-dkoshkin-super-long-cluster-name-9794bc897
    uid: c6596f40-7244-43fe-b306-3dbcfb745c79
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: cluster-autoscaler-dkoshkin-super-long-cluster-name
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2024-08-08T16:02:58Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: cluster-autoscaler-dkoshkin-super-long-cluster-name
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: clusterapi-cluster-autoscaler
    helm.sh/chart: cluster-autoscaler-9.37.0
  name: cluster-autoscaler-dkoshkin-super-long-cluster-name
  namespace: default
  resourceVersion: "559"
  uid: 215d0c21-e9a5-493f-b399-69196a157845
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: cluster-autoscaler-dkoshkin-super-long-cluster-name
      app.kubernetes.io/name: clusterapi-cluster-autoscaler
jimmidyson commented 1 month ago

Yes you're right @dkoshkin it does introduce that CA bug - fixing (somehow!).

jimmidyson commented 1 month ago

@dkoshkin Please try with latest commit. Using md5 sum of cluster namespace/name as helm release suffix.

jimmidyson commented 1 month ago

Fixed up again. @dkoshkin Please review when you get a mo.