kubernetes-sigs/cluster-api-provider-vsphere


The field failureDomainSelector is causing problems with cloning #2317

Closed: hrbasic closed this issue 1 year ago

hrbasic commented 1 year ago

/kind bug

What steps did you take and what happened: I upgraded the CAPV provider to 1.7.1. After the upgrade I couldn't deploy new clusters because of an issue with the new field failureDomainSelector. According to the documentation: "FailureDomainSelector is the label selector to use for failure domain selection for the control plane nodes of the cluster. An empty value for the selector includes all the related failure domains." (https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api-provider-vsphere/infrastructure.cluster.x-k8s.io/VSphereCluster/v1beta1@v1.7.1)

This shouldn't be a breaking change, but if you don't specify FailureDomainSelector on the VSphereCluster object, cloning fails with: unable to get resource pool for "infrastructure.cluster.x-k8s.io/v1beta1, Kind=VSphereVM dev-hbasic-2-iot1-cluster/dev-hbasic-2-lwqjl": no default resource pool found. I've checked the machines.cluster.x-k8s.io object and the Failure Domain field is missing from its spec:

  Bootstrap:
    Config Ref:
      API Version:     bootstrap.cluster.x-k8s.io/v1beta1
      Kind:            KubeadmConfig
      Name:            dev-hbasic-2-z2cqt
      Namespace:       dev-hbasic-2-iot1-cluster
      UID:             82a49d8c-1fcc-45a9-917e-45740277bbd6
    Data Secret Name:  dev-hbasic-2-z2cqt
  Cluster Name:        dev-hbasic-2
  Infrastructure Ref:
    API Version:          infrastructure.cluster.x-k8s.io/v1beta1
    Kind:                 VSphereMachine
    Name:                 dev-hbasic-2-control-plane-v1.26.3-964xq
    Namespace:            dev-hbasic-2-iot1-cluster
    UID:                  6178d20e-47be-4c4e-b3ef-a1fe6c4e2e9a
  Node Deletion Timeout:  10s
  Version:                v1.26.3

Also, the documentation states that FailureDomainSelector is the label selector to use for failure domain selection. My understanding was that we need to label the VSphereFailureDomain. But if you check the code (https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/release-1.7/controllers/vspherecluster_reconciler.go#L381), the label is checked on the zone, so we need to label the VSphereDeploymentZone to make this work.
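
For example (the zone name below is just a placeholder), roughly this is where the label has to go so the selector can match it:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereDeploymentZone
metadata:
  name: zone-a                              # placeholder zone name
  labels:
    topology.k8s.domain.com/group: gp-cp    # label that failureDomainSelector matches
spec:
  # existing spec (server, failureDomain, controlPlane, ...) stays unchanged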

What did you expect to happen: After upgrading CAPV to 1.7.1, we should be able to deploy new clusters without specifying the FailureDomainSelector field. The documentation for FailureDomainSelector should also be improved, since it's not clear that the VSphereDeploymentZone should be labeled instead of the VSphereFailureDomain.

Environment:

chrischdi commented 1 year ago

Hi @hrbasic , thanks for opening the issue.

Could you please also add information about which version you upgraded from and provide an example YAML file? That would help trace the issue down to the related change.

hrbasic commented 1 year ago

Hi, the previous version was v1.6.1. YAML examples:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: io-hbasic-1
  namespace: io-hbasic-1-iot1-cluster
  labels:
    k8s.domain.com/iks-cluster: "true"
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
spec:
  controlPlaneEndpoint:
    host: 10.38.29.74
    port: 6443
  identityRef:
    kind: Secret
    name: <secret>
  server: vc-io-anc-01.io-domain.local
  thumbprint: ''
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: io-hbasic-1-control-plane-v1.26.3
  namespace: io-hbasic-1-iot1-cluster
  labels:
    k8s.domain.com/cluster-name: io-hbasic-1
    k8s.domain.com/control-plane: "true"
    k8s.domain.com/iks-cluster: "true"
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
spec:
  template:
    spec:
      cloneMode: linkedClone
      datacenter: IO
      datastore: 
      diskGiB: 55
      folder: 
      memoryMiB: 8192
      network:
        devices:
        - dhcp4: false
          dhcp6: false
          gateway4: 10.38.28.1
          networkName: DS01-DPG-388
          nameservers:
          - 169.254.53.53
      numCPUs: 2
      os: Linux
      resourcePool: 
      server: vc-io-anc-01.io-domain.local
      storagePolicyName: ""
      template: Rocky8-k8s-capi-2023-04-12-kube-v1.26.3
      thumbprint: ''
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: io-hbasic-1
  namespace: io-hbasic-1-iot1-cluster
  labels:
    k8s.domain.com/control-plane: "true"
    k8s.domain.com/cluster-name: io-hbasic-1
    k8s.domain.com/location: iot1
    k8s.domain.com/version: v1.26.3
    k8s.domain.com/iks-cluster: "true"
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        certSANs:
          - kubernetes.default.svc.io-hbasic-1.iot1.k8s.io-domain.local
          - localhost
          - 127.0.0.1
        extraArgs:
          cloud-provider: external
          oidc-issuer-url: "https://dex.io-domain.local"
          oidc-client-id: "domain-ad"
          oidc-groups-claim: "groups"
          oidc-ca-file: /etc/ssl/certs/domain-ca.pem
          oidc-username-claim: email
      controllerManager:
        extraArgs:
          cloud-provider: external
          allocate-node-cidrs: "false"
          bind-address: "0.0.0.0"
      scheduler:
        extraArgs:
          bind-address: "0.0.0.0"
      etcd:
        local:
          extraArgs:
            listen-metrics-urls: 'http://0.0.0.0:2381'
    files:
    - content: |
        apiVersion: v1
        kind: Pod
        metadata:
          creationTimestamp: null
          name: kube-vip
          namespace: kube-system
        spec:
          containers:
          - args:
            - manager
            env:
            - name: cp_enable
              value: "true"
            - name: vip_interface
              value: ""
            - name: address
              value: 10.38.29.74
            - name: port
              value: "6443"
            - name: vip_arp
              value: "true"
            - name: vip_leaderelection
              value: "true"
            - name: vip_leaseduration
              value: "15"
            - name: vip_renewdeadline
              value: "10"
            - name: vip_retryperiod
              value: "2"
            image: ghcr.io/kube-vip/kube-vip:v0.5.11
            imagePullPolicy: IfNotPresent
            name: kube-vip
            resources: {}
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - NET_RAW
            volumeMounts:
            - mountPath: /etc/kubernetes/admin.conf
              name: kubeconfig
          hostAliases:
          - hostnames:
            - kubernetes
            ip: 127.0.0.1
          hostNetwork: true
          volumes:
          - hostPath:
              path: /etc/kubernetes/admin.conf
              type: FileOrCreate
            name: kubeconfig
        status: {}
      owner: root:root
      path: /etc/kubernetes/manifests/kube-vip.yaml
    - content: |
        -----BEGIN CERTIFICATE-----
        <my_cert>
        -----END CERTIFICATE-----
      owner: root:root
      path: /etc/ssl/certs/domain-ca.pem
    - content: |
        apiVersion: kubelet.config.k8s.io/v1beta1
        kind: KubeletConfiguration
        authentication:
            anonymous:
              enabled: false
            webhook:
              cacheTTL: 2m
              enabled: true
            x509:
              clientCAFile: /etc/kubernetes/pki/ca.crt
        authorization:
            mode: Webhook
            webhook:
              cacheAuthorizedTTL: 5m
              cacheUnauthorizedTTL: 30s
        cgroupDriver: systemd
        clusterDomain: cluster.local
        clusterDNS:
        - 169.254.25.10
        cpuManagerReconcilePeriod: 10s
        evictionPressureTransitionPeriod: 2m
        fileCheckFrequency: 20s
        httpCheckFrequency: 20s
        imageMinimumGCAge: 0s
        nodeStatusUpdateFrequency: 10s
        rotateCertificates: true
        runtimeRequestTimeout: 2m
        shutdownGracePeriod: 60s
        shutdownGracePeriodCriticalPods: 20s
        streamingConnectionIdleTimeout: 4h
        staticPodPath: /etc/kubernetes/manifests
        syncFrequency: 1m
        volumeStatsAggPeriod: 0s
        kubeReserved:
          cpu: 200m
          memory: 512Mi
        serverTLSBootstrap: true
        systemReserved:
          cpu: 300m
          memory: 1400Mi
      owner: root:root
      path: /var/lib/kubelet/kubeletconfiguration0+merge.yaml
    - content: |
        apiVersion: v1
        kind: Pod
        metadata:
          labels:
            component: kube-scheduler
            tier: control-plane
          name: kube-scheduler
          namespace: kube-system
        spec:
          containers:
          - command:
            - kube-scheduler
            - --config=/etc/kubernetes/ibscheduler.conf
            - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
            - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
            image: registry.k8s.io/kube-scheduler:v1.26.3
            imagePullPolicy: IfNotPresent
            livenessProbe:
              failureThreshold: 8
              httpGet:
                host: 127.0.0.1
                path: /healthz
                port: 10259
                scheme: HTTPS
              initialDelaySeconds: 10
              periodSeconds: 10
              timeoutSeconds: 15
            name: kube-scheduler
            resources:
              requests:
                cpu: 100m
            startupProbe:
              failureThreshold: 24
              httpGet:
                host: 127.0.0.1
                path: /healthz
                port: 10259
                scheme: HTTPS
              initialDelaySeconds: 10
              periodSeconds: 10
              timeoutSeconds: 15
            volumeMounts:
            - mountPath: /etc/kubernetes/scheduler.conf
              name: kubeconfig
              readOnly: true
            - mountPath: /etc/kubernetes/ibscheduler.conf
              name: ibsched
              readOnly: true
          hostNetwork: true
          priorityClassName: system-node-critical
          securityContext:
            seccompProfile:
              type: RuntimeDefault
          volumes:
          - hostPath:
              path: /etc/kubernetes/scheduler.conf
              type: FileOrCreate    
            name: kubeconfig
          - hostPath:
              path: /etc/kubernetes/ibscheduler.conf
              type: FileOrCreate
            name: ibsched
      owner: root:root
      path: /var/lib/kubelet/kube-scheduler0+merge.yaml
    - content: |
        apiVersion: kubescheduler.config.k8s.io/v1
        kind: KubeSchedulerConfiguration
        clientConnection:
          kubeconfig: /etc/kubernetes/scheduler.conf
        profiles:
          - schedulerName: default-scheduler
            pluginConfig:
              - name: PodTopologySpread
                args:
                  defaultConstraints:
                    - maxSkew: 1
                      topologyKey: topology.kubernetes.io/zone
                      whenUnsatisfiable: DoNotSchedule
                  defaultingType: List
      owner: root:root
      path: /etc/kubernetes/ibscheduler.conf
    - content: |
        ---
        apiVersion: kubeproxy.config.k8s.io/v1alpha1
        kind: KubeProxyConfiguration
        metricsBindAddress: "0.0.0.0:10249"
      owner: root:root
      path: /etc/kubernetes/ib-kube-proxy-conf.yaml
    initConfiguration:
      nodeRegistration:
        criSocket: /var/run/containerd/containerd.sock
        name: '{{ local_hostname }}'
        kubeletExtraArgs:
          cloud-provider: external
      patches:
        directory: /var/lib/kubelet
    joinConfiguration:
      nodeRegistration:
        criSocket: /var/run/containerd/containerd.sock
        name: '{{ local_hostname }}'
        kubeletExtraArgs:
          cloud-provider: external
      patches:
        directory: /var/lib/kubelet
    preKubeadmCommands:
    - hostnamectl set-hostname "{{ local_hostname }}"
    - echo "::1         ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6"
      >/etc/hosts
    - echo "127.0.0.1   {{ local_hostname }}.io-domain.local {{ local_hostname }} localhost
      localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
    - growpart /dev/sda 2 && pvresize /dev/sda2 && lvresize -r -l +100%FREE /dev/os/lv_var_lib_containerd
    - cat /etc/kubernetes/ib-kube-proxy-conf.yaml >> /run/kubeadm/kubeadm.yaml
    postKubeadmCommands:
    - echo '{"run_list":["recipe[ib_iks_vm]"]}' > /etc/chef/first-boot.json
    - chef-client -j /etc/chef/first-boot.json -E IOT1
    useExperimentalRetryJoin: true
    users:
    - name: capv
      sshAuthorizedKeys:
      - ssh-rsa <PUBLIC KEY> <USER>
      sudo: ALL=(ALL) NOPASSWD:ALL
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: io-hbasic-1-control-plane-v1.26.3
    metadata:
      labels:
        k8s.domain.com/nodepool: control-plane
        k8s.domain.com/iks-cluster: "true"
      annotations:
        node.alpha.kubernetes.io/ttl: "0"
  replicas: 3
  version: v1.26.3

If I label the CP zones and add the failureDomainSelector field to the VSphereCluster object, then everything works as expected:

  failureDomainSelector: 
    matchLabels: 
      topology.k8s.domain.com/group: gp-cp
  thumbprint: ''
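
For reference (the zone name below is just an example), I labelled the deployment zones with something like:

kubectl label vspheredeploymentzone zone-a topology.k8s.domain.com/group=gp-cp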

If you need more info, let me know.

chrischdi commented 1 year ago

FailureDomainSelector is the label selector to use for failure domain selection for the control plane nodes of the cluster. An empty value for the selector includes all the related failure domains.

So there are three different modes for the field .spec.failureDomainSelector:

- not set (nil): no failure domains get used for the control plane
- set to an empty selector ({}): all failure domains get used
- set to a non-empty selector: only the VSphereDeploymentZones whose labels match the selector get used
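
Per the documentation quoted above, if you want to keep the previous behaviour of considering all failure domains, it should be enough to set an empty selector on the VSphereCluster, e.g.:

spec:
  failureDomainSelector: {}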

Historical context on when that behaviour was changed: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1951#issuecomment-1598531482

TLDR: it was considered a bug that a nil selector resulted in considering all failure domains. It looks like this was not really highlighted in the release notes, though.

hrbasic commented 1 year ago

Great, thanks for the clarification. I'll close this since it's not considered a bug. But maybe it would be good to highlight this, because if someone upgrades the provider and doesn't update existing VSphereCluster objects with a failureDomainSelector, the rollout of the KubeadmControlPlane could fail.

chrischdi commented 1 year ago

Thank you @hrbasic for filing the issue and sorry for that.

I updated the release notes to highlight this PR as a breaking change and added some information about it to the v1.7.0 release notes, so others can find the information 👍