rancher / cluster-api-provider-rke2

RKE2 bootstrap and control-plane Cluster API providers.
Apache License 2.0

Can not create Cluster from Cluster Class without repeating KCP's infrastructureRef #341

Open anmazzotti opened 4 months ago

anmazzotti commented 4 months ago

What happened: When defining my RKE2 cluster class I configured the RKE2ControlPlaneTemplate as follows:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlaneTemplate
metadata:
  name: rke2-control-plane
  namespace: default
spec:
  template:
    spec:
      nodeDrainTimeout: 2m
      registrationMethod: "control-plane-endpoint"
      rolloutStrategy:
        rollingUpdate:
          maxSurge: 1
        type: RollingUpdate
      serverConfig:
        disableComponents:
          kubernetesComponents:
            - cloudController

In the ClusterClass (CC) definition, spec.controlPlane.machineInfrastructure.ref is set correctly:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: rke2
  namespace: default
spec:
  controlPlane:
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: ElementalMachineTemplate
        name: rke2-control-plane
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: RKE2ControlPlaneTemplate
      name: rke2-control-plane

However, the RKE2 provider fails to initialize the control plane machines:

I0531 08:47:34.318733       1 rke2controlplane_controller.go:430] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/rke2-clusterclass-dfkb5" namespace="default" name="rke2-clusterclass-dfkb5" reconcileID="ceadb0a6-1099-43d6-a09e-ea2f21304cb9"
I0531 08:47:34.329520       1 rke2controlplane_controller.go:549] "Initializing control plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/rke2-clusterclass-dfkb5" namespace="default" name="rke2-clusterclass-dfkb5" reconcileID="ceadb0a6-1099-43d6-a09e-ea2f21304cb9" Desired=1 Existing=0
E0531 08:47:34.329689       1 scale.go:74] "Failed to create initial control plane Machine" err="failed to clone infrastructure template: failed to retrieve  external object \"default\"/\"\": Object 'Kind' is missing in 'unstructured object has no kind'" namespace="default" name="rke2-clusterclass-dfkb5" cluster-name="rke2-clusterclass"

The workaround is to repeat the infrastructureRef in the RKE2ControlPlaneTemplate, like this:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlaneTemplate
metadata:
  name: rke2-control-plane
  namespace: default
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: ElementalMachineTemplate
        name: rke2-control-plane
      nodeDrainTimeout: 2m
      registrationMethod: "control-plane-endpoint"
      rolloutStrategy:
        rollingUpdate:
          maxSurge: 1
        type: RollingUpdate
      serverConfig:
        disableComponents:
          kubernetesComponents:
            - cloudController

I think this should not be necessary, because the CC definition already provides the reference through spec.controlPlane.machineInfrastructure.ref.
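For context, here is a rough sketch of the RKE2ControlPlane object that the Cluster API topology controller would presumably generate from the ClusterClass. It assumes the standard CAPI control-plane contract, under which the cloned machine infrastructure template is written to spec.machineTemplate.infrastructureRef. The generated template name and the exact set of fields are hypothetical and were not captured from a real cluster. If the provider only reads spec.infrastructureRef, that would be consistent with the empty reference in the error above.

```yaml
# Sketch only: approximate shape of the generated object, not real cluster output.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlane
metadata:
  name: rke2-clusterclass-dfkb5   # name taken from the log lines above
  namespace: default
spec:
  replicas: 1
  machineTemplate:
    # Assumed to be set by the topology controller from
    # ClusterClass.spec.controlPlane.machineInfrastructure.ref.
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: ElementalMachineTemplate
      name: rke2-clusterclass-control-plane-xxxxx   # hypothetical generated name
      namespace: default
  # spec.infrastructureRef is left unset here, which would explain why
  # cloning the infrastructure template fails with a missing Kind.
  nodeDrainTimeout: 2m
  registrationMethod: "control-plane-endpoint"
```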

What did you expect to happen:

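The RKE2 provider should respect the machine infrastructure reference defined in the Cluster Class, without requiring infrastructureRef to be repeated in the RKE2ControlPlaneTemplate.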

How to reproduce it:

See the config above. It can also be reproduced with the quickstart sample by removing the RKE2ControlPlaneTemplate.spec.template.spec.infrastructureRef object.
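A minimal Cluster manifest that should trigger the failure, when applied together with the ClusterClass and the first RKE2ControlPlaneTemplate above, might look like the sketch below. The cluster name and replica count are taken from the log output. The Kubernetes version is an assumed placeholder, and the worker topology is omitted for brevity.

```yaml
# Sketch only: minimal topology-based Cluster referencing the "rke2" ClusterClass.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: rke2-clusterclass   # matches cluster-name in the error log
  namespace: default
spec:
  topology:
    class: rke2               # ClusterClass defined above
    version: v1.28.9+rke2r1   # assumed placeholder; use the version you deploy
    controlPlane:
      replicas: 1             # matches Desired=1 in the log
```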

furkatgofurov7 commented 3 months ago

After briefly comparing our API with CAPI's kubeadm control plane provider, I noticed the following divergences in our API:

  1. RKE2ControlPlaneSpec has an InfrastructureRef field even though the reference is already part of RKE2ControlPlaneSpec.MachineTemplate. That is not the case with kubeadm.
  2. RKE2ControlPlaneTemplateResource.Spec refers to RKE2ControlPlaneSpec, whereas it should refer to a dedicated RKE2ControlPlaneTemplateResourceSpec, similar to what kubeadm exposes. For some reason we do not have that API struct.

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 90 days with no activity.