syself / cluster-api-provider-hetzner

Cluster API Provider Hetzner :rocket: The best way to manage Kubernetes clusters on Hetzner, fully declarative, Kubernetes-native and with self-healing capabilities
https://caph.syself.com
Apache License 2.0
697 stars 62 forks

Inability to provision Cluster with an externally managed Control Plane #845

Closed prometherion closed 10 months ago

prometherion commented 1 year ago

/kind bug

What steps did you take and what happened:

Try to create a HetznerCluster resource with the following manifest:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: HetznerCluster
metadata:
  name: workload
  namespace: default
spec:
  controlPlaneEndpoint:
    host: ""
    port: 443
  controlPlaneLoadBalancer:
    enabled: false
  controlPlaneRegions: []
  hcloudNetwork:
    enabled: false
  hcloudPlacementGroups:
  - name: md-0
    type: spread
  hetznerSecretRef:
    key:
      hcloudToken: hcloud
    name: hetzner
  sshKeys:
    hcloud:
    - name: REDACTED

Getting the following error:

The HetznerCluster "workload" is invalid: 
* spec.controlPlaneRegions: Invalid value: []v1beta1.Region{}: control plane regions must not be empty
* spec.controlPlaneEndpoint: Invalid value: :443: controlPlaneEndpoint has to be specified if controlPlaneLoadBalancer is not enabled

What did you expect to happen:

The HetznerCluster to be created successfully: the Cluster references an externally managed Control Plane provided by Kamaji.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.244.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KamajiControlPlane
    name: workload-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HetznerCluster
    name: workload

Kamaji offers a CAPI Control Plane provider that is responsible for creating the required Control Plane endpoint and then patching it back into the Infrastructure Provider's resource.
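For reference, a minimal KamajiControlPlane matching the controlPlaneRef above could look like the following (the replicas and version values are illustrative; consult the Kamaji Control Plane provider docs for the full schema):

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KamajiControlPlane
metadata:
  name: workload-control-plane
  namespace: default
spec:
  replicas: 2        # illustrative value
  version: 1.28.0    # illustrative value
```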

The Cluster API contract allows specifying an externally managed Control Plane: it does not take for granted that the Control Plane runs on the same infrastructure provider as the VMs/machines.

My original plan was to take advantage of the HetznerCluster.spec.controlPlaneLoadBalancer.enabled field set to false; however, given the error above, I'm confused about how it should be used.

My expectation would be to delegate retrieval of the required address to the Control Plane provider (in this case, Kamaji) and let the Kamaji CP Provider patch the HetznerCluster resource with the IP address.

This is the pattern we agreed upon with several other providers, such as:

  • CAPO (OpenStack)
  • KubeVirt
  • Metal3
  • Equinix

Anything else you would like to add:

N.R.

Environment:

prometherion commented 1 year ago

A possible solution could be removing this validation. https://github.com/syself/cluster-api-provider-hetzner/blob/5912b9f022c3ff8e5773ee183cd86a9ce3acdceb/api/v1beta1/hetznercluster_webhook.go#L109-L122
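To illustrate the idea, here is a hedged sketch of how the check could be relaxed so an empty endpoint is accepted when an external Control Plane provider is expected to patch it in later. This is not the provider's actual webhook code; the names validateEndpoint and externallyManaged are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// ControlPlaneEndpoint mirrors the shape of the CAPI endpoint struct
// (illustrative only).
type ControlPlaneEndpoint struct {
	Host string
	Port int32
}

// validateEndpoint sketches a relaxed webhook rule: an empty host is an
// error only when neither the managed load balancer nor an external
// Control Plane provider will supply the endpoint.
func validateEndpoint(lbEnabled, externallyManaged bool, ep ControlPlaneEndpoint) error {
	if lbEnabled || externallyManaged {
		return nil // the endpoint will be provided or patched in later
	}
	if ep.Host == "" {
		return errors.New("controlPlaneEndpoint has to be specified if controlPlaneLoadBalancer is not enabled")
	}
	return nil
}

func main() {
	// Empty endpoint, no LB, not externally managed: rejected.
	fmt.Println(validateEndpoint(false, false, ControlPlaneEndpoint{}))
	// Empty endpoint but externally managed (the Kamaji case): accepted.
	fmt.Println(validateEndpoint(false, true, ControlPlaneEndpoint{}))
}
```

With a flag like this, the webhook would still catch genuinely misconfigured clusters while allowing the externally managed Control Plane flow.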

guettli commented 1 year ago

@prometherion just out of curiosity: why did you change the imports in the PR? Do you use a tool for sorting the imports?

prometherion commented 1 year ago

I would rather discuss this on the PR with the changes rather than the issue itself.

However, my IDE is configured to run the goimports utility automatically, which groups import statements in the order stdlib, external packages, and internal ones.
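For example, goimports-style grouping looks like this (the package paths below are only for illustration):

```go
import (
	// standard library
	"context"
	"fmt"

	// external packages
	"k8s.io/apimachinery/pkg/runtime"

	// internal (module-local) packages
	"github.com/syself/cluster-api-provider-hetzner/api/v1beta1"
)
```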

I just noticed you're not enforcing this style guide, though; I can amend the changes and adapt to the repository's conventions.

prometherion commented 1 year ago

@guettli any feedback on the last comment?

lieberlois commented 1 year ago

Did you get this working?

prometherion commented 1 year ago

Yes, @lieberlois.

If you reference an already existing IP address managed by the Control Plane, it works.

However, in the context of self-service provisioning, the Control Plane endpoint is dynamic, and the changes I introduced support that use case: Kamaji creates the control plane, patches the infrastructure cluster, and everything else proceeds as expected, such as node provisioning and bootstrapping.
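In other words, after Kamaji allocates the endpoint, the HetznerCluster ends up patched roughly like this (the address below is a placeholder):

```yaml
spec:
  controlPlaneEndpoint:
    host: 203.0.113.10  # placeholder; allocated by the external Control Plane
    port: 6443
```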

@guettli said the team is in the middle of a release process, so I'm confident we can get an answer from them on short notice.