Framework: Kind cluster as secondary management cluster?

lentzi90 commented 1 year ago

User Story

As an infrastructure provider developer, I would like to use a less resource-intensive and less time-consuming secondary management cluster in clusterctl upgrade tests, to avoid resource congestion and long run times.

Detailed Description

Currently the clusterctl upgrade test makes use of a secondary management cluster. My understanding is that this is done so the test can be treated like all other tests, even though it starts out with older controller versions. If we used the bootstrap cluster directly, it would affect all other tests that potentially run in parallel with it.

Unfortunately, this secondary management cluster can be an issue for providers. The bootstrap cluster is a lightweight kind cluster that makes it easy to load any images that may be needed for the test. But the secondary management cluster will be a "normal provider cluster", meaning that it 1) may require far more resources (VMs vs containers) and 2) cannot easily load the images.

Taking CAPO as an example, the secondary management cluster would consist of 3 VMs (1 bastion, 1 control plane and 1 worker). Add to this the actual workload cluster with 1 more bastion, 1 control plane and 2 workers when scaled. That is a total of 7 VMs for this one test, not counting other cloud resources like load balancers. We also need a way to get the controller image to the secondary management cluster, since we cannot use kind load in this case.
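To make the arithmetic above concrete, here is a small sketch that adds up the VM counts from the CAPO example. The type and function names are made up for illustration, and the kind-based figure simply assumes the workload cluster stays the same while the secondary management cluster no longer needs any VMs.

```go
package main

import "fmt"

// clusterVMs holds the VM counts of one cluster (hypothetical helper type).
type clusterVMs struct {
	Bastions      int
	ControlPlanes int
	Workers       int
}

// Counts taken from the CAPO example in this issue.
var (
	secondaryMgmt = clusterVMs{Bastions: 1, ControlPlanes: 1, Workers: 1}
	workload      = clusterVMs{Bastions: 1, ControlPlanes: 1, Workers: 2} // after scaling
)

// totalVMs sums the VMs of all given clusters.
func totalVMs(clusters ...clusterVMs) int {
	total := 0
	for _, c := range clusters {
		total += c.Bastions + c.ControlPlanes + c.Workers
	}
	return total
}

func main() {
	// Secondary management cluster created by the infra provider.
	fmt.Println("provider-based:", totalVMs(secondaryMgmt, workload))
	// Assumption: with a kind-based secondary management cluster only
	// the workload cluster needs VMs.
	fmt.Println("kind-based:", totalVMs(workload))
}
```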

I would like to reuse something like CreateKindBootstrapClusterAndLoadImages for the secondary management cluster instead. Basically replacing this with this.

Anything else you would like to add:

It is very possible that I'm missing some reason why the secondary management cluster is created as a workload cluster from the provider. Please enlighten me in that case! :slightly_smiling_face:

/kind feature

sbueringer commented 1 year ago

Thx for opening the issue. This sounds like a good idea to me in general. It should speed up a lot of provider tests and make them simpler regarding the preload.

I only see one edge case. Today it's possible to use a pre-existing Kubernetes cluster instead of a kind cluster (in the core Cluster API tests this is exposed via --e2e.use-existing-cluster). This means that today we don't have a hard dependency on kind. By hard-coding the usage of kind in the clusterctl upgrade test, we would make it a hard dependency.

So I would suggest adding a field to ClusterctlUpgradeSpecInput to make it configurable whether the secondary mgmt cluster is created with the infra provider or with kind.
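A rough sketch of what such an opt-in field could look like, just to illustrate the idea. The struct below is a simplified stand-in for ClusterctlUpgradeSpecInput, and the field name, the backend type and the helper function are all hypothetical, not the actual framework API.

```go
package main

import "fmt"

// managementClusterBackend says how the secondary management cluster is
// created (hypothetical type, for illustration only).
type managementClusterBackend string

const (
	backendInfraProvider managementClusterBackend = "infra-provider"
	backendKind          managementClusterBackend = "kind"
)

// upgradeSpecInput is a simplified stand-in for ClusterctlUpgradeSpecInput.
type upgradeSpecInput struct {
	// UseKindForManagementCluster is a hypothetical field name. Its zero
	// value keeps today's behaviour, so existing callers are unaffected.
	UseKindForManagementCluster bool
}

// secondaryBackend picks the backend for the secondary management cluster.
func secondaryBackend(in upgradeSpecInput) managementClusterBackend {
	if in.UseKindForManagementCluster {
		return backendKind
	}
	return backendInfraProvider
}

func main() {
	fmt.Println(secondaryBackend(upgradeSpecInput{}))
	fmt.Println(secondaryBackend(upgradeSpecInput{UseKindForManagementCluster: true}))
}
```

Making the field opt-in would keep kind from becoming a hard dependency: anyone running against a pre-existing cluster just leaves it unset.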

fabriziopandini commented 1 year ago

/triage accepted
/help

+1 to exploring this idea

k8s-ci-robot commented 1 year ago

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/7613):

>/triage accepted
>/help
>+1 to exploring this idea

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

k8s-triage-robot commented 4 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

lentzi90 commented 4 months ago

I would still like to explore this. Just didn't get around to it.

/triage accepted
/assign

fabriziopandini commented 3 months ago

PS. I would give developers a choice between the current behaviour and the new behaviour, but we can also figure this out in the PR

fabriziopandini commented 1 month ago

/priority important-longterm

fabriziopandini commented 6 days ago

/unassign @lentzi90
/assign

I might have some bandwidth to take a stab at this one

