kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.5k stars 1.3k forks source link

Remove experimentalRetryJoin #6942

Open killianmuldoon opened 2 years ago

killianmuldoon commented 2 years ago

experimentalRetryJoin is deprecated as of CAPI release 1.2. Removal is expected to happen in release 1.3 - but before proceeding we should have high confidence that CAPI + Kubeadm are capable today of solving the problem that experimentalRetryJoin was designed to fix.

In order to increase confidence in the removal we should:

  1. Ask providers which, if any, are still using experimentalRetryJoin, to what extent, and to solve what problem.
  2. Figure out a testing strategy for providers that would trigger the problem experimentalRetryJoin was designed to fix.
  3. Ensure these tests are passing and we have high confidence in the current retryJoin behaviour of Kubeadm.

Once we're confident that we can proceed with removal we need to:

  1. Remove the field from the bootstrap KubeadmConfig types + regenerate CRDs
  2. Stop embedding the kubeadm-bootsrap-script in bootstrap/kubeadm/internal/cloudinit/cloudinit.go
  3. Update cloudinit.go, removing functions and variables related to the embedded script e.g. generateBootstrapScript, kubeadmBootstrapScript
  4. Remove the KubeadmBootstrap script
  5. Add a note to the migration guide that the

/area bootstrap

sbueringer commented 2 years ago

In my opinion we should never remove a field from a v1beta1 apiVersion without an apiVersion bump as it's a breaking change to v1beta1 otherwise.

killianmuldoon commented 2 years ago

In my opinion we should never remove a field from a v1beta1 apiVersion without an apiVersion bump as it's a breaking change to v1beta1 otherwise.

Good point, but I suppose that doesn't necessarily preclude us from changing the behaviour. It's not exactly a feature gate, but the field is prefixed with experimental

sbueringer commented 2 years ago

So we would keep the field but disable its behavior? I think that's a very dangerous precedent and ~ the same as dropping the field.

killianmuldoon commented 2 years ago

So we would keep the field but disable its behavior? I think that's a very dangerous precedent and ~ the same as dropping the field.

This is done regularly for experimental features as I understand it. The key question is whether we consider this experimental or not given the prefix. I think the most important thing here is whether the actual behaviour is fixed rather than the API fields and behaviour though.

killianmuldoon commented 2 years ago

Maybe an alternative path to deprecating this is to add a feature flag and a binary flag to enable this behaviour while deprecating and eventually removing the experimental API field. This would bring the way we enable the feature in line with our other experimental features like ClusterClass and MachinePools.

vincepri commented 1 year ago

/kind api-change

As mentioned above, we need to deprecate now (just a comment on the field would suffice) + remove in the next API version

vincepri commented 1 year ago

/triage accepted

fabriziopandini commented 1 year ago

/milestone v1.3

we should get this done before the release

fabriziopandini commented 1 year ago

/good-first-issue

sbueringer commented 1 year ago

If I'm not mistaken deprecation was done in these PRs:

It was already part of v1.2.0 (https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.2.0).

I wonder a bit about removal though.

I don't have background information for what exactly this feature was needed and why we can drop it now (either because it's not needed anymore or we have a replacement I guess (?)).

Additional indication for that:

@yastij: why did we drop UseExperimentalRetryJoin it's still needed in slow/flaky envs https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1602#discussion_r965996706

killianmuldoon commented 1 year ago

I think we should remove the good first issue and help wanted labels, as this removal isn't something easily implemented. @fabriziopandini WDYT?

fabriziopandini commented 1 year ago

~~If I got this right what we are doing now is just applying a deprecated label on the API field (I have updated the issue title) actual deprecation will happen in the next cycle~~

sbueringer commented 1 year ago

If I'm not mistaken deprecation was done in these PRs:

It was already part of v1.2.0 (https://github.com/kubernetes-sigs/cluster-api/releases/tag/v1.2.0).

I wonder a bit about removal though.

I don't have background information for what exactly this feature was needed and why we can drop it now (either because it's not needed anymore or we have a replacement I guess (?)).

Additional indication for that:

@yastij: why did we drop UseExperimentalRetryJoin it's still needed in slow/flaky envs kubernetes-sigs/cluster-api-provider-vsphere#1602 (comment)

I'm slightly confused. As mentioned, I think we did this already?

fabriziopandini commented 1 year ago

Ok I have missed this. reset.

fabriziopandini commented 1 year ago

/remove-good-first-issue

fabriziopandini commented 1 year ago

Dropping to the milestone @yastij @randomvariable what is your opinion about making this happen in 1.4 timeframe (target date ~march 2023)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sbueringer commented 1 year ago

/remove-lifecycle stale

k8s-triage-robot commented 7 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 5 months ago

/priority important-longterm

sbueringer commented 4 months ago

Just a note. Just checked linked issues and talked to some folks. We should be really able to remove the field as there were various improvements to kubeadm.

Also in the worst case were folks still need it they can remove the kubeadm binary through a script and implement their own retry handling there.

neolit123 commented 4 months ago

Just a note. Just checked linked issues and talked to some folks. We should be really able to remove the field as there were various improvements to kubeadm.

Also in the worst case were folks still need it they can remove the kubeadm binary through a script and implement their own retry handling there.

+1, i though this was removed already. :\

neolit123 commented 4 months ago

i see there are steps in the OP, i can give this a shot. /assign

sbueringer commented 4 months ago

We have to wait for the next apiVersion (it's deprecated already)

I just additionally talked to some folks if they still depend on it

fabriziopandini commented 4 months ago

based on the comment above /triage accepted

chrischdi commented 1 month ago

Note: https://github.com/kubernetes-sigs/cluster-api/pull/10983 gives another indicator that we should not re-add this in v1beta2.