kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.57k stars 1.31k forks source link

Paused behaviour is inconsistent #6966

Open enxebre opened 2 years ago

enxebre commented 2 years ago

What steps did you take and what happened:

In a MachineDeployment: In the predicates we check cluster.Spec.Paused or the MachineDeployment has the annotation. We ignore MachineDeployment.Spec.Paused https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_controller.go#L75-L97.

Then in the reconciling logic we first check the cluster.Spec.Paused or the MachineDeployment has the annotation. We ignore We ignore MachineDeployment.Spec.Paused https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_controller.go#L126-L130

Then lines below we check only the MachineDeployment.Spec.Paused. We ignore cluster.Spec.Paused or the MachineDeployment has the annotation. https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_controller.go#L225-L227

What did you expect to happen: Always honour .spec.paused and fallback to the annotation for backward compatibility. Introduce .spec.paused in MachineSets. Review all CRDs to make the above consistent.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

/kind bug [One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

killianmuldoon commented 2 years ago

It looks like the current behaviour of paused is different depending on whether you set the spec field or the annotation? With the annotation you get no reconciliation, but with the spec field we end up calling sync() which has even allows some scaling behaviour? :thinking:

Is that specific behaviour (sync when paused) used in some of our workflows - maybe a clusterctl move?

enxebre commented 2 years ago

/kind cleanup

sbueringer commented 2 years ago

Q: If we add a paused field to the MachineSet would we then propagate the value of paused from the MD to the MS? If yes, I assume this shouldn't trigger a rollout? (and we have to take care it doesn't specifically)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sbueringer commented 1 year ago

/remove-lifecycle stale

fabriziopandini commented 1 year ago

/triage accepted

k8s-triage-robot commented 9 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 6 months ago

/priority important-longterm

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sbueringer commented 3 months ago

Maybe we can fix this with v1beta2

fabriziopandini commented 3 months ago

@sbueringer is this issue already tracked in the umbrella issue for v1beta2?

sbueringer commented 3 months ago

Now, yes

fabriziopandini commented 2 weeks ago

My personal take:

ParkHyeongKyu commented 2 weeks ago

I'm going through the same situation. I set MachineDeployment.Spec.Paced to True to prevent me from doing scale up or down when I increased or decreased replica, but it still does scale work.

This seems to be because the scale operation is still happening in the sync function as you can see in the link below. https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_sync.go#L48

chrischdi commented 1 week ago

I'm going through the same situation. I set MachineDeployment.Spec.Paced to True to prevent me from doing scale up or down when I increased or decreased replica, but it still does scale work.

This seems to be because the scale operation is still happening in the sync function as you can see in the link below. main/internal/controllers/machinedeployment/machinedeployment_sync.go#L48

You should use the paused annotation if you want to prevent things from happening.