Paused behaviour is inconsistent

enxebre commented 2 years ago

What steps did you take and what happened:

In a MachineDeployment: In the predicates we check cluster.Spec.Paused or the MachineDeployment has the annotation. We ignore MachineDeployment.Spec.Paused https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_controller.go#L75-L97.

Then in the reconciling logic we first check the cluster.Spec.Paused or the MachineDeployment has the annotation. We ignore We ignore MachineDeployment.Spec.Paused https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_controller.go#L126-L130

Then lines below we check only the MachineDeployment.Spec.Paused. We ignore cluster.Spec.Paused or the MachineDeployment has the annotation. https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_controller.go#L225-L227

What did you expect to happen: Always honour .spec.paused and fallback to the annotation for backward compatibility. Introduce .spec.paused in MachineSets. Review all CRDs to make the above consistent.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

Environment:

Cluster-api version:
minikube/kind version:
Kubernetes version: (use kubectl version):
OS (e.g. from /etc/os-release):

/kind bug [One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

killianmuldoon commented 2 years ago

It looks like the current behaviour of paused is different depending on whether you set the spec field or the annotation? With the annotation you get no reconciliation, but with the spec field we end up calling sync() which has even allows some scaling behaviour? :thinking:

Is that specific behaviour (sync when paused) used in some of our workflows - maybe a clusterctl move?

enxebre commented 2 years ago

/kind cleanup

sbueringer commented 2 years ago

Q: If we add a paused field to the MachineSet would we then propagate the value of paused from the MD to the MS? If yes, I assume this shouldn't trigger a rollout? (and we have to take care it doesn't specifically)

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sbueringer commented 1 year ago

/remove-lifecycle stale

fabriziopandini commented 1 year ago

/triage accepted

k8s-triage-robot commented 9 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

fabriziopandini commented 6 months ago

/priority important-longterm

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

sbueringer commented 3 months ago

Maybe we can fix this with v1beta2

fabriziopandini commented 3 months ago

@sbueringer is this issue already tracked in the umbrella issue for v1beta2?

sbueringer commented 3 months ago

Now, yes

fabriziopandini commented 2 weeks ago

My personal take:

MachineDeployment.Spec.Paused should be dropped, It seems something we "copied" during the initial implementation of MD, but never designed properly.
We should not add .spec.Paused to all the objects because IMO pausing pausing single objects (instead of the entire cluster) should be an exception to be considered only by users that knowns very well what they are doing, so I prefer to not expose this as a spec field everywhere.

ParkHyeongKyu commented 2 weeks ago

I'm going through the same situation. I set MachineDeployment.Spec.Paced to True to prevent me from doing scale up or down when I increased or decreased replica, but it still does scale work.

This seems to be because the scale operation is still happening in the sync function as you can see in the link below. https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/machinedeployment/machinedeployment_sync.go#L48

chrischdi commented 1 week ago

I'm going through the same situation. I set MachineDeployment.Spec.Paced to True to prevent me from doing scale up or down when I increased or decreased replica, but it still does scale work.

This seems to be because the scale operation is still happening in the sync function as you can see in the link below. main/internal/controllers/machinedeployment/machinedeployment_sync.go#L48

You should use the paused annotation if you want to prevent things from happening.

kubernetes-sigs / cluster-api

Paused behaviour is inconsistent #6966