Closed: relyt0925 closed this issue 1 year ago.
Default minReadySeconds is 0. What problem is this solving exactly? What value should it be?
I'm inclined to change any problem component to not report ready until it is actually ready, assuming that is the issue.
We will get our expert ops team to add some comments on this one about some of the chosen values and their purpose. The main thing, though, is to control the velocity of the rollout.
@rtheis (sorry for all the pings): Do you have any general background on why 15 seconds was chosen as minReadySeconds for most of the deployments? I know part of it came from the fact that, at scale, components restarting rapidly on top of one another put a lot of load on the management API server.
That would help kick off further discussions on this one.
Hi folks. Here is the general guidance that we provide our teams with respect to readiness. We prefer probes as the primary means of determining readiness. However, we also use minReadySeconds to ensure stability and availability during rollouts, to prevent a rollout from proceeding so quickly that an app update results in all pods crashing. We also use it as a means to protect our managed environment from pod restart storms.
Microservices that do have a readiness probe should set minReadySeconds to 15, and those without a probe should set it to 30, to assist in a controlled rollout of the microservice pods. The general goal is to have a microservice pod report to Kubernetes as ready only when it has completed initialization and is stable enough to complete tasks.
My advice is for Hypershift control plane components to either use readiness probes (if available) or set minReadySeconds. Using both readiness probes and minReadySeconds is acceptable as well.
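As a rough sketch of that guidance (the component name, image, port, and probe path below are placeholders, not taken from the actual HyperShift manifests), a deployment combining a readiness probe with minReadySeconds might look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-control-plane-component   # placeholder name
spec:
  replicas: 3
  # With a readiness probe in place, each new pod must stay ready for 15s
  # before the rollout proceeds; 30s is the suggestion when no probe exists.
  minReadySeconds: 15
  selector:
    matchLabels:
      app: example-control-plane-component
  template:
    metadata:
      labels:
        app: example-control-plane-component
    spec:
      containers:
      - name: component
        image: example.registry/component:latest   # placeholder image
        readinessProbe:
          httpGet:
            path: /healthz                         # placeholder endpoint
            port: 8443
            scheme: HTTPS
          initialDelaySeconds: 5
          periodSeconds: 10
```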
From an OpenShift perspective, the cluster policy controller has been troublesome for us. While working to handle its availability during rollouts, we hit https://github.com/kubernetes/kubernetes/issues/108266, which, until fixed, breaks one of the reasons that we use minReadySeconds.
I hope this helps.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
We need to set minReadySeconds to control how fast HA deployments roll out (allow pods to be ready for some time before the rollout continues).
deployment.apps/cluster-api minReadySeconds:
deployment.apps/cluster-policy-controller minReadySeconds:
deployment.apps/ignition-server minReadySeconds:
deployment.apps/konnectivity-agent minReadySeconds:
deployment.apps/kube-apiserver minReadySeconds:
deployment.apps/kube-controller-manager minReadySeconds:
deployment.apps/kube-scheduler minReadySeconds:
deployment.apps/oauth-openshift minReadySeconds:
deployment.apps/openshift-apiserver minReadySeconds:
deployment.apps/openshift-controller-manager minReadySeconds:
deployment.apps/openshift-oauth-apiserver minReadySeconds:
deployment.apps/packageserver minReadySeconds:
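For reference, a sketch of how the current values could be inspected and how a value could be trialed on one deployment (the namespace placeholder and the choice of kube-apiserver are illustrative only; the proper change belongs in the manifests that render these deployments, since a manual patch is likely to be reconciled away):

```sh
# Report the currently configured minReadySeconds for each control plane deployment
# ("<none>" means the field is unset and therefore defaults to 0).
kubectl get deployments -n <control-plane-namespace> \
  -o custom-columns='NAME:.metadata.name,MINREADYSECONDS:.spec.minReadySeconds'

# Trial a value on a single deployment (illustrative only; a manual patch is
# likely to be reverted by the operator that manages these deployments).
kubectl patch deployment kube-apiserver -n <control-plane-namespace> \
  --type merge -p '{"spec":{"minReadySeconds":15}}'
```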