Open applejag opened 2 months ago
Thanks for opening this issue. Yes, this is a widely discussed topic and we have had several issues about it in the past for kured. That's an interesting approach that would solve several challenges related to this. However, I'm not sure it would fit all StatefulSet cases, depending on the rollout strategy defined.
Yeah, this would practically only help when you have single-replica Deployments.
Single-replica StatefulSets would not really benefit from this, and in that case a PDB makes even less sense, as you can't configure a StatefulSet to have a "surge" of a temporarily increased number of pods.
Maybe an edge case would be if you used the OpenKruise CloneSet with max surge, which is practically an extended Deployment but lets you make use of things like PVCs per pod. Which brings up a big downside of this approach: it's not very portable.
Should kured only have a special case for Deployment resources? Is there a way to design it so it becomes agnostic? Should kured have some special cases implemented for CustomResources like the CloneSet from OpenKruise that I mentioned earlier?
In my tunnel-visioned opinion, I think only having a special case for Deployment is fine :) Partly because that's the solution that would help me. But as this seems to be a common need, maybe that's acceptable?
There are some Kubernetes proposals on improving this whole situation, such as this one: https://github.com/kubernetes/enhancements/pull/4213 But it might take a while longer until the proposal is accepted, then about a year before we start seeing it in stable releases, and then a bit more waiting because most Kubernetes providers intentionally stay a minor version or more behind latest for stability reasons. On top of this, the proposal might also be rejected, and we would have to start all over again with a new proposal, which takes a lot of work.
Which adds more weight to my argument that a "hardcoded fix"/"special behavior for Deployments" might be sufficient for now. Wdyt?
Are you up for the idea? And if so, would you accept a PR on it?
Good day! First off, really nice tool :heart: Works exactly as advertised.
We've hit an issue where we want to have Deployments with `.spec.replicas: 1`, but still make use of a PodDisruptionBudget (PDB). Most Kubernetes discussions and the Kubernetes documentation basically summarize this as a "bad idea", usually with the argument of "if you only have 1 replica, then you have already agreed to not having it HA".

When kured evicts a pod (or when `kubectl drain` evicts one), it immediately terminates the existing pod before the replacement is up and running, leading to unwanted downtime/disruptions. Adding a PDB is supposed to solve this, but when you only have 1 replica the PDB will basically always report `.status.disruptionsAllowed: 0`, leading to deadlocks in the whole node drain and reboot procedure.

At our workplace we have our microservice applied to two different use cases:

1. `replicas: 2` or more, and then no issue with PDB and evictions.
2. `replicas: 1` to reduce the resource consumption of having a lot of these smaller installations.

This issue is only about use case 2 here.
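For concreteness, here is a minimal sketch of the use-case-2 combination (all names are made up for illustration): a single-replica Deployment plus a PDB that, with only one replica, can never allow a voluntary disruption.

```yaml
# Hypothetical single-replica installation (use case 2).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-microservice        # made-up name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-microservice
  template:
    metadata:
      labels:
        app: my-microservice
    spec:
      containers:
        - name: app
          image: example.com/my-microservice:latest
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-microservice
spec:
  minAvailable: 1              # with replicas: 1, this always yields
                               # .status.disruptionsAllowed: 0
  selector:
    matchLabels:
      app: my-microservice
```

With this in place, any eviction attempt (from kured or `kubectl drain`) is rejected by the PDB, and the drain loops forever.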
## Proposed solution: use `kubectl rollout restart`

The `kubectl rollout restart` command basically just adds an annotation to the pod template in the Deployment or StatefulSet resource, forcing it to roll out a new version. In this article, they explain a way around this eviction block by making use of this rollout-restart approach: https://www.artur-rodrigues.com/tech/2023/03/30/impossible-kubectl-drains.html

What I envision that kured could do is:
- If a pod's eviction is blocked by a PDB reporting `.status.disruptionsAllowed: 0`, and the pod is owned by a Deployment with `.spec.replicas: 1`, then issue a rollout restart instead of an eviction.
- If a pod has the annotation `kured.dev/strategy: rollout-restart` and is owned by a Deployment, then always use the rollout-restart approach on it.

The first bullet point would be a "heuristic-based approach" that should solve a lot of people's issues without being too intrusive.
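To make the two bullets concrete, here is a sketch of the decision logic in Go. All type, field, and function names are hypothetical and not actual kured internals; it only shows how the heuristic and the annotation opt-in could combine.

```go
package main

import "fmt"

// podInfo collects the inputs the heuristic would look at.
// All names here are hypothetical, not actual kured code.
type podInfo struct {
	disruptionsAllowed int32  // the blocking PDB's .status.disruptionsAllowed
	ownerKind          string // e.g. "Deployment", "StatefulSet"
	ownerReplicas      int32  // the owner's .spec.replicas
	strategyAnnotation string // value of a hypothetical kured.dev/strategy annotation
}

// chooseStrategy returns "rollout-restart" when either bullet applies,
// and "evict" otherwise.
func chooseStrategy(p podInfo) string {
	// Bullet 2: explicit opt-in via annotation, Deployment-owned pods only.
	if p.strategyAnnotation == "rollout-restart" && p.ownerKind == "Deployment" {
		return "rollout-restart"
	}
	// Bullet 1: heuristic for single-replica Deployments stuck behind a PDB.
	if p.disruptionsAllowed == 0 && p.ownerKind == "Deployment" && p.ownerReplicas == 1 {
		return "rollout-restart"
	}
	return "evict"
}

func main() {
	// Use case 2 from above: one replica, PDB allows no disruptions.
	fmt.Println(chooseStrategy(podInfo{disruptionsAllowed: 0, ownerKind: "Deployment", ownerReplicas: 1}))
	// Use case 1: multiple replicas, eviction works as usual.
	fmt.Println(chooseStrategy(podInfo{disruptionsAllowed: 1, ownerKind: "Deployment", ownerReplicas: 2}))
}
```

Note that the annotation check comes first, so an explicit opt-in wins even when the heuristic would not trigger.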
Using the rollout-restart approach would restart all of the pods in the Deployment, so it's not a given that the user wants this when `replicas > 1`; for those cases the annotation provides opt-in behavior.