kubereboot / kured

Kubernetes Reboot Daemon
https://kured.dev
Apache License 2.0

Use `kubectl rollout restart` on single-replica Deployments with PDB #928


applejag commented 2 months ago

Good day! First off, really nice tool :heart: Works exactly as advertised.

We've hit an issue where we want to have Deployments with `.spec.replicas: 1` but still make use of a PodDisruptionBudget (PDB). Most Kubernetes discussions and the Kubernetes documentation basically summarize this as a "bad idea", usually with the argument that "if you only have 1 replica, then you have already agreed to not having it HA".

When kured evicts a pod (or when `kubectl drain` evicts one), the existing pod is terminated immediately, before the replacement is up and running, leading to unwanted downtime/disruptions. Adding a PDB is supposed to solve this, but when you only have 1 replica the PDB will basically always report `.status.disruptionsAllowed: 0`, leading to deadlocks in the whole node drain and reboot procedure.
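For reference, a minimal reproduction of the deadlock (resource names are illustrative, and the drain output is paraphrased):

```sh
kubectl create deployment web --image=nginx --replicas=1
kubectl create poddisruptionbudget web --selector=app=web --min-available=1

# With 1 replica and minAvailable: 1, no voluntary disruption is ever allowed:
kubectl get pdb web -o jsonpath='{.status.disruptionsAllowed}'   # prints 0

# A drain now retries forever on this pod:
kubectl drain <node> --ignore-daemonsets
#   evicting pod default/web-... (will retry after 5s):
#   Cannot evict pod as it would violate the pod's disruption budget.
```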

At our workplace we have our microservice applied to two different use cases:

  1. A few big installations, handling lots of traffic. Here we can just use `replicas: 2` or more, and then there is no issue with PDBs and evictions.
  2. Many small installations, each handling a small amount of spiky traffic. Here we can lower the resource limits and requests to make the pods smaller, but the nature of our service results in very spiky usage, so we can't shrink the resources too much. Instead we want to scale down to `replicas: 1` to reduce the resource consumption of running a lot of these smaller installations.

This issue is only about use case 2 here.

Proposed solution: use `kubectl rollout restart`

The `kubectl rollout restart` command basically just adds an annotation to the pod template in the Deployment or StatefulSet resource, forcing it to roll out a new version. This article explains a way around the eviction block using the rollout-restart approach: https://www.artur-rodrigues.com/tech/2023/03/30/impossible-kubectl-drains.html
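Concretely, the command just stamps a `kubectl.kubernetes.io/restartedAt` annotation onto the pod template, so it's roughly equivalent to a patch like this (deployment name is illustrative):

```sh
kubectl rollout restart deployment/web

# ...is roughly equivalent to stamping the pod template, which triggers a rolling update:
kubectl patch deployment web --patch \
  "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"kubectl.kubernetes.io/restartedAt\":\"$(date -Iseconds)\"}}}}}"

# With the default RollingUpdate strategy (maxSurge: 25% rounds up to 1 pod),
# the replacement is started before the old pod is terminated, and the
# controller's deletion of the old pod is not subject to the PDB.
```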

What I envision that kured could do is:

- When eviction of a pod is blocked by its PDB and the pod belongs to a Deployment with `.spec.replicas: 1`, fall back to a rollout restart of that Deployment instead of evicting the pod.
- Support an annotation that explicitly opts a workload in to the rollout-restart behavior.

The first bullet point would be a "heuristic based approach" that should solve a lot of people's issues without being too intrusive.

A rollout restart would restart all of the pods in the Deployment, so it's not a given that the user wants this when `replicas > 1`; for those cases the annotation from the second bullet point provides opt-in behavior. A sketch of the resulting drain flow follows below.
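To make it concrete, the per-node flow could look roughly like this (a sketch based on the article above, using kubectl commands in place of the client-go calls kured would actually make):

```sh
kubectl cordon <node>                    # kured already cordons before draining

# For each blocked single-replica Deployment with a pod on <node>:
kubectl rollout restart deployment/web
kubectl rollout status deployment/web --timeout=5m   # surge pod comes up on another node

# The old pod was removed by the rolling update (a delete, not an eviction),
# so the drain no longer has anything the PDB can block:
kubectl drain <node> --ignore-daemonsets
```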

ckotzbauer commented 1 month ago

Thanks for opening this issue. Yes, this is a widely discussed topic, and we have had several issues about it in kured in the past. That's an interesting approach which would solve several challenges related to this. However, I'm not sure it would fit all StatefulSet cases, depending on the rollout strategy defined.

applejag commented 1 month ago

Yeah, this would practically only help when you have single-replica Deployments.

Single-replica StatefulSets would not really benefit from this, and in that case a PDB makes even less sense, as you can't configure a StatefulSet to "surge" with a temporarily increased number of pods.

Maybe an edge case would be if you used the OpenKruise CloneSet with max surge, which is practically an extended Deployment that also lets you use things like per-pod PVCs. Which brings up a big downside of this approach: it's not very portable.

Should kured only have a special case for Deployment resources? Is there a way to design it so it stays agnostic? Should kured implement special cases for custom resources like the OpenKruise CloneSet I mentioned earlier?

In my tunnel-visioned opinion, I think only having a special case for Deployment is fine :) Partly because that's a solution that would help me. But as this seems to be a common need, maybe that's fine?

There are some Kubernetes proposals to improve this whole situation, such as this one: https://github.com/kubernetes/enhancements/pull/4213. But it may take a while before that proposal is accepted, then about a year before we start seeing it in stable releases, and then a bit longer still, because most Kubernetes providers intentionally stay a minor version or more behind the latest for stability reasons. On top of that, the proposal might also be rejected, and we would have to start all over again with a new proposal, which takes a lot of work.

Which weighs further in favor of my argument that a "hardcoded fix"/"special behavior for Deployments" might be sufficient for now. Wdyt?

Are you up for the idea? And if so, would you accept a PR on it?