kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Mark node to be decommissioned and act accordingly #3885

Closed pires closed 7 years ago

pires commented 9 years ago

I haven't found a way to pause or decommission a node so that all its containers are stopped and recreated elsewhere in the cluster.

This would be great for node upgrades (hardware, OS, etc.).

Obviously, the node would have to be blacklisted so that no new containers are scheduled to it.

/cc @jmreicha

davidopp commented 8 years ago

Your analogy is reasonable: scheduling node drains is similar to scheduling a workflow of run-to-completion Jobs. But I don't think the Job abstraction can be directly used for the former. More generally, I'm not sure how much sophistication for scheduling maintenance workflows we want to build into core Kubernetes, vs. suggesting that people build it on top. My initial thought is that we want to support simple server-side drains (something that marches through the cell at a specified rate and respects disruption budgets), but complicated maintenance workflow scheduling shouldn't be part of core Kubernetes.
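To make the "simple server-side drain" idea concrete, here is a minimal sketch of a planner that walks through nodes at a fixed rate while respecting per-application disruption budgets. It is a pure simulation, not Kubernetes code: the function name `plan_drain`, its parameters, and the assumption that every evicted pod is running elsewhere before the next round starts are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def plan_drain(pods_by_node, max_unavailable, nodes_per_round):
    """Group nodes into drain rounds (hypothetical helper, not a Kubernetes API).

    pods_by_node:    dict of node name -> list of app names, one entry per pod.
    max_unavailable: dict of app name -> max pods of that app down at once.
                     Apps missing from this dict are treated as unconstrained.
    nodes_per_round: upper bound on nodes drained per round (the "rate").

    Assumption: pods evicted in one round are healthy again before the next.
    """
    if nodes_per_round < 1:
        raise ValueError("nodes_per_round must be at least 1")

    pending = list(pods_by_node)
    rounds = []
    while pending:
        disrupted = Counter()  # pods taken down so far in this round, per app
        current = []
        for node in pending:
            if len(current) == nodes_per_round:
                break
            need = Counter(pods_by_node[node])
            fits = all(
                disrupted[app] + count <= max_unavailable.get(app, float("inf"))
                for app, count in need.items()
            )
            if fits:
                disrupted.update(need)
                current.append(node)
        if not current:
            # Some node alone exceeds a budget, so no progress is possible.
            raise RuntimeError(f"cannot drain without violating a budget: {pending}")
        rounds.append(current)
        pending = [node for node in pending if node not in current]
    return rounds


if __name__ == "__main__":
    cluster = {"node-a": ["web", "web"], "node-b": ["web"], "node-c": ["db"]}
    print(plan_drain(cluster, {"web": 2, "db": 1}, nodes_per_round=2))
```

The planner is greedy and skips a node that would break a budget in the current round, retrying it in a later one. A real implementation would have to observe actual pod readiness rather than assume it.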

0xmichalis commented 7 years ago

Is this covered by kubectl drain?

paralin commented 7 years ago

Where are we on Docker checkpoint and restore? It would be super ideal if we could combine the auto drain with a pod migration procedure.

davidopp commented 7 years ago

kubectl drain does a lot of this, but there are some ideas in this issue that were not really ever implemented and we might want to refer back to.
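For reference, a rough sketch of the manual maintenance sequence built on `kubectl drain`: mark the node unschedulable, evict its pods, and re-enable scheduling afterwards. The helper name `maintenance_commands` and the node name in the usage line are hypothetical. The sketch only builds the argument lists and does not run anything. The `--ignore-daemonsets` flag is included on the assumption that the node runs DaemonSet-managed pods, which `kubectl drain` otherwise refuses to proceed past.

```python
def maintenance_commands(node):
    """Return kubectl argument lists for taking a node out of service and back.

    Hypothetical helper: it only constructs the commands. Pass each list to
    subprocess.run(cmd, check=True) to execute them against a real cluster.
    """
    if not node:
        raise ValueError("node name must be non-empty")
    return [
        # Mark the node unschedulable so no new pods land on it.
        # (kubectl drain also cordons the node; this step makes it explicit.)
        ["kubectl", "cordon", node],
        # Evict the pods so their controllers recreate them elsewhere.
        ["kubectl", "drain", node, "--ignore-daemonsets"],
        # After the upgrade is done, allow scheduling on the node again.
        ["kubectl", "uncordon", node],
    ]


if __name__ == "__main__":
    for cmd in maintenance_commands("node-1"):
        print(" ".join(cmd))
```

Pods that are not managed by a controller are not recreated elsewhere after eviction, so this sequence only approximates the "stopped and recreated elsewhere" behavior requested in the original report.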

mml commented 7 years ago

I am closing this as fixed. We can easily search for and refer to this issue for ideas later.