keikoproj / lifecycle-manager

Graceful AWS scaling event on Kubernetes using lifecycle hooks
Apache License 2.0

DaemonSets not handled properly in clusters >=1.16 #59

Closed. JacobHenner closed this issue 4 years ago.

JacobHenner commented 4 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

A little bit of both.

What happened:

Upon instance termination, lifecycle-manager attempted to evict all pods on the node, including DaemonSet pods. The DaemonSet controller then rescheduled those pods onto the node, and after some period of time the node drain operation timed out, since the newly created DaemonSet pods never went away.

What you expected to happen:

Either the DaemonSet pods would be evicted and blocked from being rescheduled, or they'd simply be ignored.

How to reproduce it (as minimally and precisely as possible):

Run nodes in a cluster with DaemonSets, and trigger lifecycle-manager for the node.

Anything else we need to know?:

I'd be willing to attempt to submit a PR for this. I have two possible approaches in mind:

  1. Ignore DaemonSet pods - don't evict these pods if they exist, and proceed with the node termination if they're the only pods remaining.
  2. Add an additional taint to nodes that are being drained. The node.kubernetes.io/unschedulable taint is already added, but the DaemonSet controller adds a matching node.kubernetes.io/unschedulable:NoSchedule toleration to all DaemonSet pods, so that taint alone does not keep them off the node.

I would imagine #2 is preferred, since it'd actually stop the DaemonSet pods (rough sketch below).
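To illustrate approach 2 - the extra taint key below is hypothetical, just something that no default toleration matches:

# Toleration the DaemonSet controller adds to every DaemonSet pod, which is
# why node.kubernetes.io/unschedulable alone does not keep them off the node:
tolerations:
- key: node.kubernetes.io/unschedulable
  operator: Exists
  effect: NoSchedule

# Hypothetical extra taint lifecycle-manager could apply before draining:
taints:
- key: lifecycle-manager.keikoproj.io/draining
  value: "true"
  effect: NoExecute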

Environment:

EKS 1.16


eytan-avisror commented 4 years ago

Hey @JacobHenner

Thanks for filing this. When lifecycle-manager drains the node it simply runs a kubectl drain command: https://github.com/keikoproj/lifecycle-manager/blob/48c3bc778f5a04df01b792994ae1406b371e8e29/pkg/service/nodes.go#L53

Notice the --ignore-daemonsets=true flag; this means the drain command doesn't try to evict the DaemonSet pods.
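In other words, the fork-exec boils down to something roughly like the following (the exact flag set is in the nodes.go linked above):

kubectl drain <node-name> --ignore-daemonsets=true --force --delete-local-data --timeout=<drain-timeout>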

In addition, the drain command should already be adding the taint:

- effect: NoSchedule
  key: node.kubernetes.io/unschedulable

Can you provide the logs from the lifecycle-manager pod? Maybe we can get a better understanding of what is causing the behavior you're seeing.

JacobHenner commented 4 years ago

My apologies, I should have done some more research about how the drain was implemented (I assumed it was implemented in code rather than a call to kubectl).

Anyway, it seems like this issue is caused by the version of kubectl used. In the Dockerfile, kubectl 1.12 is specified. When kubectl 1.12 conducts a drain, it receives a 404 when it attempts to GET a DaemonSet associated with a pod. This is due to the removal of deprecated APIs in Kubernetes 1.16. Specifically:

DaemonSet in the extensions/v1beta1 and apps/v1beta2 API versions is no longer served.

Unfortunately, kubectl 1.12 attempts to GET using one of these removed API versions.
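Concretely, the difference looks like this (paths here are illustrative of the group/version change, not copied from the kubectl source):

# No longer served on Kubernetes 1.16, so kubectl 1.12 gets a 404 mid-drain:
GET /apis/extensions/v1beta1/namespaces/<namespace>/daemonsets/<name>

# Still served - the group/version a newer kubectl uses instead:
GET /apis/apps/v1/namespaces/<namespace>/daemonsets/<name>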

It looks like the solution is to upgrade the bundled kubectl. I substituted 1.12 with a newer version, and things seem to work as expected now.
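For anyone following along, the change amounts to bumping the version in the kubectl download line of the Dockerfile. The line below is only a sketch using the standard release URL, not a copy of the repo's actual Dockerfile:

# Hypothetical: pin kubectl to a 1.16.x patch release instead of 1.12
RUN curl -sLO https://storage.googleapis.com/kubernetes-release/release/v1.16.15/bin/linux/amd64/kubectl \
    && chmod +x kubectl && mv kubectl /usr/local/bin/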

eytan-avisror commented 4 years ago

Great catch @JacobHenner! Do you mind sending a PR to update it to a newer version?

https://github.com/keikoproj/lifecycle-manager/blob/98886453b19fb04837c6db5d4243179bacd2195d/Dockerfile#L10

Happy to do a hotfix release with this change.

P.S. The reason this uses the kubectl binary is that client-go does not expose a method for draining (we would have to re-implement it), and the existing logic is encapsulated in kubectl. Last I read there were plans to make the kubectl drain logic available as a library, but I'm not sure if that is available yet. I'd be happy to move away from forking this binary; it can become an issue if there are many concurrent terminations and we need to fork-exec dozens of kubectl processes to do the drains.
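If/when that library route is available (last I checked it lives under k8s.io/kubectl/pkg/drain), an in-process drain could look roughly like the minimal sketch below. This assumes a recent client-go and k8s.io/kubectl (field names have shifted a bit across versions) and is not lifecycle-manager's actual code:

package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/kubectl/pkg/drain"
)

// drainNode cordons and drains a node in-process instead of fork-execing kubectl.
func drainNode(client kubernetes.Interface, nodeName string) error {
	helper := &drain.Helper{
		Ctx:                 context.TODO(),
		Client:              client,
		Force:               true,             // also evict pods not managed by a controller
		IgnoreAllDaemonSets: true,             // same behavior as --ignore-daemonsets=true
		DeleteEmptyDirData:  true,             // same behavior as --delete-local-data
		GracePeriodSeconds:  -1,               // use each pod's own grace period
		Timeout:             10 * time.Minute, // overall drain timeout
		Out:                 os.Stdout,
		ErrOut:              os.Stderr,
	}

	node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Mark the node unschedulable, then evict/delete the remaining pods.
	if err := drain.RunCordonOrUncordon(helper, node, true); err != nil {
		return err
	}
	return drain.RunNodeDrain(helper, nodeName)
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	if err := drainNode(client, os.Getenv("NODE_NAME")); err != nil {
		panic(err)
	}
}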

JacobHenner commented 4 years ago

Sure, I can submit a PR. Do you have a version preference? The official kubectl/cluster version skew policy is ±1 minor version, but I imagine users of lifecycle-manager are on a range of cluster versions. The latest release is 1.19.0 - I could submit that, or 1.16.* to fix this specific issue.

And indeed, I re-implemented a subset of drain in Python for this reason too. It'd be great if it were part of the official client libraries, or perhaps even the control plane API itself.

eytan-avisror commented 4 years ago

I think 1.16.x latest should be good for now, thanks! 👍