planetlabs / draino

Automatically cordon and drain Kubernetes nodes based on node conditions
Apache License 2.0

Question about "timed out waiting for evictions to complete: timed out" #57

Open vukor opened 4 years ago

vukor commented 4 years ago

I periodically see this in the logs:

INFO kubernetes/eventhandler.go:155 Failed to drain {"node": "ip-x-x-x-x.ec2.internal", "error": "timed out waiting for evictions to complete: timed out", "errorVerbose": "timed out\ntimed out waiting for evictions to complete\ngithub.com/planetlabs/draino/internal/kubernetes.(*APICordonDrainer).Drain\n\t/go/src/github.com/planetlabs/draino/internal/kubernetes/drainer.go:189\ngithub.com/planetlabs/draino/internal/kubernetes.(*DrainingResourceEventHandler).cordonAndDrain.func1\n\t/go/src/github.com/planetlabs/draino/internal/kubernetes/eventhandler.go:154\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"}

It looks like draino ignores the node after the drain times out, but I'm not sure. As I understand it, draino is a stateless app, so will it try to drain this node again?

ch-andremercer commented 4 years ago

^ +1

tarunptala commented 3 years ago

Did you find a solution, maybe on the configuration side? @vukor

yogeek commented 3 years ago

Same error here. Did you find a solution, @vukor?

vukor commented 3 years ago

> Same error here...did you get any solution @vukor please ?

Nope, I still periodically see these error messages with planetlabs/draino:b788331.

willshu commented 3 years ago

Adding these extraArgs seems to have helped:

extraArgs:
  - evict-daemonset-pods
  - evict-emptydir-pods
  - evict-unreplicated-pods

I had to adjust the deployment template so that Helm parses these correctly:

          {{- range $key, $value := .Values.extraArgs }}
            - {{ if $value }}--{{ $value }}{{ end }}
          {{- end }}