Failed schedule deletion seen in draino logs

tarunptala commented 3 years ago

Info

Kops Cluster with version 1.16+ running on AWS.
planetlabs/draino:e0d5277 image is being used.

We constantly see the following messages in the draino logs:

2021-02-19T21:34:50.669Z    ERROR   kubernetes/drainSchedule.go:68  Failed schedule deletion    {"key": "ip-10-53-32-128.us-west-2.compute.internal"}
github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).DeleteSchedule
    /go/src/github.com/planetlabs/draino/internal/kubernetes/drainSchedule.go:68
github.com/planetlabs/draino/internal/kubernetes.(*DrainingResourceEventHandler).OnDelete
    /go/src/github.com/planetlabs/draino/internal/kubernetes/eventhandler.go:152
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnDelete
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/controller.go:251
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/shared_informer.go:609
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:284
k8s.io/client-go/tools/cache.(*processorListener).run.func1
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/shared_informer.go:601
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:88
k8s.io/client-go/tools/cache.(*processorListener).run
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/shared_informer.go:599
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:71
2021-02-23T11:14:33.146Z    ERROR   kubernetes/drainSchedule.go:68  Failed schedule deletion    {"key": "ip-10-53-31-9.us-west-2.compute.internal"}
github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).DeleteSchedule
    /go/src/github.com/planetlabs/draino/internal/kubernetes/drainSchedule.go:68
github.com/planetlabs/draino/internal/kubernetes.(*DrainingResourceEventHandler).OnDelete
    /go/src/github.com/planetlabs/draino/internal/kubernetes/eventhandler.go:152
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnDelete
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/controller.go:251
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/shared_informer.go:609
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:284
k8s.io/client-go/tools/cache.(*processorListener).run.func1
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/shared_informer.go:601
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:88
k8s.io/client-go/tools/cache.(*processorListener).run
    /go/pkg/mod/k8s.io/client-go@v0.0.0-20190819141724-e14f31a72a77/tools/cache/shared_informer.go:599
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1
    /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190817020851-f2f3a405f61d/pkg/util/wait/wait.go:71
2021-02-23T11:15:47.174Z    ERROR   kubernetes/drainSchedule.go:68  Failed schedule deletion    {"key": "ip-10-53-32-225.us-west-2.compute.internal"}

I'm not sure whether I am using the correct draino image.

Note: I have already deployed node-problem-detector and cluster-autoscaler alongside it.
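To see which nodes are affected and how often, the error entries above can be summarised with a short script. This is only a sketch: it assumes the header-line layout shown in the log excerpt (timestamp, level, source location, message, JSON payload), and the function name `failed_deletions` and the `sample` list are made up for illustration, not part of draino.

```python
import json
import re
from collections import Counter

# Header line of each error entry, as it appears in the log excerpt:
# <timestamp> ERROR <file:line> Failed schedule deletion {"key": "<node>"}
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+ERROR\s+(?P<src>\S+)\s+"
    r"Failed schedule deletion\s+(?P<payload>\{.*\})\s*$"
)


def failed_deletions(lines):
    """Yield (timestamp, node_key) for each 'Failed schedule deletion' entry."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # stack-trace frames and unrelated lines are skipped
        payload = json.loads(match.group("payload"))
        yield match.group("ts"), payload["key"]


# Hypothetical input: two header lines and one stack frame copied from the excerpt.
sample = [
    '2021-02-19T21:34:50.669Z    ERROR   kubernetes/drainSchedule.go:68  '
    'Failed schedule deletion    {"key": "ip-10-53-32-128.us-west-2.compute.internal"}',
    'github.com/planetlabs/draino/internal/kubernetes.(*DrainSchedules).DeleteSchedule',
    '2021-02-23T11:14:33.146Z    ERROR   kubernetes/drainSchedule.go:68  '
    'Failed schedule deletion    {"key": "ip-10-53-31-9.us-west-2.compute.internal"}',
]

# Count failures per node name.
counts = Counter(key for _, key in failed_deletions(sample))
for node, n in counts.items():
    print(f"{node}: {n}")
```

Comparing the resulting node names and timestamps against instance terminations would show whether the errors line up with nodes being removed from the cluster.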

jrivers96 commented 3 years ago

I just had some luck by changing the RBAC, adding this rule:

- apiGroups: ['']
  resources: [nodes/status]
  verbs: [update, patch]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels: {component: draino}
  name: draino
rules:
- apiGroups: [apps]
  resources: [statefulsets]
  verbs: [create, update, get, watch, list]
- apiGroups: ['']
  resources: [endpoints]
  verbs: [create, update, get, watch, list]
- apiGroups: ['']
  resources: [events]
  verbs: [create, patch, update]
- apiGroups: ['']
  resources: [nodes]
  verbs: [get, watch, list, update]
- apiGroups: ['']
  resources: [nodes/status]
  verbs: [update, patch]
- apiGroups: ['']
  resources: [pods]
  verbs: [get, watch, list]
- apiGroups: ['']
  resources: [pods/eviction]
  verbs: [create]
- apiGroups: [extensions]
  resources: [daemonsets]
  verbs: [get, watch, list]
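As a rough illustration of why the extra rule matters: in Kubernetes RBAC a subresource such as nodes/status has to be listed explicitly, so a grant on nodes does not cover it. The sketch below mirrors part of the ClusterRole above as plain Python data and checks a request against it. The helper names `allows` and `_matches` are hypothetical, and the matching is deliberately simplified (no resourceNames, nonResourceURLs, or `*/subresource` patterns), so it is not how the API server evaluates rules in full.

```python
# Node-related rules copied from the ClusterRole above, as plain data.
RULES = [
    {"apiGroups": [""], "resources": ["nodes"],
     "verbs": ["get", "watch", "list", "update"]},
    {"apiGroups": [""], "resources": ["nodes/status"],
     "verbs": ["update", "patch"]},
    {"apiGroups": [""], "resources": ["pods/eviction"], "verbs": ["create"]},
]


def _matches(allowed, value):
    """True if the value is listed explicitly or covered by the '*' wildcard."""
    return "*" in allowed or value in allowed


def allows(rules, api_group, resource, verb):
    """Simplified check: does any single rule grant verb on api_group/resource?"""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


# With the nodes/status rule present, patching the status subresource is granted.
print(allows(RULES, "", "nodes/status", "patch"))

# Without it, the grant on "nodes" alone does not extend to "nodes/status".
without_status = [r for r in RULES if r["resources"] != ["nodes/status"]]
print(allows(without_status, "", "nodes/status", "patch"))
```

On a live cluster, `kubectl auth can-i` with `--as` set to draino's service account gives the authoritative answer; the service account name and namespace depend on how draino was deployed.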

pschulten commented 3 years ago

I have the same errors; granting update on nodes/status did not help. For me, this happens when the cluster-autoscaler terminates an instance.

kubernetes: v1.20.5
draino: planetlabs/draino:e0d5277
cluster-autoscaler: v1.20.0

Daniel-Ebert commented 3 years ago

Same error here, will try the suggestion with the RBAC permission!

planetlabs / draino

Failed schedule deletion seen in draino logs #114