kontena / pharos-host-upgrades

Kube DaemonSet for host OS upgrades

Drain fails for pods with local storage #26

Closed · SpComb closed this 6 years ago

SpComb commented 6 years ago

Missing kubectl drain --delete-local-data, assuming that's what we want?

```
2018/06/18 12:33:05 Reboot required, draining kube node...
2018/06/18 12:33:05 Draining kube node terom-pharos-worker1 (with annotation pharos-host-upgrades.kontena.io/drain)...
2018/06/18 12:33:05 kubectl drain --ignore-daemonsets --force terom-pharos-worker1...
2018/06/18 12:33:05 Upgrade failed, releasing kube lock... (Failed to drain kube node for host reboot: Failed to drain node terom-pharos-worker1: kubectl drain --ignore-daemonsets --force terom-pharos-worker1: exit status 1: error: unable to drain node "terom-pharos-worker1", aborting command...

There are pending nodes to be drained:
 terom-pharos-worker1
error: pods with local storage (use --delete-local-data to override): kubernetes-dashboard-598d75cb96-2nnql
)
2018/06/18 12:33:05 Releasing kube lock...
2018/06/18 12:33:05 kube/lock kube-system/daemonsets/host-upgrades: get
2018/06/18 12:33:05 kube/lock kube-system/daemonsets/host-upgrades: release
2018/06/18 12:33:05 kube/lock kube-system/daemonsets/host-upgrades: clear pharos-host-upgrades.kontena.io/lock=terom-pharos-worker1
2018/06/18 12:33:05 kube/lock kube-system/daemonsets/host-upgrades: update
2018/06/18 12:33:05 Failed to drain kube node for host reboot: Failed to drain node terom-pharos-worker1: kubectl drain --ignore-daemonsets --force terom-pharos-worker1: exit status 1: error: unable to drain node "terom-pharos-worker1", aborting command...

There are pending nodes to be drained:
 terom-pharos-worker1
error: pods with local storage (use --delete-local-data to override): kubernetes-dashboard-598d75cb96-2nnql
```

This releases the kube lock and crashes the pod without rebooting the host.
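A minimal sketch of the change being suggested, assuming the drain is done by shelling out to kubectl as the log above implies (the `drainNode` helper and its wiring are hypothetical, not the project's actual code):

```go
package main

import (
	"fmt"
	"os/exec"
)

// drainNode runs kubectl drain with --delete-local-data so that pods with
// emptyDir volumes (like kubernetes-dashboard above) are evicted instead of
// aborting the drain. Hypothetical helper for illustration only.
func drainNode(node string) error {
	args := []string{
		"drain",
		"--ignore-daemonsets",
		"--force",
		"--delete-local-data", // evict pods with emptyDir volumes
		node,
	}
	if out, err := exec.Command("kubectl", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("kubectl drain %v: %v: %s", node, err, out)
	}
	return nil
}

func main() {
	if err := drainNode("terom-pharos-worker1"); err != nil {
		fmt.Println(err)
	}
}
```

With `--delete-local-data`, kubectl evicts pods that use emptyDir volumes (deleting that data) instead of aborting, so the dashboard pod above would no longer block the reboot.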

SpComb commented 6 years ago

The only thing that --delete-local-data affects is emptyDir volumes: https://github.com/kubernetes/kubernetes/blob/v1.10.2/pkg/kubectl/cmd/drain.go#L413
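For reference, that check boils down to scanning pod.Spec.Volumes for emptyDir entries; a minimal sketch of the same logic using the k8s.io/api/core/v1 types (an illustration, not the kubectl source, and the pod spec here is only an approximation of the dashboard's):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// hasLocalStorage reports whether a pod uses any emptyDir volume, which is
// the only thing kubectl drain treats as "local storage".
func hasLocalStorage(pod *corev1.Pod) bool {
	for _, vol := range pod.Spec.Volumes {
		if vol.EmptyDir != nil {
			return true
		}
	}
	return false
}

func main() {
	// A pod with a single emptyDir volume is refused by drain unless
	// --delete-local-data is given.
	pod := &corev1.Pod{
		Spec: corev1.PodSpec{
			Volumes: []corev1.Volume{{
				Name:         "tmp-volume",
				VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
			}},
		},
	}
	fmt.Println(hasLocalStorage(pod)) // true
}
```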

The flag doesn't seem very useful if a single pod with an emptyDir volume at /tmp aborts the entire node drain... I'd imagine it might make sense to leave pods with emptyDir volumes terminated but still scheduled on the same node for the duration of the reboot, but that isn't an option either.