Open morganwalker opened 5 years ago
Any news on this front?
If you guys have a strategy, I'm more than willing to help with the implementation of this feature, as it seems important for us.
A taint can be added to the on-demand instance group rather than the spot-instance IG, like below: `labels = "kubernetes.io/role=common,lifecycle=OnDemand"` and `taints = "lifecycle=OnDemand:PreferNoSchedule"`. This works for me.
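For reference, in a kops InstanceGroup manifest those labels and taints would look roughly like this (machine type, sizes, and the IG name are placeholders, not from the thread):

```yaml
# Sketch of an on-demand kops InstanceGroup with the labels/taints above.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  machineType: m5.large   # placeholder
  minSize: 1
  maxSize: 3
  nodeLabels:
    kubernetes.io/role: common
    lifecycle: OnDemand
  taints:
  - lifecycle=OnDemand:PreferNoSchedule
```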
In my experience the taint just tells the K8s scheduler to try scheduling any unscheduled pods onto an existing spot-instance node; it doesn't tell the cluster autoscaler to scale up spot instances to make room if none are available.
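For what it's worth, the same soft preference can be expressed from the pod side with a preferred node affinity. This is only a sketch; the `lifecycle=Ec2Spot` label value is an assumption about how the spot IG is labeled, not something stated in this thread:

```yaml
# Pod spec fragment: soft preference for spot nodes.
# Label key/value are assumptions; use whatever your spot IG actually sets.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: lifecycle
          operator: In
          values:
          - Ec2Spot
```

Like `PreferNoSchedule`, this only biases the scheduler toward existing spot nodes; it does not by itself make the cluster autoscaler scale the spot group.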
Hi,
I'm having the same issue, so I was thinking of creating automation that checks whether an on-demand node is up in the environment; if so, it would add a few spot nodes so k8s-spot-rescheduler can move the pods to spot and we can get rid of the on-demand node.
We could implement something similar in k8s-spot-rescheduler. I was thinking we could have a parameter that takes the name of the spot IG or ASG, and if we don't have spot capacity we scale that IG or ASG (we can reuse CA's code for scaling).
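As a rough sketch of what such a parameter could drive (function names are hypothetical; the boto3 call is only indicative of how the ASG bump might be done, it is not CA's actual code path):

```python
def next_desired_capacity(current: int, max_size: int, step: int = 1) -> int:
    """Scale up by `step` nodes, never exceeding the ASG's maxSize."""
    return min(current + step, max_size)


def scale_spot_asg(asg_name: str, current: int, max_size: int) -> None:
    """Hypothetical helper: bump the spot ASG's desired capacity by `step`."""
    import boto3  # deferred import: only needed when actually calling AWS

    desired = next_desired_capacity(current, max_size)
    if desired != current:
        boto3.client("autoscaling").set_desired_capacity(
            AutoScalingGroupName=asg_name,
            DesiredCapacity=desired,
            HonorCooldown=True,
        )
```

The clamp against `maxSize` matters so the helper never requests capacity the ASG would reject.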
We're using kops `1.10.0` and k8s `1.10.11`. We're using two separate instance groups (IG), `nodes` (on-demand) and `spots` (spot), both spread across 3 availability zones. I've applied the appropriate nodeLabels and have defined the following in my k8s-spot-rescheduler deployment manifest:

The `nodes` IG has the `spot=false:PreferNoSchedule` taint so the `spots` IG is preferred. I'm using the cluster autoscaler to autodiscover both IGs via `--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/kubernetes.metis.wtf`, and these tags exist on both IGs. I've confirmed that pods on most `nodes` nodes are able to be drained and moved to `spots` nodes. With an exception:

- The `spots` IG was set to `minSize: 1` and `maxSize: 3`, and we had one `spots` node up and running in us-east-1c
- k8s-spot-rescheduler attempted to drain the pods on a `nodes` node but failed with:
- `metis-internal/rabbitmq-0` is a statefulSet with a PVC
- the PVC resides in us-east-1a, so it makes sense why it couldn't be scheduled on the `spots` node

Why didn't the failure to schedule `metis-internal/rabbitmq-0` trigger the cluster autoscaler to try provisioning a new `spots` node until it created one in the same availability zone? I'm wondering whether, if k8s-spot-rescheduler had actually evicted the pod, the cluster autoscaler would have noticed that a pod needed to be scheduled and would have spun up a new node in the `spots` IG.
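On the AZ question: the cluster-autoscaler documentation recommends single-AZ node groups when workloads use zone-bound EBS volumes, because CA cannot control which zone a multi-AZ ASG launches a new instance into. A sketch of splitting the `spots` IG per zone (names, machine type, price, and labels are placeholders, not taken from this cluster):

```yaml
# One single-zone spot IG per AZ (sketch; repeat for us-east-1b and us-east-1c).
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: spots-us-east-1a
spec:
  machineType: m5.large   # placeholder
  maxPrice: "0.10"        # placeholder spot bid
  minSize: 0
  maxSize: 3
  subnets:
  - us-east-1a
  nodeLabels:
    lifecycle: Ec2Spot    # assumed label, adjust to your setup
```

With one group per zone, a pending pod pinned to us-east-1a by its PVC can only trigger a scale-up in the us-east-1a group, which is the behavior the question above is looking for.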