kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Hetzner server type availability check doesn't work #7088

Closed karsten42 closed 1 month ago

karsten42 commented 1 month ago

Which component are you using?: cluster-autoscaler

What version of the component are you using?:

Component version: v1.30.1

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6+k3s2

What environment is this in?: Hetzner

What did you expect to happen?: That the autoscaler selects a node group for which the set instance type is available.

What happened instead?: A node group is selected whose instance type isn't available in the specified region. Creating a new node then returns an error, and the autoscaler is stuck in an error loop until the instance type becomes available again.

Anything else we need to know?: The code that checks if an instance type is available doesn't actually check for availability. It seems that prices for instance types are always available even if the instances themselves aren't.
Maybe this was different in the past? I guess the datacenter resource should be used to determine type availability in a specific region.
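
A minimal sketch of what such a datacenter-based check might look like, assuming the datacenter resource exposes lists of supported and currently available server type IDs. The `datacenter` type, the `availableIn` function, and the numeric IDs are hypothetical and for illustration only; this is not the provider's actual code.

```go
package main

import "fmt"

// datacenter is a hypothetical, simplified view of a datacenter resource.
// The field names are assumptions made for this sketch.
type datacenter struct {
	Name      string
	Location  string
	Supported []int64 // server type IDs that can exist in this datacenter
	Available []int64 // server type IDs that can currently be ordered
}

// availableIn reports whether serverTypeID is listed as available in at
// least one datacenter of the given location.
func availableIn(dcs []datacenter, location string, serverTypeID int64) bool {
	for _, dc := range dcs {
		if dc.Location != location {
			continue
		}
		for _, id := range dc.Available {
			if id == serverTypeID {
				return true
			}
		}
	}
	return false
}

func main() {
	// Illustrative data only; the IDs are made up.
	dcs := []datacenter{
		{Name: "fsn1-dc14", Location: "fsn1", Supported: []int64{1, 2, 3}, Available: []int64{1, 3}},
		{Name: "nbg1-dc3", Location: "nbg1", Supported: []int64{1, 2}, Available: []int64{1, 2}},
	}
	fmt.Println(availableIn(dcs, "fsn1", 2)) // supported but not available
	fmt.Println(availableIn(dcs, "nbg1", 2))
}
```

Even with a check like this, the result is only a snapshot: the type could become unavailable between the check and the create call.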

Shubham82 commented 1 month ago

/area cluster-autoscaler /area provider/hetzner

Shubham82 commented 1 month ago

cc @apricote PTAL!

apricote commented 1 month ago

The current check is the most reliable one we have. It checks whether the server type is offered at all in the selected location.

Actual availability can change at any moment, even between checking the availability and then creating the server.

If I understood cluster-autoscaler correctly, it should gracefully handle an error from scaling up a node pool and try other available pools. There was a bug in the Hetzner provider that caused issues with this behavior, which was reported in #6240 and fixed in last week's patch releases.

Could you try updating to cluster-autoscaler v1.30.2 to see if the issue was actually fixed for you?

(Thanks for pinging me @Shubham82!)

karsten42 commented 1 month ago

Thanks for the detailed explanation! I will upgrade the cluster-autoscaler and try it out.

Shubham82 commented 1 month ago

Hi @karsten42, if your concern is resolved, can we close this issue?

karsten42 commented 1 month ago

Yes, thanks.