vitobotta / hetzner-k3s

The easiest and fastest way to create and manage Kubernetes clusters in Hetzner Cloud using the lightweight distribution k3s by Rancher.
MIT License

Warning from Hetzner about cluster-autoscaler #429

Closed emrys90 closed 2 months ago

emrys90 commented 2 months ago

I received an email warning from Hetzner with the following message:

Subject: Important client information: Upcoming changes to cluster-autoscaler Hetzner provider and CX11 server type removal

Body:

Based on our API monitoring, you are using Kubernetes cluster-autoscaler in your Hetzner Cloud projects. The Hetzner provider in current versions of cluster-autoscaler has a bug and relies on the CX11 server type, which we will remove from our ordering options on 6 September 2024. You can learn more about the removal in the Cloud Changelog: https://docs.hetzner.cloud/changelog#2024-06-06-old-server-types-with-shared-intel-vcpus-are-deprecated

To prevent any disruptions for you, we will keep the CX11 server type available for your account. We will remove your access to the CX11 server type two weeks after the Kubernetes community releases new versions of cluster-autoscaler. We will announce the exact date in a follow up notification once the new versions are available.

The following versions of cluster-autoscaler are affected:

- ≤1.28.6 (including 1.27 and older)
- ≤1.29.4
- ≤1.30.2
- ≤1.31.0

To bridge the gap until the Kubernetes community releases the new versions, we published alternative container images of cluster-autoscaler that include a patch for the bug. You can use these in your deployment, but we will remove them one month after new cluster-autoscaler versions become available. We will not provide any other patch releases on this container image repository. Please switch back to the official images as soon as possible.

- `docker.io/hetznercloud/cluster-autoscaler:v1.28.6-hcloud1`
- `docker.io/hetznercloud/cluster-autoscaler:v1.29.4-hcloud1`
- `docker.io/hetznercloud/cluster-autoscaler:v1.30.2-hcloud1`
- `docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1`

We will send you another notification once the new versions become available. You can find more information at the following links:

- https://docs.hetzner.cloud/changelog#2024-08-30-bug-cx11-removal-will-break-certain-versions-of-cluster-autoscaler
- https://github.com/kubernetes/autoscaler/issues/7210
- https://docs.hetzner.cloud/changelog#2024-06-06-old-server-types-with-shared-intel-vcpus-are-deprecated

We will be happy to help you with any questions. Please write us a support request by logging onto your account on https://console.hetzner.cloud/support

Thank you for your understanding.

I am not using the CX11 server type for my clusters. Is there anything I need to do about this? Am I at risk of having my production servers shut down?
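To make the version ranges in the email concrete, here is a minimal Python sketch that checks a cluster-autoscaler tag against them. It is illustrative only: `is_affected`, `parse_version` and `AFFECTED_MAX_PATCH` are hypothetical names (not part of hetzner-k3s or cluster-autoscaler). It assumes tags look like `v1.30.2`, treats Hetzner's `-hcloudN` builds as patched (the email says they include the fix), and assumes minors newer than 1.31, which the email does not list, are unaffected.

```python
# Hypothetical helper: checks a cluster-autoscaler tag against the ranges in Hetzner's email.

# Highest affected patch release for each listed minor version.
AFFECTED_MAX_PATCH = {28: 6, 29: 4, 30: 2, 31: 0}


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse 'v1.30.2' (or '1.30.2') into (major, minor, patch), ignoring any '-suffix'."""
    core = tag.lstrip("v").split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return major, minor, patch


def is_affected(tag: str) -> bool:
    """Return True if the tag falls in one of the affected ranges."""
    if "-hcloud" in tag:
        return False  # Hetzner's temporary images include the patch
    major, minor, patch = parse_version(tag)
    if major != 1:
        return False  # assumption: only 1.x releases are in scope
    if minor < 28:
        return True  # "including 1.27 and older"
    if minor in AFFECTED_MAX_PATCH:
        return patch <= AFFECTED_MAX_PATCH[minor]
    return False  # assumption: minors newer than 1.31 are not affected


if __name__ == "__main__":
    for tag in ["v1.27.3", "v1.29.4", "v1.29.5", "v1.31.0", "v1.31.0-hcloud1"]:
        print(tag, is_affected(tag))
```

As the thread goes on to point out, the version is what matters here, not which server types your own node pools use.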

vitobotta commented 2 months ago

I also received it today. I'm not sure it's worth making a release to use the temporary image, since the problem only affects people who might try to use old instances that have been deprecated for a while already.

If you are not using CX11, you don't need to worry about this.

emrys90 commented 2 months ago

Okay thanks!

rksm commented 2 months ago

Sorry to jump in here. I have very limited knowledge of hetzner-k3s, but it seems to me that the CX11 instance is hard-coded to serve as the "draining-node-pool" (as mentioned in the linked GitHub issue). That means it will affect users regardless of whether they use that node type or not.

vitobotta commented 2 months ago

Uhm, it seems I read it too quickly. I will make a release with their temporary image for now, then.

vitobotta commented 2 months ago

Can you guys please do a quick test of the autoscaler with 2.0.7?

apricote commented 2 months ago

Hello 👋

If your account is currently using cluster-autoscaler, we will set a flag on your account so CX11 stays available until proper releases of cluster-autoscaler are out. You will then get another notification with the timeline for removing CX11 from your account and for removing the temporary images.

> Am I at risk of having my production servers shut down?

This is only about the autoscaler doing active work. We will not shut down any servers. Only cluster-autoscaler will throw an error any time it tries to scale up your cluster.

vitobotta commented 2 months ago

Thanks @apricote for the clarification! I made a release with your Docker image anyway for now, just in case someone doesn't upgrade to the new, fixed version once it's out.

vitobotta commented 2 months ago

Closing since this has been addressed.

emrys90 commented 2 months ago

Is it possible to update the autoscaler without updating k3s? I'd rather not risk my production system updating to something that may introduce issues, especially with how much changed in the 2.0 update.

vitobotta commented 2 months ago

It's on my list to make the Docker image configurable; right now the manifest URLs can be configured, but not the image. I will do it when I have a bit more time, but the Hetzner image seems to work perfectly for me. I tested it a lot between yesterday and today and haven't seen any issues. If you upgrade, just make sure you use the very latest version of hetzner-k3s, since I fixed an issue with detection of the private network interface on autoscaled nodes.

emrys90 commented 2 months ago

I meant that my concern is about updating hetzner-k3s. I'm on version 1.1.5, and updating to 2.0 involves several non-trivial steps. I am worried about introducing issues into my production system.

vitobotta commented 2 months ago

If you follow the instructions correctly, you should be fine. You could also replicate your current cluster as a test cluster and upgrade that one first.

emrys90 commented 2 months ago

There are a lot of steps that I don't fully understand, and it would be easy to screw something up. Even if I manage to do it right on the test cluster, a typo or something could mess up the production cluster when I do that next.

I would rather avoid that risk of messing up my production system...

vitobotta commented 2 months ago

Which steps do you find difficult? Most of it is about adapting the config file. There isn't much to it, really.

jampy commented 4 weeks ago

Is hetzner-k3s v2.0.8 using the recently released official patched autoscaler or the temporarily patched version from Hetzner, which they will remove in ~2 weeks?

vitobotta commented 4 weeks ago

At the moment it uses the patched version from Hetzner, `docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1`. With the next release I will try to switch to the latest version and also make the image configurable, so that if something similar happens in the future you can just customize it in the config file.
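Until the image is configurable, the switch can be pictured as a text substitution over the autoscaler manifest. This is a hypothetical Python sketch, not how hetzner-k3s actually renders or applies the manifest: it assumes the manifest is plain YAML text with one `image:` line per container, and the target tag `v1.31.1` under `registry.k8s.io/autoscaling/cluster-autoscaler` is an assumption to check against the upstream cluster-autoscaler releases for your Kubernetes version.

```python
import re

# Matches an `image:` line (optionally a list item) whose value is any cluster-autoscaler image.
AUTOSCALER_IMAGE_RE = re.compile(
    r"^(?P<prefix>[ \t]*(?:-[ \t]*)?image:[ \t]*)\S*cluster-autoscaler:\S+[ \t]*$",
    re.MULTILINE,
)


def replace_autoscaler_image(manifest: str, new_image: str) -> str:
    """Return the manifest text with each cluster-autoscaler image replaced by new_image."""
    # A function replacement keeps new_image literal (no backslash or group expansion).
    return AUTOSCALER_IMAGE_RE.sub(lambda m: m.group("prefix") + new_image, manifest)


if __name__ == "__main__":
    manifest = (
        "      containers:\n"
        "        - name: cluster-autoscaler\n"
        "          image: docker.io/hetznercloud/cluster-autoscaler:v1.31.0-hcloud1\n"
    )
    # Hypothetical target tag; pick the upstream release that matches your cluster.
    print(replace_autoscaler_image(
        manifest, "registry.k8s.io/autoscaling/cluster-autoscaler:v1.31.1"
    ))
```

Matching on the `image:` key rather than on the container name leaves the `- name: cluster-autoscaler` line and any unrelated images untouched.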

t33muki commented 1 week ago

Got a message from Hetzner today stating that we have less than two weeks left before the patched images are gone.

"Please switch back to the official image repositories. We will remove the alternative images on 19 November 2024. You will be unable to pull the images after that date."

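To see whether a cluster still runs any of the temporary images before they disappear, one option is to list every container image reference and filter for Hetzner's repository. The Python sketch below is illustrative; `find_temporary_images` is a hypothetical name, and it assumes the input is a whitespace-separated list of image references, which is what `kubectl get pods -A -o jsonpath='{.items[*].spec.containers[*].image}'` prints.

```python
# Hetzner's temporary repository, as named in the email and in this thread.
TEMP_REPO = "docker.io/hetznercloud/cluster-autoscaler"


def find_temporary_images(image_list: str) -> list[str]:
    """Return distinct image refs (in first-seen order) that come from TEMP_REPO."""
    found: list[str] = []
    for ref in image_list.split():
        # Docker Hub refs may omit the "docker.io/" registry prefix.
        full = ref if ref.startswith("docker.io/") else "docker.io/" + ref
        if full.startswith(TEMP_REPO + ":") and ref not in found:
            found.append(ref)
    return found
```

An empty result means no pod spec still references the temporary repository; any entry it returns is an image that will fail to pull once the repository is removed.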
vitobotta commented 1 week ago

I have released 2.0.9, which includes a PR from a contributor with the fix.