Closed · lloesche closed this 2 weeks ago
You are not supposed to change the autoscaling settings of an existing node pool as this can only lead to issues IMO. Labels and taints are also not supported for autoscaled nodes; it's a limitation in the cluster autoscaler for Hetzner unfortunately and there isn't much I can do about it.
If autoscaling is enabled correctly for a node pool (and you haven't changed those settings afterwards), the autoscaler should create nodes as soon as pods have been pending for a certain amount of time due to lack of resources. If this doesn't happen, please share more details.
Also, if the node count in your account reaches the limit Hetzner imposes on your account (which is low for new accounts), new nodes won't be created and you need to request a limit increase.
> You are not supposed to change the autoscaling settings of an existing node pool as this can only lead to issues IMO
I'm not changing existing settings. I'm bootstrapping a completely new environment inside a fresh Hetzner Cloud project. My account has a couple thousand cores and servers of unused quota. I'm starting `hetzner-k3s` with the above settings (plus token of course) in an empty project.
I thought you had added/removed the lines in the config file and re-run the create command. When you expect new nodes to be created by the autoscaler, do you see pods in a pending state? Also, what do you see in the autoscaler's logs?
Yup, when I deploy a hello-world test app with two replicas that targets the `jobs` node pool, I see both of them pending:
```
minimi:conf lukas$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
hello-world-5bcdcc95ff-gdxtb   0/1     Pending   0          3m34s
hello-world-5bcdcc95ff-m9jst   0/1     Pending   0          3m34s
```
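For context, the test app is roughly this kind of deployment (a sketch; the image is an assumption, and the node selector label is the one mentioned later in the thread):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      # targets nodes of the jobs pool via a node label
      nodeSelector:
        node-role.fixcloud.io: jobs
      containers:
        - name: hello-world
          image: nginxdemos/hello   # assumed image
```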
The log shows the `jobs` pool as configured, but no nodes are ever created:
```
I0907 13:30:25.719477       1 node_instances_cache.go:156] Start refreshing cloud provider node instances cache
I0907 13:30:25.719558       1 node_instances_cache.go:168] Refresh cloud provider node instances cache finished, refresh took 33.287µs
I0907 13:30:26.215221       1 hetzner_node_group.go:438] Set node group draining-node-pool size from 0 to 0, expected delta 0
I0907 13:30:26.215273       1 hetzner_node_group.go:438] Set node group jobs size from 0 to 0, expected delta 0
W0907 13:30:36.231085       1 hetzner_servers_cache.go:94] Fetching servers from Hetzner API
I0907 13:30:36.660983       1 hetzner_node_group.go:438] Set node group draining-node-pool size from 0 to 0, expected delta 0
I0907 13:30:36.661016       1 hetzner_node_group.go:438] Set node group jobs size from 0 to 0, expected delta 0
I0907 13:30:46.708906       1 hetzner_node_group.go:438] Set node group draining-node-pool size from 0 to 0, expected delta 0
I0907 13:30:46.708939       1 hetzner_node_group.go:438] Set node group jobs size from 0 to 0, expected delta 0
```
Also, I specified the minimum number of nodes as 1, so I would expect the autoscaler to always keep at least one node available. Shouldn't it?
I was also wondering: there's `instance_count` as well as `autoscaling.min_instances`. How do the two interact with each other when autoscaling is enabled? I just set both to the same value (1).
By the way, I have another hetzner-k3s cluster where autoscaling works flawlessly, but there the autoscaled pool is the only node pool. So maybe that's playing a role here?
Interesting, thanks for sharing more details. Can you try this:
1. Remove the `jobs` autoscaled pool from the config
2. Re-run the create command
3. Add a new autoscaled pool with a different name
4. Re-run the create command again
This is to see if there is something wrong with that node pool for some reason. With the steps above it should, in theory, move the pods to new nodes from the new autoscaled pool.
I have been using autoscaling with 2 clusters since I added it and haven't come across this issue so far, so I'm not sure what's happening, also because I don't have much control over the cluster autoscaler itself. I should get more familiar with its codebase.
BTW, others have reported that the minimum number of nodes is not respected by the autoscaler, and I've noticed this myself too. It only creates nodes when there are pending pods. I will see if I can report it in their GitHub issues.
I think I found the problem: my taints and labels are missing on the autoscaled nodes. But let's go through what I tried, step by step:
So, first I did steps 1 and 2 (of your list above), but the autoscaler config did not change. I don't think `hetzner-k3s` knows about the absence of the former `jobs` autoscaling pool. I created the entire cluster 3h ago and just removed the autoscaling pool named `jobs` from the config, then re-ran the create command. There were no errors, but since the config no longer contains any autoscaling node pools, it seems it also didn't touch the cluster's existing autoscaler configuration. When I check, the pod has been running since the cluster was created 3h ago:
```
minimi:conf lukas$ kubectl -n kube-system get pods
NAME                                  READY   STATUS    RESTARTS   AGE
cluster-autoscaler-779b45d79b-ltjpb   1/1     Running   0          3h6m
```
When I look at the spec, I see that the autoscaler is still running with the no-longer-existing `jobs` node pool from the original configuration:
```
minimi:conf lukas$ kubectl -n kube-system get pods/cluster-autoscaler-779b45d79b-ltjpb -o yaml
...
  containers:
  - command:
    - ./cluster-autoscaler
    - --cloud-provider=hetzner
    - --nodes=1:4:CCX23:NBG1:jobs
    env:
...
```
Next I added the following section to the `worker_node_pools` config:
```yaml
- name: testautoscale
  instance_type: cpx51
  instance_count: 2
  location: nbg1
  autoscaling:
    enabled: true
    min_instances: 2
    max_instances: 6
```
For testing I explicitly left out any labels and taints.
I re-ran the create command (steps 3 and 4). This time the autoscaler was restarted with a new configuration (the Hetzner provider encodes each pool as `--nodes=min:max:INSTANCE_TYPE:LOCATION:pool-name`):
```
...
  - command:
    - ./cluster-autoscaler
    - --cloud-provider=hetzner
    - --nodes=2:6:CPX51:NBG1:testautoscale
...
```
I then created an app with 200 replicas requesting lots of CPU and memory. Now, when I check the autoscaler log, I see the nodes scaling up:
```
W0908 12:22:51.645712       1 hetzner_servers_cache.go:94] Fetching servers from Hetzner API
I0908 12:22:52.548039       1 hetzner_node_group.go:438] Set node group draining-node-pool size from 0 to 0, expected delta 0
I0908 12:22:52.548105       1 hetzner_node_group.go:438] Set node group testautoscale size from 6 to 6, expected delta 0
```
and after a while the nodes show up on the cluster:
```
minimi:conf lukas$ kubectl get nodes
NAME                             STATUS   ROLES    AGE   VERSION
...
testautoscale-1d4d2be64ed98e0f   Ready    <none>   60s   v1.27.4+k3s1
testautoscale-6e27225c6794cbc    Ready    <none>   58s   v1.27.4+k3s1
testautoscale-74afc208a0ac1074   Ready    <none>   59s   v1.27.4+k3s1
testautoscale-f5742f012c8d4bd    Ready    <none>   54s   v1.27.4+k3s1
```
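A load generator along these lines is enough to trigger this kind of scale-up (a sketch; the name, image, and resource numbers are only illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-hog   # illustrative name
spec:
  replicas: 200
  selector:
    matchLabels:
      app: cpu-hog
  template:
    metadata:
      labels:
        app: cpu-hog
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "2"      # large requests leave pods pending,
              memory: 4Gi   # which is what the autoscaler reacts to
```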
So I think the reason it didn't work before is that the application I was deploying was targeting a label `node-role.fixcloud.io=jobs`, and the labels (and taints) I have in my `hetzner-k3s` node pool config aren't applied to autoscaled nodes, so the autoscaler likely doesn't "react" to pods pending with those labels and taints. Could that be?
My use case is, I have short (~2h) running CPU intensive jobs that run once per day. When they run I would like for the system to add additional nodes and scale them back down when the jobs have finished running after 2h or so. So I'd like for the autoscaler to only add instances, for those jobs, but not any other pods that might get scheduled. My idea was to use labels and taints to achieve this.
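On the workload side, that isolation would look roughly like this (a sketch; the label key is the one from this thread, while the taint value and effect are assumptions):

```yaml
# Pod template fragment for the daily jobs: schedulable only on nodes
# carrying the pool label, and tolerating the pool taint so nothing
# else lands on those nodes.
spec:
  nodeSelector:
    node-role.fixcloud.io: jobs
  tolerations:
    - key: node-role.fixcloud.io
      value: jobs
      effect: NoSchedule
```

This only works if the autoscaled nodes actually come up with the label and taint, which is exactly what was missing here.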
Thanks a lot for reporting back in detail! It seems I need to fix a bug where re-running the create command doesn't update the autoscaler config when all autoscaled pools are removed from the config. Also, I knew that labels and taints aren't supported by the autoscaler yet but didn't remember. Sorry. Glad you got it sorted though :)
If combining labels/taints with autoscaling is not a valid configuration, maybe the config validation should abort with an error if someone tries to combine the two. Just to make it very explicit that the given configuration is invalid. WDYT?
Yep, good idea.
Adding labels and taints to autoscaled nodes is now supported by the autoscaler, so I am scheduling this for v2.0.1, since v2 will probably be released next weekend already.
Closing in favor of https://github.com/vitobotta/hetzner-k3s/issues/317 since the discussion moved on to labels and taints.
I'm running into an issue where, if I enable autoscaling on one of the worker pools, that pool is ignored completely by `hetzner-k3s`. Using the following configuration file, if I remove the last four lines, the `jobs` pool is created. With those autoscaling lines in the config, however, the nodes aren't created and the label/taint code throws errors.
This is my config:
The log initially doesn't show any errors during node creation; the `jobs` servers are just skipped as if they didn't exist in the config. Later in the log, where it tries to set the labels and taints, it does show some errors. If I delete the last four lines from the config, I get the `jobs` worker pool, but then I have to scale it manually. Is there anything obvious that I'm doing wrong?
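The "last four lines" referred to are an autoscaling block of the same shape as the one shown elsewhere in the thread; the exact values here are assumptions:

```yaml
autoscaling:
  enabled: true
  min_instances: 1
  max_instances: 4
```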