kube-hetzner / terraform-hcloud-kube-hetzner

Optimized and Maintenance-free Kubernetes on Hetzner Cloud in one command!
MIT License

Add option to use "cordon" instead of "drain" for k3s upgrades #1438

Closed jr-dimedis closed 3 months ago

jr-dimedis commented 3 months ago

This PR adds a new variable system_upgrade_use_drain (default true), which preserves the current default behaviour of advising the system-upgrade-controller to drain nodes before performing a k3s upgrade.

When set to false, the agent plan is configured to cordon nodes during upgrades instead, which prevents new pods from being scheduled but keeps all existing pods running on the node.

This may be useful if you have pods that are known to start slowly, e.g. because they have to mount volumes with many files that need the right security context applied.

Nothing has changed for the server plan, which uses cordon anyway.
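
As an illustration, opting into the cordon-only behaviour from your kube.tf would look roughly like this (only the relevant attribute is shown; the rest of the module configuration is omitted):

```hcl
module "kube-hetzner" {
  source = "kube-hetzner/kube-hetzner/hcloud"

  # ... your existing configuration ...

  # Don't drain agent nodes during k3s upgrades; only cordon them so
  # existing pods keep running while new pods are not scheduled.
  system_upgrade_use_drain = false
}
```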

jr-dimedis commented 3 months ago

@mysticaltech Sorry, I messed up this PR a bit. It's actually one commit, but I pushed it to my master by accident, then reverted it and created the branch again from my new master, which of course contained both the original and the reverted commit (very clever of me...). Feel free to squash this on merge (if you accept my change).

jhass commented 3 months ago

How do you apply this to an existing cluster? Looks like there's a missing trigger on the kustomization resource?

mysticaltech commented 3 months ago

@jhass Yes, it's probably missing triggers. However, to apply it to an existing cluster, just kubectl get the system-upgrade-controller plans (use kubectl get crds | grep plan to find the exact resource name), output them to YAML, edit them the same way this PR does in the diff, and reapply. If you have access to ChatGPT or claude.ai, it can help you do this in less than five minutes.
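
Roughly, that manual procedure looks like this (the plan name k3s-agent and the system-upgrade namespace are the usual ones for this module, but verify them in your cluster first):

```sh
# Find the exact CRD and plan names first
kubectl get crds | grep plan
kubectl -n system-upgrade get plans.upgrade.cattle.io

# Export the agent plan, edit it the same way the PR diff does
# (replace the drain block with "cordon: true"), then reapply
kubectl -n system-upgrade get plan k3s-agent -o yaml > k3s-agent-plan.yaml
kubectl apply -f k3s-agent-plan.yaml
```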

jhass commented 3 months ago

I'm aware I can do it manually. That's however not the point of using Terraform to me :)

jr-dimedis commented 3 months ago

@jhass

I agree it would be great if such changes applied automatically to existing clusters, but I have to admit I don't have a clue how to accomplish that. The way manifests are applied (by copying them to a control-plane node and importing them there) is not really straightforward to me ;). Although I probably understand why it is necessary: it overcomes the chicken-and-egg problem of the Kubernetes provider configuration when creating a new cluster.

@mysticaltech

Do you have any advice on how to add such a trigger? I'd be happy to provide a PR for that. BTW: thanks for merging this one :)

jhass commented 3 months ago

Probably this resource needs to be split into the steps that are really init-only and the ones that can be re-applied. Then the triggers map for the re-applicable one needs to either include all the template parameters, or switch to terraform_data instead of the deprecated null_resource, extracting the rendered template into a local and passing it through the input argument into the provisioner (via self).
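
A rough sketch of that idea, with illustrative resource, local, and template names rather than the module's actual ones:

```hcl
locals {
  # Render the upgrade plan manifest once, so any change to its inputs shows
  # up to Terraform as a change in the rendered string.
  upgrade_plans_yaml = templatefile("${path.module}/templates/upgrade_plans.yaml.tpl", {
    drain = var.system_upgrade_use_drain
  })
}

resource "terraform_data" "upgrade_plans" {
  # Changing triggers_replace forces replacement, which re-runs the provisioners.
  triggers_replace = [local.upgrade_plans_yaml]

  # input is stored in state and is accessible inside provisioners via self.
  input = local.upgrade_plans_yaml

  connection {
    type        = "ssh"
    host        = local.first_control_plane_ip # illustrative
    user        = "root"
    private_key = var.ssh_private_key
  }

  provisioner "file" {
    content     = self.input
    destination = "/var/post_install/upgrade_plans.yaml"
  }

  provisioner "remote-exec" {
    inline = ["kubectl apply -f /var/post_install/upgrade_plans.yaml"]
  }
}
```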