k0sproject / k0smotron

k0smotron
https://docs.k0smotron.io/

Decreasing number of replicas in K0sControlPlane is not working properly #459

Open nekwar opened 7 months ago

nekwar commented 7 months ago

Problem summary

Downscaling controllers managed by K0sControlPlane does not work properly. The deletion behaviour is quite unpredictable: sometimes the node is deleted at the Kubernetes level, sometimes it is not. But what is common between all deletion cases is that

Expected behaviour

The controller node should be properly deleted (at least at the Kubernetes level; I understand that etcd membership is a separate issue) when replicas are decreased in K0sControlPlane.

nekwar commented 7 months ago

The question here is what is the proper process of node deletion?

IMO, in theory, the most proper way would be to cordon/drain the node first and then delete it (similar to kubectl delete node), but I'm not sure whether this can be implemented in k0smotron.
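The manual equivalent of the cordon/drain/delete sequence described above could look roughly like the sketch below. It is not what k0smotron does internally; it only illustrates the order of operations with standard kubectl commands. The `remove_node` function name and the `KUBECTL` override variable are hypothetical, and the drain flags are an assumption about what a typical cluster needs.

```shell
# KUBECTL is a hypothetical override so the sequence can be dry-run (e.g. KUBECTL=echo).
KUBECTL="${KUBECTL:-kubectl}"

# remove_node NODE: cordon, drain, then delete the node object, stopping at the first failure.
remove_node() {
  node="$1"
  # Mark the node unschedulable so no new pods land on it.
  "$KUBECTL" cordon "$node" &&
  # Evict running pods; DaemonSet pods are skipped, emptyDir data is discarded.
  "$KUBECTL" drain "$node" --ignore-daemonsets --delete-emptydir-data &&
  # Remove the node object from the Kubernetes API.
  "$KUBECTL" delete node "$node"
}
```

Calling `remove_node <node-name>` runs the three steps in order. This only covers the Kubernetes level; etcd membership has to be handled separately.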

twz123 commented 7 months ago

node can't be manually deleted from etcd member list with k0s etcd leave <node-ip> due to etcd cluster "being unhealthy"

The right way to specify the peer address that should be removed is k0s etcd leave --peer-address <node-ip>. When <node-ip> is passed as a positional argument instead of as a flag, it is simply ignored, and k0s etcd leave defaults to removing the current node from the cluster. I admit that this is very confusing, and it took me a while to realize it myself.
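To make the difference concrete, here is a small sketch of the flag form wrapped in a helper. The `leave_peer` and `list_members` function names and the `K0S` override variable are hypothetical; the underlying commands are `k0s etcd leave --peer-address` as described above and `k0s etcd member-list` for checking the result.

```shell
# K0S is a hypothetical override so the commands can be dry-run (e.g. K0S=echo).
K0S="${K0S:-k0s}"

# leave_peer IP: remove the etcd member with the given peer address.
# The address must be passed via --peer-address; a positional argument is ignored.
leave_peer() {
  "$K0S" etcd leave --peer-address "$1"
}

# list_members: print the current etcd member list to verify the removal.
list_members() {
  "$K0S" etcd member-list
}
```

Run `leave_peer <node-ip>` on a remaining healthy controller, then `list_members` to confirm the peer is gone.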

nekwar commented 7 months ago

@twz123 Thank you for the PR!

Just a question: wouldn't it make more sense for the k0s etcd command to use the same syntax as the standard etcdctl? I think that would be the most user-friendly solution.

twz123 commented 7 months ago

I've thought about that, but removing the --peer-address flag would have been a breaking change to the CLI interface that I didn't want to make.

makhov commented 6 months ago

@nekwar we have just released k0smotron v0.9.0 with a bunch of improvements, and downscaling should now work properly.