`feature`: Cluster auto-patching / rolling updates

gyptazy commented 2 months ago

General

IMHO, it would be great to have a feature that keeps the cluster patched and up to date. To support this, ProxLB could be extended with a new boolean option auto_update (true/false). Enabling it would do the following:

- Check if updates are available
- Check if updates require a reboot
  - If no: simply patch
  - If yes:
    - Balance the CTs/VMs away from the current node to other ones
    - Patch the node
    - Reboot the node
    - Re-migrate workloads
- Proceed with the next node
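As a rough sketch of the steps above (not the code from the PR): the loop below handles one node at a time and only evacuates and reboots when a reboot is required. All function and parameter names are hypothetical. The actual Proxmox API calls are passed in as callables, so nothing here assumes a specific endpoint.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    updates_available: Callable[[str], bool],
    reboot_required: Callable[[str], bool],
    evacuate: Callable[[str], List[str]],
    patch: Callable[[str], None],
    reboot: Callable[[str], None],
    restore: Callable[[str, List[str]], None],
) -> List[str]:
    """Patch nodes strictly one after another; return the patched nodes."""
    patched = []
    for node in nodes:
        if not updates_available(node):
            continue  # nothing to do, proceed with the next node
        if not reboot_required(node):
            patch(node)  # no reboot needed: simply patch in place
        else:
            moved = evacuate(node)  # balance CTs/VMs away to other nodes
            patch(node)
            reboot(node)
            restore(node, moved)  # re-migrate the workloads
        patched.append(node)
    return patched
```

Because the nodes are processed sequentially inside a single loop, at most one node is ever evacuated or rebooting. That only holds as long as a single ProxLB instance drives the whole cluster.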

Doing all of this through the Proxmox API requires a patched API method, which can be found here: https://github.com/gyptazy/ProxLB/blob/feature/auto-node-upgrade/packaging/proxlb-additions/perl5/PVE/API2/Nodes.pm#L622-L656. It would be shipped as a proxlb-additions Debian package. Before proceeding with the implementation, though, it would be great to know whether this is needed or whether most people use dedicated patch management tools.

mentalinc commented 2 months ago

Sounds like a great feature to have rolling updates applied to the cluster.

gyptazy commented 2 months ago

So, I have drafted the integration and it is currently around 80% finished.

First, we need to ensure that the new upgrade option is available in the API on our node(s). This is done by https://github.com/gyptazy/ProxLB/blob/feature/auto-node-upgrade/packaging/proxlb-additions/perl5/PVE/API2/Nodes.pm#L622-L656, which is shipped in a new package, proxlb-additions.deb.

After installing it (please do not do this on production systems right now!), we can use the new options in ProxLB.

PR #48 adds the new options and functions to enable patching, and everything already works as expected. But currently all nodes would do this at once, which could have severe side effects on the balancing, and all nodes would also reboot. This means something like a locking mechanism is required to ensure that only one node (or a configurable number of nodes) is patched at a time. In any case, nodes must not reboot in parallel, leaving everything unavailable or causing nodes to start fencing.
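To illustrate the kind of lock meant here, a minimal in-memory sketch follows. It is not what PR #48 implements. The class name PatchLock and the max_parallel parameter are made up for this example, and a real cluster would need the state kept somewhere all nodes can reach.

```python
import threading


class PatchLock:
    """Allow at most `max_parallel` nodes to be patched at the same time."""

    def __init__(self, max_parallel: int = 1) -> None:
        if max_parallel < 1:
            raise ValueError("max_parallel must be at least 1")
        self.max_parallel = max_parallel
        self._holders = set()
        self._mutex = threading.Lock()  # protects _holders within one process

    def acquire(self, node: str) -> bool:
        """Return True if `node` may start patching now."""
        with self._mutex:
            if node in self._holders:
                return True  # node already holds a slot
            if len(self._holders) >= self.max_parallel:
                return False  # all slots taken, try again later
            self._holders.add(node)
            return True

    def release(self, node: str) -> None:
        """Free the slot once `node` is patched and back online."""
        with self._mutex:
            self._holders.discard(node)
```

A node would call acquire() before evacuating its guests and release() only after it has rebooted and rejoined the cluster, so a slot stays occupied for the whole maintenance window.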

That said, my API integration idea immediately comes to mind again: nodes could publish their current status, iterate over the cluster nodes, and ask each one's ProxLB for its locking state.
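Roughly, that peer check could look like the sketch below. It assumes each ProxLB instance can report a state string such as "idle" or "patching". Those state names, the may_patch function and the fetch_state callable are all hypothetical, since the ProxLB API for this does not exist yet.

```python
from typing import Callable, Iterable


def may_patch(
    node: str,
    peers: Iterable[str],
    fetch_state: Callable[[str], str],
    max_parallel: int = 1,
) -> bool:
    """Ask every other node for its state; allow patching if enough are idle."""
    busy = 0
    for peer in peers:
        if peer == node:
            continue  # do not count ourselves
        try:
            state = fetch_state(peer)
        except OSError:
            # Unreachable peer: assume it is mid-reboot and count it as busy.
            state = "patching"
        if state != "idle":
            busy += 1
    return busy < max_parallel
```

Treating unreachable peers as busy errs on the safe side, because a node that does not answer may be rebooting. This check alone does not prevent two nodes from seeing everyone idle at the same moment and starting together, so it would still need a tie-breaker or a shared lock.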

gyptazy commented 2 months ago

The feature is now fully completed in the linked PR #48.

However, it requires the patched API provided by the package proxlb-addition-api.deb. This package replaces the Nodes.pm file with a patched version so that the new, required functions are available within the Proxmox upstream API. The package is currently just a quick test for ProxLB development and should not be used in production environments. It lives in the project under /addition/proxlb-addition-api/ only for testing and validation, to verify that ProxLB itself is working.

gyptazy / ProxLB

`feature`: Cluster auto-patching / rolling updates #39
