rancher / elemental-operator

The Elemental operator is responsible for managing the OS versions and maintaining a machine inventory to assist with edge or baremetal installations.
Apache License 2.0

Track upgrade status in UI #433

Open davidcassany opened 1 year ago

davidcassany commented 1 year ago

This card is about tracking the status of the upgrade group somewhere within the UI, which basically means keeping some sort of status for it in ManagedOSImage or another related resource.

See https://github.com/rancher/elemental-operator/issues/364#issuecomment-1438705636 for some comments around this. Basically, the issue is relating the system-upgrade-controller plans (in the downstream cluster) back to the fleet bundle (in the upstream cluster).

The proper solution would be to find a way to gather, from the fleet bundle resource (in the upstream cluster and related to a ManagedOSImage instance), all bundle deployments (one per downstream cluster), and then, from each bundle deployment, to gather and track the system-upgrade-controller plans. I believe the hardest part is achieving meaningful status monitoring between fleet and system-upgrade-controller.
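As a rough illustration of the roll-up described above, the following Go sketch counts plan states across bundle deployments and collapses them into a single phase that a UI could show. Every type and name here (`BundleDeployment`, `PlanState`, `UpgradeSummary`, `Summarize`, `Phase`) is a hypothetical stand-in and not the Fleet or system-upgrade-controller API; how the plan states would actually be collected from downstream clusters is the open question and is not covered.

```go
package main

import "fmt"

// All types below are hypothetical stand-ins, not the Fleet or
// system-upgrade-controller APIs.

// PlanState is the state of one upgrade plan in a downstream cluster.
type PlanState string

const (
	PlanPending  PlanState = "Pending"
	PlanRunning  PlanState = "Running"
	PlanComplete PlanState = "Complete"
	PlanFailed   PlanState = "Failed"
)

// BundleDeployment stands for the per-cluster deployment of a bundle,
// together with the plan states observed in that cluster.
type BundleDeployment struct {
	Cluster string
	Plans   []PlanState
}

// UpgradeSummary is the kind of data a ManagedOSImage status could expose.
type UpgradeSummary struct {
	Total, Pending, Running, Complete, Failed int
}

// Summarize counts plan states across all bundle deployments.
func Summarize(deployments []BundleDeployment) UpgradeSummary {
	var s UpgradeSummary
	for _, d := range deployments {
		for _, p := range d.Plans {
			s.Total++
			switch p {
			case PlanRunning:
				s.Running++
			case PlanComplete:
				s.Complete++
			case PlanFailed:
				s.Failed++
			default:
				// Pending, unknown, or empty states count as pending.
				s.Pending++
			}
		}
	}
	return s
}

// Phase collapses a summary into one value a UI could display.
func (s UpgradeSummary) Phase() string {
	switch {
	case s.Total == 0:
		return "Unknown"
	case s.Failed > 0:
		return "Failed"
	case s.Complete == s.Total:
		return "Complete"
	case s.Running > 0 || s.Complete > 0:
		return "InProgress"
	default:
		return "Pending"
	}
}

func main() {
	deployments := []BundleDeployment{
		{Cluster: "edge-a", Plans: []PlanState{PlanComplete}},
		{Cluster: "edge-b", Plans: []PlanState{PlanRunning, PlanPending}},
	}
	s := Summarize(deployments)
	fmt.Printf("%+v phase=%s\n", s, s.Phase())
}
```

Treating any failed plan as an overall failure is only one possible policy; a real status would probably also need to report which cluster failed.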

An alternative could be to code some naive logic on top of all this, such as having elemental-operator mark all nodes as pending upgrade once an upgrade group is created. The problem with such an approach is that we would have to replicate the fleet selector logic in elemental-operator.
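To make the cost of this naive approach concrete, here is a minimal Go sketch of what the operator would have to re-implement. It assumes a simplified, equality-only label selector evaluated against cluster labels; `Selector`, `Inventory`, and `MarkPending` are hypothetical names, and real Fleet targeting supports more than this, which is exactly the duplication concern.

```go
package main

import "fmt"

// Hypothetical types; not the Fleet or elemental-operator APIs.

// Selector is a simplified label selector: every key/value pair must match.
type Selector map[string]string

// Matches reports whether labels satisfy the selector.
// In this sketch an empty selector matches everything.
func (s Selector) Matches(labels map[string]string) bool {
	for k, v := range s {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// Inventory is a stand-in for a machine inventory entry.
type Inventory struct {
	Name           string
	Cluster        string
	UpgradePending bool
}

// MarkPending flags every inventory whose cluster matches the selector
// and returns how many were flagged. Inventories that belong to a
// cluster missing from clusterLabels are skipped.
func MarkPending(sel Selector, clusterLabels map[string]map[string]string, invs []Inventory) int {
	n := 0
	for i := range invs {
		labels, ok := clusterLabels[invs[i].Cluster]
		if !ok {
			continue
		}
		if sel.Matches(labels) {
			invs[i].UpgradePending = true
			n++
		}
	}
	return n
}

func main() {
	clusters := map[string]map[string]string{
		"store-1": {"env": "edge"},
		"lab-1":   {"env": "lab"},
	}
	invs := []Inventory{
		{Name: "m1", Cluster: "store-1"},
		{Name: "m2", Cluster: "lab-1"},
	}
	n := MarkPending(Selector{"env": "edge"}, clusters, invs)
	fmt.Println(n, invs)
}
```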

A third idea could be tracking status on nodes individually, i.e. tracking the start and the end of a particular node upgrade without a cluster-wide view. I guess this could somehow be coded as part of the upgrade plan by extending/reusing elemental-register to annotate the machineinventory when the upgrade starts and to annotate it again when the upgrade is done (with success or failure).

fgiudici commented 1 year ago

Issue #434 could be part of the foundation to track the current provisioning state (OS, cluster association, etc.).

Martin-Weiss commented 7 months ago

+100 - managing OS versions / patchlevel is key in Elemental (at least from my point of view)

So - we need an overview of "what is running where", and we need rollout management and error handling for assigning which server/node should be upgraded or downgraded and when, and we need to be able to follow that process.

The UI needs to reflect the progress and status and maybe even an estimation for when it might be completed. We also might need a "cancel" button.

This also needs some sort of support for disconnected scenarios and single-node cluster scenarios.

And we need some sort of reporting along the lines of "this node has these CVEs and needs an upgrade".