rancher / elemental-operator

The Elemental operator is responsible for managing the OS versions and maintaining a machine inventory to assist with edge or baremetal installations.
Apache License 2.0

Upgrading nodes from Rancher #330

Closed davidcassany closed 1 year ago

davidcassany commented 1 year ago

Assuming the upgrade procedure as defined in the docs, there are two ways to upgrade:

  1. Manually set the image we want to upgrade to (osImage approach)
  2. Select the upgrade image from a channel

For the 1st option there is not much to say: it is a manual process and the administrator has full freedom to update to whatever she/he wants. I believe this option is clearly worth keeping, but should it be supported? IMHO it shouldn't, and some sort of warning should be logged when it is used.

For the 2nd there are details to figure out

Last but not least, we should discuss the essential tests for an upgrade acceptance criteria on each release:

davidcassany commented 1 year ago

@agracey @rancher/elemental any input and thoughts about this topic would be appreciated.

kkaempf commented 1 year ago

If the osImage should be unsupported (I'd agree, but the last word is with @agracey), then it shouldn't be (prominently?) offered in the UI, imho. osImage and channel shouldn't be presented as options on an equal level.

kkaempf commented 1 year ago

The channel json must be manually (well, I don't see an easy way to automate it) created since it's us who decide which version to support and which not.

kkaempf commented 1 year ago

Added

to the initial list

kkaempf commented 1 year ago

:+1: on not supporting :latest release

kkaempf commented 1 year ago

:+1: on not supporting automatic upgrades. (Might be revisited based on market requirements).

kkaempf commented 1 year ago

How to notify new upgrades are available?

Timestamp based? That's already how the updater decides if an image is newer.
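A minimal sketch of the timestamp comparison mentioned above, assuming each image carries an ISO 8601 build timestamp. The function name `newer_images` and the image references are hypothetical and only illustrate the idea; this is not how the updater is actually implemented.

```python
from datetime import datetime


def newer_images(current_created: str, candidates: dict[str, str]) -> list[str]:
    """Return the image references built after the current image, oldest first.

    `candidates` maps an image reference to its build timestamp (ISO 8601).
    Both the mapping and the timestamp source are assumptions for this sketch.
    """
    current = datetime.fromisoformat(current_created)
    newer = []
    for ref, created in candidates.items():
        built = datetime.fromisoformat(created)
        if built > current:
            newer.append((built, ref))
    # Sort by build time so the admin sees upgrades in release order.
    return [ref for _, ref in sorted(newer)]


if __name__ == "__main__":
    available = {
        "registry.example.com/os:v1.0": "2022-09-01T00:00:00+00:00",
        "registry.example.com/os:v1.1": "2022-10-15T00:00:00+00:00",
    }
    print(newer_images("2022-10-01T00:00:00+00:00", available))
```

A non-empty result would be the condition for telling the admin that an upgrade is available.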

agracey commented 1 year ago

  • Do we need to set version constraints? Or fully rely on channel consistency? (all versions there are supported and exchangeable) I'd vote to assume all versions in a channel are exchangeable for now.

I would expect your assumption to be true for now. I can't imagine having wholly different OSes being distinguished by tags (like some images do with :alpine and :ubuntu)

  • How to notify new upgrades are available?

I would think a job polling for new tags would be sufficient? I don't think this would generate noticeably more traffic than most CI systems already do?

  • How to inform about changes (aka release notes) in a new image? (security/bugfix/feature release; fixed CVEs, bugs, etc.; added features)

I would love to see a pattern where release notes and closed CVEs are listed in the image annotations. This would mean that an admin would be able to see the diff between versions and decide when to upgrade (reducing downtime without increasing risk). I don't know how much work that would be to build though.

  • Do we support mutable image references tracking? The obvious use case would be pulling images as rancher/elemental-teal:latest and assuming elemental operator is capable of upgrading on each new latest release. IMHO we should not support this use case; it tends to be confusing.

Agreed, listing image tags and correlating the hash and build timestamp is likely enough?

  • Together with the above, is there a way to set automatic upgrades? So the cluster upgrades as soon as a greater image in the channel is available. Do we want to support that? I'd say not for now.

IMO, this could be left to a higher level automation. We just need to make sure the API is stable and fully featured to build against.
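The polling idea and the tag/hash correlation above could be sketched roughly as follows. This assumes a polling job already has two snapshots mapping tag to digest; how the snapshots are fetched from a registry is deliberately left out, and `diff_tag_snapshots` is a hypothetical helper, not part of elemental-operator.

```python
def diff_tag_snapshots(
    previous: dict[str, str], latest: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare two {tag: digest} snapshots taken by a polling job.

    Returns (new_tags, moved_tags):
      - new_tags: tags that did not exist in the previous snapshot
      - moved_tags: tags that now point at a different digest (mutable tags)
    """
    new_tags = sorted(tag for tag in latest if tag not in previous)
    moved_tags = sorted(
        tag
        for tag in latest
        if tag in previous and latest[tag] != previous[tag]
    )
    return new_tags, moved_tags


if __name__ == "__main__":
    before = {"v1.0": "sha256:aaa", "latest": "sha256:aaa"}
    after = {"v1.0": "sha256:aaa", "v1.1": "sha256:bbb", "latest": "sha256:bbb"}
    new_tags, moved_tags = diff_tag_snapshots(before, after)
    print("new:", new_tags, "moved:", moved_tags)
```

New tags are upgrade candidates, while moved tags (such as `latest`) are exactly the mutable references the thread suggests not tracking.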

fgiudici commented 1 year ago

Yep, this is a discussion we really need!

  • How to deliver such a channel? I'd manually create a container including a json list of all supported versions. The question then is how to create and maintain it; it could easily be a manual job for now.

so, the idea is to have a json file inside a container? 🤔 I would just keep a plain json file listing the images on a web URL. Having a container for that looks like just extra overhead with no benefit. Wondering if I'm missing something.

  • Do we need to set version constraints? Or fully rely on channel consistency? (all versions there are supported and exchangeable) I'd vote to assume all versions in a channel are exchangeable for now.

👍🏼 totally agree!

  • How to notify new upgrades are available?

I would stick with @agracey's idea: pull the json from time to time. The overhead should not be noticeable.

  • How to inform about changes (aka release notes) in a new image? (security/bugfix/feature release; fixed CVEs, bugs, etc.; added features)

🤔 if we use a json, I would just add a reference to the official release notes (which I expect to be mainly for the OS release) directly in the json. We can even think about having OS release notes and elemental ones (to separate OS changes from elemental-specific ones) and add them only if/when needed.

  • Do we support mutable image references tracking? The obvious use case would be pulling images as rancher/elemental-teal:latest and assuming elemental operator is capable of upgrading on each new latest release. IMHO we should not support this use case; it tends to be confusing.

👍🏼 yep, makes sense. Especially since the json will get updated with all the available versions.

  • Together with the above, is there a way to set automatic upgrades? So the cluster upgrades as soon as a greater image in the channel is available. Do we want to support that? I'd say not for now.

I am ok with not having it for now... but at some point this is something we should allow. I like @agracey's idea of leaving it to a higher-level automation tool.
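To make the "plain json listing the images, with a reference to release notes" idea concrete, here is a small sketch. The field names (`version`, `image`, `releaseNotes`) and the URLs are purely illustrative assumptions and are not the schema elemental-operator actually expects.

```python
import json

# Hypothetical channel content; the field names are assumptions for this sketch.
CHANNEL_JSON = """
[
  {"version": "v1.0.0",
   "image": "registry.example.com/os:v1.0.0",
   "releaseNotes": "https://example.com/notes/v1.0.0"},
  {"version": "v1.1.0",
   "image": "registry.example.com/os:v1.1.0"}
]
"""


def load_channel(text: str) -> list[dict]:
    """Parse a channel document and apply the checks discussed in the thread."""
    entries = json.loads(text)
    for entry in entries:
        # Every entry must name a version and an image to upgrade to.
        if "version" not in entry or "image" not in entry:
            raise ValueError(f"incomplete channel entry: {entry!r}")
        # Mutable references such as ':latest' are not tracked.
        if entry["image"].endswith(":latest"):
            raise ValueError(f"mutable tag not allowed: {entry['image']}")
    return entries


if __name__ == "__main__":
    for entry in load_channel(CHANNEL_JSON):
        notes = entry.get("releaseNotes", "(no release notes)")
        print(entry["version"], entry["image"], notes)
```

Keeping `releaseNotes` optional matches the suggestion to add release note references only if and when they are needed.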

davidcassany commented 1 year ago

so, the idea is to have a json file inside a container? 🤔 I would just keep a plain json file listing the images on a web URL. Having a container for that looks like just extra overhead with no benefit. Wondering if I'm missing something.

Well, the benefit or convenience of using a container is that we already have the infrastructure and processes to deliver it: we can use the container registry. I'd also go for a web server, but I am clueless about how this should be handled from a maintenance point of view; it goes beyond the regular process of publishing RPM repositories or containers in a registry from OBS. In any case this is a tiny implementation detail. The relevant part is that we go for building and maintaining a list of available images in a json format compatible with elemental-operator.

How to notify new upgrades are available?

I would think a job polling for new tags would be sufficient? I don't think this would generate noticeably more traffic than most CI systems already do?

Sure, the polling strategy is already in place. My question is more along the lines of: should there be some logic somewhere to raise a notification (in the UI?) to make the admin aware that new updates are available (imagine an important security fix)? I believe, for now, we can expect the admin to be proactive and manually check for available updates from time to time. But I am convinced we will need some sort of notification mechanism so the admin can react to unexpected important updates (security fixes mostly).

How to inform about changes (aka release notes) in a new image? (security/bugfix/feature release; fixed CVEs, bugs, etc.; added features)

This is a tough topic. I wonder what's being done for the BCI images in that regard. I'll try to contact them to check if someone is already doing such a thing within the company for container images. OBS gives us a *.packages list file containing a full list of all packages. This can be diffed across releases; however, mapping that to actual bug fixes sounds complex. For that matter, KIWI builds produce a *.changes file including the full change log of every single package; if diffed across releases, one can parse out bugzilla tickets... I wonder if something like this could be done for Docker builds. It feels like there is a lot to explore in that area.
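As a rough illustration of diffing package lists across releases: the sketch below assumes a simplified pipe-separated format with the package name first and the version second. The real *.packages file may carry more fields, so this layout, the helper names and the sample data are all assumptions.

```python
def parse_packages(text: str) -> dict[str, str]:
    """Parse 'name|version|...' lines into a {name: version} mapping.

    The pipe-separated layout is an assumption made for this sketch.
    """
    packages = {}
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        packages[fields[0]] = fields[1]
    return packages


def diff_packages(old: dict[str, str], new: dict[str, str]):
    """Return (added, removed, changed) between two package mappings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return added, removed, changed


if __name__ == "__main__":
    old = parse_packages("bash|5.1\nopenssl|1.1.1l\nvim|9.0\n")
    new = parse_packages("bash|5.1\nopenssl|1.1.1q\ncurl|7.79\n")
    print(diff_packages(old, new))
```

This only shows which packages changed version; mapping those changes to fixed bugs or CVEs would still need the change log data mentioned above.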

davidcassany commented 1 year ago

To sum up the discussion/comments, for now (short term), I believe we can state:

Action items:

@agracey @rancher/elemental if you are fine with it I am willing to close this card and create new ones for each action item.

davidcassany commented 1 year ago

Closing since follow-up issues have been created. They are linked and listed in the comment above.