rancher / system-upgrade-controller

In your Kubernetes, upgrading your nodes
Apache License 2.0
676 stars 83 forks source link

Support image digest in version #283

Closed keatontaylor closed 5 months ago

keatontaylor commented 5 months ago

Thanks for helping us to improve the system-upgrade-controller! We welcome all feature requests. Please fill out each area of the template so we can better understand the request. You can delete this message portion of the bug report.

Is your feature request related to a problem? Please describe. We currently a talos boot image with the same version that ultimately might have underlying extensions that have had versions bumped

For example:

Talos version 1.6.2 is the base image and the version of our custom image to match, but tailscale might have a new version that will be merged into our repo and ultimately built into the image. The version will stay the same 1.6.2 but may include tailscale 1.58.1 extension.

Describe the solution you'd like Ideally we'd like to use the digest of docker image in the version string to kickoff a plan. This would allow us to use renovate to update the digest via gitops when our image tag has a new version with updated extensions.

For example: version: v1.6.2@sha256:12345 will be updated via renovate bot to v1.6.2@sha256:56789

Describe alternatives you've considered We have considered version bumping the tags when extensions are updated, but the issue is that we do wish to keep the talos version matching so it is abundantly clear to understand what version of talos we're running. Ideally the digest would provide a cleaner method for us to still control updates via gitops while not adding additional burden of bifurcating from the upstream talos version numbers.

Additional context N/A

brandond commented 5 months ago

This would allow us to use renovate to update the digest via gitops when our image tag has a new version with updated extensions.

It sounds like you're asking for support for mutable tags, is that correct? You have a tag that you are periodically pushing to, and you want to be able to change the digest that the tag points to and have the SUC pick that up?

I don't see why that wouldn't work now; have you tried using it? Note that the docker image reference format does not support abbreviated digests, you'd need to specify the full thing, just as you would when specifying a digest in the tag portion of the image name in the image field of a pod spec - as that is how the controller uses the version field.

keatontaylor commented 5 months ago

Yep you got it. Similar to pinning the "latest" tag.

SuC complains that the version is too long and contains non-alpha numeric characters.

relevant logs:

time="2024-01-22T20:19:30Z" level=error msg="WithLatestTag(v1.6.2@sha256:41d90eda13809d5012cfbd23209575f51d5b2b5a9c609bc13f2f77d0910ebee0): invalid tag format (on ghcr.io/siderolabs/talosctl)" time="2024-01-22T20:19:30Z" level=error msg="WithLatestTag(v1.6.2@sha256:41d90eda13809d5012cfbd23209575f51d5b2b5a9c609bc13f2f77d0910ebee0): invalid tag format (on ghcr.io/siderolabs/talosctl)" time="2024-01-22T20:19:30Z" level=error msg="error syncing 'system-upgrade/talos': handler system-upgrade-controller: failed to create system-upgrade/apply-talos-on-node-2-with-664b4dedff7d4a721d8987606e8035-a65cd batch/v1, Kind=Job for system-upgrade-controller system-upgrade/talos: Job.batch \"apply-talos-on-node-2-with-664b4dedff7d4a721d8987606e8035-a65cd\" is invalid: [metadata.labels: Invalid value: \"v1.6.2@sha256:41d90eda13809d5012cfbd23209575f51d5b2b5a9c609bc13f2f77d0910ebee0\": must be no more than 63 characters, metadata.labels: Invalid value: \"v1.6.2@sha256:41d90eda13809d5012cfbd23209575f51d5b2b5a9c609bc13f2f77d0910ebee0\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?'), spec.template.labels: Invalid value: \"v1.6.2@sha256:41d90eda13809d5012cfbd23209575f51d5b2b5a9c609bc13f2f77d0910ebee0\": must be no more than 63 characters, spec.template.labels: Invalid value: \"v1.6.2@sha256:41d90eda13809d5012cfbd23209575f51d5b2b5a9c609bc13f2f77d0910ebee0\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')], requeuing"

brandond commented 5 months ago

Ah. So the problem is that the version string doesn't fit in the label, as it's too long... and the digest contains characters that are not valid label values.

Have you considered just appending the digest to the tag? Mutable version tags are considered somewhat of an anti-pattern in many circles. Common patterns include appending a date or timestamp to the version, or including an abbreviated digest. v1.6.2-41d90ed or v1.6.2-20240122 for example.

v1.6.2 should always be v1.6.2, mutating that creates additional problems with image caching on the nodes unless you are religious about setting your image pull policy.

keatontaylor commented 5 months ago

yeah, considered various different options, the abbreviated digest in the tag makes sense to me, need to determine how to get that fully functional in renovate bot, but that shouldn't be too hard.

Thank you for the feedback and I understand the limitations now, going to close the issue.

dweomer commented 5 months ago

argh, the labeling!

@keatontaylor i still think this is a legit issue. it shouldnt be terribly hard to modify the version label that we apply (we already munge it for tags with semver metadata, iirc)

keatontaylor commented 5 months ago

I'm happy to keep this open if you prefer, supporting the digest in the version would absolutely bring it in feature parity for the various ways a docker version can be specified.

I do agree that some level of caution should be used here in terms of tag management but that is really up to the consumer if this feature gets implemented.

dweomer commented 5 months ago

another workaround is to add something to the values that get digested to determine a new version. envvars and secrets work. you can also add things to be digested by annotating your plan according to: https://github.com/rancher/system-upgrade-controller/blob/af373602d649f108321eb4c784d490972499ad9f/pkg/apis/upgrade.cattle.io/constants.go#L9-L13

so assuming an external process to modify the plan with the latest hash, youve got a couple of ways to have that digested and hence get picked up as "new"

brandond commented 5 months ago

That would be a good way to do that if they're directly generating the plan spec. If they're using a channel server to set the target version, I'm not sure they currently have any recourse.

dweomer commented 5 months ago

That would be a good way to do that if they're directly generating the plan spec. If they're using a channel server to set the target version, I'm not sure they currently have any recourse.

yeah, we'd have to change SUC to apply a munged value for the label which i forget if anything reads that as source-of-truth

edit: just refreshed my memory. some handler code would have to be updated (string equality 😬 )

keatontaylor commented 5 months ago

Well the other downside is that due to the way Talos generates and renews it's credentials in cluster, we have to set:

secrets:
  ignoreUpdates: true

This is to prevent the plan from re-triggering when Talos updates the secret to refresh the creds. I had considered the approach of just manipulating the secret.

dweomer commented 5 months ago

Well the other downside is that due to the way Talos generates and renews it's credentials in cluster, we have to set:

secrets:
  ignoreUpdates: true

This is to prevent the plan from re-triggering when Talos updates the secret to refresh the creds. I had considered the approach of just manipulating the secret.

you can put envvars on the container spec: https://github.com/rancher/system-upgrade-controller/blob/af373602d649f108321eb4c784d490972499ad9f/pkg/apis/upgrade.cattle.io/v1/types.go#L65-L73

or ... override the image ref in the containerspec which makes more sense. the trick would then be to annotate the plan to make sure that gets digested


apologies for the run-around/thread. i wrote the dang thing and i forget how it works sometimes (its, uh, tightly wound)