operator-framework / rukpak

RukPak runs in a Kubernetes cluster and defines APIs for installing cloud native content
Apache License 2.0

Best way to keep rukpak up-to-date #498

Open doriath opened 2 years ago

doriath commented 2 years ago

I would like to use rukpak and be able to keep it up to date. It would be great if we could use rukpak to keep rukpak itself up to date. For that, I would need some configuration mechanism that can be used to track the updates. Based on what rukpak currently offers, I see the following options:

  1. using the plain provisioner - a simple container with yaml files, similar to (or the same as) the yaml file recommended in the current installation instructions
  2. using the registry provisioner - would require creating the configs
  3. using the helm provisioner - would require creating a helm chart (already tracked in a separate issue)

I really like option 3, as it would allow us to customize the setup (e.g. using a custom namespace or tweaking resource requirements/limits). The installation instructions would then look like this:

  1. install rukpak using the helm chart
  2. add a BundleDeployment for rukpak (rough sketch below)
  3. to update rukpak to a new version, just change the version in the BundleDeployment and rukpak will update itself
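
A rough sketch of what that BundleDeployment could look like, assuming the v1alpha1 BundleDeployment shape and the helm provisioner class name; the chart URL, names, and labels here are placeholders, not something rukpak publishes today:

```yaml
# Hypothetical sketch only: rukpak does not ship this chart or BundleDeployment today.
apiVersion: core.rukpak.io/v1alpha1
kind: BundleDeployment
metadata:
  name: rukpak
spec:
  provisionerClassName: core-rukpak-io-helm
  template:
    metadata:
      labels:
        app: rukpak
    spec:
      provisionerClassName: core-rukpak-io-helm
      source:
        type: http
        http:
          # Placeholder URL for a to-be-published rukpak helm chart tarball.
          url: https://example.com/charts/rukpak-0.9.0.tgz
```

Step 3 would then amount to bumping the chart version referenced in the source and letting rukpak reconcile itself.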
timflannagan commented 2 years ago

I'm also currently leaning towards option 3 here. We've discussed in the past using rukpak APIs to solve similar problems around having rukpak manage a resolved "unit" of Bundles (i.e. rukpak APIs managing rukpak APIs) for higher-level components that build on top of rukpak, so this is a pattern we've already been exploring.

In this particular case, what's the workflow for updating rukpak to a new version? Are users manually updating a BD resource on the cluster with the new version, are we releasing a new helm chart version and letting gitops handle this for us, is there a goroutine/webhook periodically watching for new release events, etc.? I'm also a bit hesitant about introducing a helm chart and then being forced to maintain it going forward if the maintenance burden ends up being high.

joelanford commented 2 years ago

I think this is essentially a bootstrapping/resource inheritance problem, right? The steps are:

  1. deploy rukpak "manually" using the current install method
  2. create a BD that somehow inherits the existing rukpak deployment
  3. from then on, rukpak upgrades can be processed by updating that BD to point to a new rukpak bundle.

I could even see us combining 1 and 2, such that all rukpak installations are bootstrapped automatically.

I think the discussion about which bundle format to use is somewhat orthogonal though.

timflannagan commented 2 years ago

@joelanford That might be a decent project for the upcoming hack-and-hustle. I think we'll run into potential issues around the inheritance there given the requisite helm adoption labels won't be present?

akihikokuroda commented 2 years ago

Related to this: I wonder if we can put each provisioner in a separate bundle. The PC API could then manage the life cycle of the provisioners.

joelanford commented 2 years ago

> given the requisite helm adoption labels won't be present?

Aren't those deterministic though? I bet we could inject those labels, apply the static manifests, and then create a BD that references those manifests and the BD controller would say, "cool, looks good, nothing to do here".
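
For reference, the adoption metadata Helm checks for is a single label plus two annotations, so injecting them onto the statically-applied resources is mechanical. A sketch using the rukpak-system Namespace as an example; the release name/namespace values are assumptions and would need to match whatever release the BD's underlying chart install uses:

```yaml
# Sketch only: the label/annotation keys below are the standard ones Helm's
# adoption check requires; the release-name/namespace values are assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: rukpak-system
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: rukpak            # placeholder; must match the BD's release
    meta.helm.sh/release-namespace: rukpak-system
```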

The bundle format comes into play here because of an implementation detail of the plain provisioner. If rukpak were bundled into a helm chart, we could very likely bootstrap by doing `helm install rukpak --namespace rukpak-system --version 1.2.3 && kubectl apply -f rukpak-1.2.3-bd.yaml`, where the rukpak BD references the same helm chart.

timflannagan commented 2 years ago

> The bundle format comes into play here because of an implementation detail of the plain provisioner

Yep, that's true.

I wanted to test out whether you could install rukpak from our release quickstart instructions, then generate the main branch release manifests via `make quickstart VERSION=main`, and consume that generated rukpak.yaml file using `rukpakctl run bundle`, but I got sidetracked with meetings.

timflannagan commented 2 years ago

Alright so I came back to this, and ran into those helm adoption errors that I had mentioned before:

```yaml
status:
  conditions:
  - lastTransitionTime: "2022-08-18T17:24:19Z"
    message: Successfully unpacked the rukpak-v0.9.0-trunk-57694cc5f8 Bundle
    reason: UnpackSuccessful
    status: "True"
    type: HasValidBundle
  - lastTransitionTime: "2022-08-18T17:24:20Z"
    message: 'rendered manifests contain a resource that already exists. Unable to continue with install: Namespace "rukpak-system" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "rukpak-v0.9.0-trunk"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "rukpak-system"'
    reason: InstallFailed
    status: "False"
    type: Installed
```

And it's probably a bad idea to extend that `run bundle` sub-command with an opt-in toggle that updates those resources with the required annotation keys.

awgreene commented 2 years ago

We spoke about this ticket in the OLM Issue Triage Meeting and agreed that RukPak must define an upgrade strategy. We don't want to direct the community to one solution just yet, so please keep suggesting different approaches.

timflannagan commented 2 years ago

@joelanford Added a bootstrap rukpakctl sub-command in https://github.com/operator-framework/rukpak/pull/508. That still needs to be merged, and it will be hidden under an alpha command, but it could still be used here to hack around the adoption issue mentioned above. There are still some gaps here, though, like whether we need to automate this experience or delegate it to something like gitops.

timflannagan commented 2 years ago

We could also just create a rukpak operator/controller/etc. that auto-updates itself by defining a list of supported edges. This might be a heavier solution than the ones proposed so far.
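
Purely as an illustration of the "supported edges" idea (nothing like this exists in rukpak today, and every name below is made up), such a controller could consume a declarative edge list, e.g.:

```yaml
# Hypothetical: illustrates what a declarative list of supported upgrade
# edges might look like for a self-updating rukpak controller. Neither this
# ConfigMap nor the versions listed are real.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rukpak-upgrade-edges
  namespace: rukpak-system
data:
  edges: |
    - from: v0.9.0
      to: v0.10.0
    - from: v0.10.0
      to: v0.11.0
```

The controller would only follow edges present in that list when updating its own BundleDeployment.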

github-actions[bot] commented 1 year ago

This issue has become stale because it has been open 60 days with no activity. The maintainers of this repo will remove this label during issue triage or it will be removed automatically after an update. Adding the lifecycle/frozen label will cause this issue to ignore lifecycle events.