operator-framework / operator-registry

Operator Registry runs in a Kubernetes or OpenShift cluster to provide operator catalog data to Operator Lifecycle Manager.
Apache License 2.0

Clarify which File Based Catalog data is source of truth #882

Open cdjohnson opened 2 years ago

cdjohnson commented 2 years ago

File Based Catalogs (FBCs) allow overriding all sorts of values that are normally supplied by the bundle.

Can we clearly define what is safe to override?

I can infer from the documentation that I can safely override this information:

Specified in the olm.channel CSV:

Specified in the olm.package Bundle annotations.yaml:

Specified in the olm.channel Bundle annotations.yaml:

Through some experimentation, I can also see that OpenShift 4.9 (OLM, PackageServer, Console) appears to honor additional overrides as well:

Specified in the olm.package CSV:

Specified in the olm.bundle Bundle dependencies.yaml:

My ask: What Bundle metadata (CSV, Metadata) can I safely override other than the graph update edges? Are the overrides I'm making an implementation side effect or is this part of the design? Can we document the acceptable bundle overrides?

joelanford commented 2 years ago

@cdjohnson Thanks for filing an issue about this! (I think I remember a brief conversation about how it would be helpful if we documented all of this)

Specified in the olm.channel CSV:

  • spec.replaces
  • spec.skips
  • metadata.annotations["olm.skipRange"]

Specified in the olm.package Bundle annotations.yaml:

  • operators.operatorframework.io.bundle.channel.default.v1

Specified in the olm.channel Bundle annotations.yaml:

  • operators.operatorframework.io.bundle.channels.v1
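For reference, a sketch of where those fields sit in a bundle; the keys are the real CSV fields and bundle annotation names, but all values here are illustrative:

```yaml
# ClusterServiceVersion (upgrade-edge fields; values illustrative)
metadata:
  annotations:
    olm.skipRange: '<0.2.0'
spec:
  replaces: example-operator.v0.1.0
  skips:
  - example-operator.v0.1.1
---
# metadata/annotations.yaml (channel membership and default channel)
annotations:
  operators.operatorframework.io.bundle.channels.v1: stable,candidate
  operators.operatorframework.io.bundle.channel.default.v1: stable
```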

As far as I know, these fields are only ever read and used by opm during opm index/registry commands, so they are effectively deprecated. By the time you have a catalog, whether it is sqlite or FBC, these values in the CSV and bundle metadata have already been consumed by opm to build the catalog and are unused/ignored by OLM otherwise.

What Bundle metadata (CSV, Metadata) can I safely override other than the graph update edges?

Technically, all bundle upgrade edge data, resolution data (e.g. properties/constraints) and UI metadata can be overridden in the index. The packageserver and OLM resolver derive all of their views and decisions from information provided by the catalog. If a bundle is provided by an image (as opposed to catalog-inlined manifests), then the bundle image will be the source of truth for the manifests that are actually applied on the cluster during bundle install.
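For example, upgrade edges live entirely in the catalog's olm.channel entries, so a catalog author could rewrite them there without touching any bundle image. A sketch, with illustrative package and version names:

```yaml
---
schema: olm.channel
package: example-operator
name: stable
entries:
- name: example-operator.v0.2.0
  replaces: example-operator.v0.1.0   # takes effect regardless of the CSV's spec.replaces
  skipRange: '<0.2.0'
```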

Are the overrides I'm making an implementation side effect or is this part of the design?

During this migration it's a little of both. I think of them as a side-effect of the implementation as a result of the reality of:

If/when we move beyond CSVs, it is likely that much of this metadata disappears from bundles entirely, obviating whether the catalog or the bundle is the source of truth (because eventually the data is present in just one of those artifacts).

The one piece of this that's debatable to me is whether olm.bundle contents should be changed after being opm render-ed from a bundle image.

My view: olm.bundle should almost never deviate from the bundle image. Any deviations will be difficult to debug. I would consider the bundle image the source of truth for the bundle's metadata files, and the catalog as merely a vehicle for propagating that metadata to a client without the client needing to pull the image.

Definitely interested in what other maintainers think about @cdjohnson's questions.

Can we document the acceptable bundle overrides?

:+1:

cdjohnson commented 2 years ago

In general: because this data now exists in many places and has different consumers, it's very hard to know what the source of truth is for each consumer. Because it's evolving, I think it's super important to be prescriptive, to avoid the consequences of running different versions of those consumers (e.g. PackageServer is using a different source of truth than the catalog operator at the moment). Even though opm registry add may be the only consumer of the bundle data, there is no guarantee that other clients aren't using it.

Regarding the Bundle Dependencies. They are currently in four places now:

  1. CSV yaml (GVK only)
  2. dependencies.yaml in the bundle (gvk and package)
  3. properties.yaml in the bundle (gvk, package, label, compound (coming soon)...)
  4. FBC olm.bundle properties:
    • olm.gvk
    • olm.package
    • olm.bundle.object (CSV)

It seems like a "safe" hard-liner stance right now is to only allow mutation of the olm.package and olm.channel objects and the olm.bundle object MUST MATCH the bundle image (it's basically a cache of its data). If we say that, then I think it would be useful for opm validate to verify this fact. Maybe some sort of deep validation option to verify consistency.
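A minimal sketch of what that deep validation could check, assuming the catalog's olm.bundle blob and the output of rendering the bundle image are both available as parsed JSON objects (these helpers are hypothetical, not part of opm):

```python
import json


def bundle_matches_image(catalog_blob: dict, rendered_blob: dict) -> bool:
    """Deep-validation sketch: is the catalog's olm.bundle blob equivalent
    (ignoring key order) to the blob that rendering the bundle image would
    produce? Canonicalize both via sorted-key JSON before comparing."""
    canon = lambda b: json.dumps(b, sort_keys=True)
    return canon(catalog_blob) == canon(rendered_blob)


def bundle_diff(catalog_blob: dict, rendered_blob: dict) -> list:
    """List the top-level fields that deviate, to make any drift debuggable."""
    keys = set(catalog_blob) | set(rendered_blob)
    return sorted(k for k in keys if catalog_blob.get(k) != rendered_blob.get(k))
```

In practice a validator would obtain `rendered_blob` by pulling and rendering the bundle image, which is exactly the network and performance cost discussed below.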

If we want to allow deviations, we should be very specific which deviations we want to allow. Dependencies, for example, could be useful for me (just for testing at the moment), but even for building specialized catalogs with different dependency graphs for those operators that support that.

joelanford commented 2 years ago

It seems like a "safe" hard-liner stance right now is to only allow mutation of the olm.package and olm.channel objects and the olm.bundle object MUST MATCH the bundle image (it's basically a cache of its data). If we say that, then I think it would be useful for opm validate to verify this fact. Maybe some sort of deep validation option to verify consistency.

This gets pretty messy quickly:

  1. If the bundle image is a manifest list, do I need to pull each underlying arch-specific image and verify all of them?
  2. opm validate so far does not require a network. Even if we put this consistency check behind a flag, it would be a pretty bad UX. The operatorhub.io catalog has ~1900 bundles in it. I haven't actually timed it, but I imagine it would take quite a while to pull and render all of those bundles to compare against the catalog's olm.bundle contents.

I would generally lean toward documentation that:

  1. Highly encourages the concepts of bundle immutability and catalog reproducibility.
  2. Describes which pieces of metadata are used where, (i.e. what's read/acted upon by opm when rendering a bundle?, what's read/acted upon by OLM from the catalog source when processing a subscription?, what's read/acted upon by OLM from the bundle image when installing the operator?)
  3. Does not preclude overrides of the bundle contents in the catalog.

For 1, I could imagine package authors setting up their catalog repo using a "simple" olm.bundle:

package.yaml

```yaml
---
schema: olm.package
name: example-operator
---
schema: olm.channel
package: example-operator
name: stable
entries:
- name: example-operator.v0.1.0
---
schema: olm.bundle
image: quay.io/joelanford/example-operator-bundle:0.1.0
```

They'd then run a tool that takes that file as input, uses opm render to expand each bundle, and writes the result to a separate rendered FBC:

rendered/package.yaml

```yaml
---
schema: olm.package
name: example-operator
---
schema: olm.channel
package: example-operator
name: stable
entries:
- name: example-operator.v0.1.0
---
schema: olm.bundle
image: quay.io/joelanford/example-operator-bundle:0.1.0
package: example-operator
name: example-operator.v0.1.0
properties:
- type: olm.gvk
  value:
    group: example.com
    kind: App
    version: v1
- type: olm.package
  value:
    packageName: example-operator
    version: 0.1.0
relatedImages:
- image: quay.io/joelanford/example-operator-bundle:0.1.0
- image: quay.io/joelanford/example-operator@sha256:a5f8fce740945fcdabf7f8ac6fcd6f0f95b3a828fbf17e43de251de9d5544118
```

And I'd have a sanity check in my CI that runs this process on PRs to ensure the input and output stay in sync.
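The expand-and-compare step could be sketched like this, treating the catalog as a list of parsed JSON blobs and using a `render_bundle` callback where the real tool would shell out to opm render (all names here are hypothetical):

```python
import json


def expand_catalog(blobs: list, render_bundle) -> list:
    """Expand 'simple' olm.bundle blobs (schema + image only) into fully
    rendered blobs. `render_bundle` stands in for `opm render <image>`."""
    out = []
    for blob in blobs:
        if blob.get("schema") == "olm.bundle" and set(blob) == {"schema", "image"}:
            out.append(render_bundle(blob["image"]))  # replace with rendered form
        else:
            out.append(blob)  # olm.package / olm.channel pass through unchanged
    return out


def in_sync(source_blobs: list, rendered_blobs: list, render_bundle) -> bool:
    """CI sanity check: re-expand the source catalog and compare against the
    committed rendered catalog, ignoring key order."""
    canon = lambda bs: json.dumps(bs, sort_keys=True)
    return canon(expand_catalog(source_blobs, render_bundle)) == canon(rendered_blobs)
```

A CI job would run `in_sync` on every PR and fail if the rendered FBC had drifted from the simple input.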

joelanford commented 2 years ago

One thing that's probably worth doing here is looking at the way other package indexes work (e.g. rpm repos)

For example, I can run createrepo on a directory of rpms to get a repodata directory with a bunch of metadata about the rpms in the repo. When someone installs an rpm from my repo using yum, yum reads the contents of the repodata directory (especially primary.xml.gz) to understand package names, versions, and dependencies.

In that case, what's the source of truth? In one sense, it's the rpms, because that's what createrepo used to generate the index. In another sense, it's whatever is actually in the repodata directory, because that's all that yum is actually looking at when making decisions.

The analogies in my head are: