Open kispaljr opened 1 month ago
cc: @henderiw @tliron @johnbelamaric @Catalin-Stratulat-Ericsson @efiacor @liamfallon
Ok lets firsts try to highlight the problem we want to solve as this is not clear from this issue.
I will try to rephrase it: We want to store extra information that is related to a particular kpt package, but that is not strictly its content. All such information/metadata is now stored in the Kptfile (ref to the upstream package, readinessGates, conditions, etc.). Now we encountered a situation, when we want to store a new kind of metadata needed for porch to do its job properly, but we cannot just add a new field to the Kptfile, since kpt is a separate project now and they own the schema of the Kptfile.
So, according to this proposal, porch will to create its own file (Porchfile) that is similar to Kptfile, but separate from it, and whose schema is fully owned by the porch project. This will maintain compatibility between kpt packages managed by porch and the kpt CLI command, and still allow us a place to introduce new kind of data that we want to store in a package.
Why does it have to be a file and not just metadata to the PR CRD?
That is what I tried to justify in the original wording of the PR, by mentioning the use case when e.g. a vendor's blueprint repo is registered in multiple customers' porch instances. But another good example would be if you unregister a repository from porch (by deleting the Repository object) and re-register it. All labels and annotation added to the PackageRevision CRs will be lost after re-registering the repo. And this is fine, but there is types of package metadata that you don't want to lose when you unregister the repo from porch, for example: the reference to the upstream package, or the readinessGates. This kind of metadata should be stored alongside the package in git/oci. And the easiest way to add this information to git/oci is to create a file. I would say that this is reasoning behind having a Kptfile in the first place.
There is some conflicting info here:
and some info specific to porch implementation.
Yes. I agree, but the backend storage behind PackageRevision objects is git, isn't it? Similarly how normal CRs are stored in etcd.
As far as I can see, all persistent data presented via PackageRevision objects has to be somehow stored in git. Either encoded in the name of a git branch (how we store workspace names for draft/proposed packages), or in git annotations (like how store workspace names for published packages), or in files in git (that's what we do with readinessGates in Kptfile), or maybe using some other git concept, but definitely in git.
Am I missing something?
[In my understanding the metadata store in porch was designed for ephemeral data, that is known to be lost on unregistering the repo]
For the record: I completely rewrote the original proposal text in the first comment, based on the discussion with @henderiw . I hope the new version is a more concise and digestible description of what we want to add and why.
I support this idea and indeed implemented it in the TKO experiment, where it's used to keep track of bookkeeping metadata (URL of the original template used) as well as allowing for some optional user-controlled knobs for changing default behavior, such as disabling auto-approval (example).
I don't think "Porchfile" is the best name, though, as hopefully it can be agnostic to any specific implementation.
If we accept the basic idea we can move on to discussing its structure and what should be included.
By all means, let's find a better name for this file :)
this is the proper way to solve this, rather than keep hacking. My 2 cents. https://github.com/nephio-project/nephio/issues/598
I see this as a different, independent problem. I assume even if PackageRevisions are CRs, we want to keep registering Repositories and auto-discovering the packages in them (i.e. there will be a PackageRevision controller that creates/updates/deletes PackageRevisions based on what is in git). If this is true, then we still have to decide where the "persistent metadata" attached to a package is stored between Repository unregistration and re-registration. In other words we have to decide where the PackageRevision controller will read the persistent metadata values from, during auto-discovery. Porchfile tries to solve the problem of persisting the kind of metadata whose lifecycle is bound to the lifecycle of a package in git (that I called "persistent metadata"), and not the kind of metadata whose lifecycle is bound to the lifecycle of a PackageRevision KRM object (that I called "ephemeral metadata").
Ok I reread your new text, but the way to solve this is a bit different in my view and I am of the opinion porch should just care about access to git, not about any specialization, etc. As such a workspace name is a local matter, so it does not need to be persisted and it does not need to be coordinated across clusters. Wrt condiitons, etc that work is triggered from a PV and the corresponding technologies should report their status, so the reproducibility should come from that chain not porch.
TL;DR
Add a YAML file named
Porchfile
to kpt packages that are managed by porch to persistently store porch-specific metadata.Overview
Porch handles all kinds of metadata that is related to a kpt package, but that is not strictly the content of the package. Historically porch used the Kptfile to persist this kind of metadata together with kpt's metadata. This was natural, since kpt and porch were parts of the same project. Recently porch was moved out from kpt, so we, the porch maintainers no longer have ownership over the Kptfile schema, but some of the new features we are trying to implement needs us to attach new kinds of metadata to the package. Hence this proposal to add our own, porch-specific metadata file, called Porchfile, to porch-managed kpt packages to store porch-specific metadata. This would keep compatibility with kpt, but allow us to add new kinds of porch-specific metadata, at the same time.
Porchfile should contain a valid KRM object, with the local-config annotation. Here is a possible example Porchfile content:
What kind of metadata should be stored in Porchfile?
Right now porch handles two kinds of package metadata:
"persistent metadata" is persisted in git/oci alongside the package. Right now these are mostly stored in the Kptfile (upstream package reference, mutation pipeline, readiness gates, conditions), but some are encoded in the name of a git branch (how workspace names for draft/proposed packages are stored), or in git annotations (how workspace names for published packages are stored). When a new Repository is registered in porch and the package revisions are automatically detected, "persistent metadata" parameters of the package already contain their latest values that was set by the last update operation. Furthermore if the same Repository is registered in another porch in another Kubernetes cluster, then the "persistent metadata" values of the same package will eventually be the same in the two separate porch instances, in the two separate PackageRevision objects. Also, changing a persistent metadata value in one porch instance will be visible in all other porch instances that have the same Repository registered in them. All-in-all the lifecycle of "persistent metadata" is bound to the lifecycle of the package itself, and their scope spans across all porch instances.
"ephemeral metadata" is stored in the etcd of porch's Kubernetes cluster. Some examples would be owner references, labels and annotations of a PackageRevision. When a new Repository is registered in porch and the package revisions are automatically detected "ephemeral metadata" parameters are empty, and if these values are changed, the changes are lost when the Repository is unregistered. Furthermore if the same Repository is registered in another porch in another Kubernetes cluster, then changing an ephemral metadata value in one porch instance won't be visible in others. So the lifecycle of "ephemeral metadata" is bound to the lifecycle of a single Repository registration, and its scope is limited to one porch instance.
The proposal is to store new kinds of "persitent metadata" in the Porchfile, and maybe migrate the exisisting porch-owned metadata from the Kptfile to the Porchfile to allow future schema changes.
Alternatives considered
Kptfile
A natural alternative to adding a new Porchfile would be to keep adding new fields to the Kptfile itself. However, we can only modify our own version of the Kptfile schema, that is stored in the porch codebase, and that would make porch-managed kpt packages incompatible with the
kpt
CLI tool. This is a problem for porch developers and users, since we still heavily utilize the kpt command for development and troubleshooting of kpt packages, e.g. to runkpt fn render
on a local copy of a kpt package, or deploying a kpt package bykpt live apply
.Metadata store
Besides git/oci, porch currently has another storage backend, the so-called metadata store. But that is designed to store "ephemeral metadata", and this proposal is about the "persistent type metadata" only.