nephio-project / nephio

Nephio is a Kubernetes-based automation platform for deploying and managing highly distributed, interconnected workloads such as 5G Network Functions, and the underlying infrastructure on which those workloads depend.
Apache License 2.0
93 stars 52 forks source link

Porch: PackageRevision `metadata.uid` field is not unique #748

Closed kispaljr closed 1 week ago

kispaljr commented 1 month ago

The .metadata.uid field of PackageRevisions are simply created from the name and revision of the package. It is not unique in the whole cluster, because it doesn't contain the name of the registry or the namespace.

Here is an example manifest fragment for a reference:

  apiVersion: porch.kpt.dev/v1alpha1
  kind: PackageRevision
  metadata:
    creationTimestamp: "2024-05-30T12:13:16Z"
    labels:
      kpt.dev/latest-revision: "true"
    name: dev-e93156a441afcbcc9086e70696691bfd948faf70
    namespace: default
    resourceVersion: 4dae54df58d90db0d3c45185654cf3b68cd0470a
    uid: uid:helloworld-simple:v1
  spec:
    lifecycle: Published
    packageName: helloworld-simple
    repository: dev
    revision: v1
[...]

This is problematic, because:

tliron commented 1 month ago

We should decide what "uniqueness" means here and what it is for. It is for clients to use, but for what purposes?

Two potential approaches:

1) Treat the objects as K8s objects. So, the identifier would be derived from K8s identifiers: namespaces, GVK, and metadata.name. We would also add revision information.

2) Treat the objects as "real world" objects, in this case git refs for a specific git repo URL.

I am leaning on 2. The reason being is that I think clients would be interested in the "real world" object, e.g. for storing in a database or similar. So, if there are two Porch instances running, and they use different (arbitrary) namespace names, the UID would be identical if they are referring to the same git commit.

If users actually do care about the K8s facade for some reason, then it's trivial for them to generate their own UID, because they are already accessing the K8s resource via namespace, GVK, and K8s name. Thus the UID would provide an additional data feature rather than just be a translation of data they already have.

kispaljr commented 1 month ago

Right now if you have two porch Repository objects pointing to the same git repo in two separate namespaces, then they should "generate" two separate sets of PackageRevision objects: mostly identical, but the namespaces will be different, so they are distinctly separate objects in Kubernetes-terms (different GVKNN), and it means to me that their UID should be different, as well. (As it would be if we used the kube-apiserver.)

On the other hand if the UID of a KRM object changes during Update it will cause problems. At least my first implementation accidentally did change the UID on update, and test cases were failing, because the object in question was suddenly and unexpectedly deleted (I guess by the K8s GC, but I am not sure). I can investigate further, if necessary.

I would suggest to mimic what the kube-apiserver does with UIDs when it is possible, because that will cause the least amount of surprises to our clients.

kispaljr commented 1 month ago

One clarification: if we include the revision information in the UID, it means that the UID will change on every Update, because the revision number changes. Hence my comment about that above.

kispaljr commented 1 month ago

We can also base the UID of the PackageRevision on the UID of the corresponding PackageRev somehow (i.e. hashing), because the later is a CR managed by the kube-apiserver. But I would like to get rid of PackageRevs as soon as possible, so maybe not... :)

kispaljr commented 3 weeks ago

We are relying of the uniqueness of UIDs in the porch codebase at these points: