opencontainers / distribution-spec

OCI Distribution Specification
https://opencontainers.org
Apache License 2.0

Reference Types Working Group remaining tasks #337

Closed: sudo-bmitch closed this issue 1 year ago

sudo-bmitch commented 2 years ago

From our meeting today, we are merging #335 with the following items left for a future PR:

mikebrow commented 2 years ago
sudo-bmitch commented 2 years ago

@mikebrow I don't think we want to change how existing artifacts, like a Helm chart, are handled. Does that negate some of the thoughts you had with implied refers digests and leaf nodes in a graph?

Jamstah commented 1 year ago

discuss an implied refer digest for/to the image.index when the image.index includes an artifact manifest without a refer reference

Is this proposing that the referrers API should include referrers from image indexes? I was looking to see if I could find any discussions around doing that...

sudo-bmitch commented 1 year ago

@Jamstah see https://github.com/opencontainers/image-spec/issues/971

Jamstah commented 1 year ago

That's more about allowing an index to have a subject field, which I agree doesn't add up because of the graph looping.

I was more thinking:

Would it make sense for that to return:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.index.v1+json",
      "size": 1234,
      "digest": "sha256:a1a1a1...",
      "artifactType": "application/vnd.oci.image.index.v1+json"
    }
  ]
}

This would need to be covered in the spec, I think, by adding index information to this section:

Each descriptor is of an image or artifact manifest in the same namespace with a subject field that specifies the value of . The descriptors MUST include an artifactType field that is set to the value of artifactType for an artifact manifest if present, or the configuration descriptor's mediaType for an image manifest.

I'm guessing this has been discussed too, but I couldn't see it :/
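The `artifactType` rule in the quoted spec text can be sketched as a small function that builds one descriptor of a referrers response. This is a minimal sketch, not a registry implementation: it assumes manifests are already parsed into dicts, uses the artifact manifest media type as drafted at the time, and the `application/vnd.example.signature` type is a hypothetical example value.

```python
# Media types assumed for this sketch: the OCI image manifest type, and the
# artifact manifest type as drafted at the time of this discussion.
IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
ARTIFACT_MANIFEST = "application/vnd.oci.artifact.manifest.v1+json"


def referrer_descriptor(digest, size, manifest):
    """Build one entry of a referrers response for a stored manifest.

    `manifest` is the parsed manifest JSON; `digest` and `size` describe
    its stored bytes.
    """
    desc = {"mediaType": manifest["mediaType"], "size": size, "digest": digest}
    if manifest["mediaType"] == ARTIFACT_MANIFEST:
        # Artifact manifest: copy its own artifactType when it has one.
        if "artifactType" in manifest:
            desc["artifactType"] = manifest["artifactType"]
    elif manifest["mediaType"] == IMAGE_MANIFEST:
        # Image manifest: use the config descriptor's mediaType instead.
        desc["artifactType"] = manifest["config"]["mediaType"]
    return desc


# Hypothetical signature stored as an image manifest.
sig = {"mediaType": IMAGE_MANIFEST,
       "config": {"mediaType": "application/vnd.example.signature"}}
print(referrer_descriptor("sha256:b2b2", 512, sig)["artifactType"])
# prints application/vnd.example.signature
```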

sudo-bmitch commented 1 year ago

The referrers API only returns manifests whose subject field has a matching digest. Treating the index entries as implicit subjects has the same issue as giving the index an explicit subject.
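A minimal sketch of that filtering rule, assuming a hypothetical in-memory store that maps each digest to its parsed manifest. Only the `subject` field is consulted, so an index that lists the image in its `manifests` array does not show up as a referrer of that image.

```python
def referrers(manifests, digest):
    """Digests of stored manifests whose subject field names `digest`.

    `manifests` is a hypothetical store: digest -> parsed manifest dict.
    An index listing `digest` in its `manifests` array is NOT a referrer.
    """
    return sorted(
        d for d, m in manifests.items()
        if m.get("subject", {}).get("digest") == digest
    )


store = {
    "sha256:img": {"config": {"digest": "sha256:cfg"}},
    "sha256:sig": {"subject": {"digest": "sha256:img"}},
    "sha256:idx": {"manifests": [{"digest": "sha256:img"}]},
}
print(referrers(store, "sha256:img"))  # ['sha256:sig']
```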

Jamstah commented 1 year ago

OK, I think I get it. It's all about the direction of the parent-child relationship. An artifact is the child of its subject. An index is the parent of its target. If we (implicitly) add a target as a subject of an index, they become both parents and children of each other. This causes problems for clients that follow both indirection (to handle tags/indexes) and referrers (to copy image metadata).
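To make the looping concrete, here is a small sketch in which every edge points parent to child (an index to each entry it lists, a subject to each of its referrers) and a depth-first search looks for a cycle. The `implicit_index_subjects` flag models the hypothetical variant of treating an index as a referrer of each manifest it lists; the store layout and names are illustrative only.

```python
def children(manifests, digest, implicit_index_subjects=False):
    """Child digests of `digest`; every edge points parent -> child."""
    # An index is the parent of the manifests it lists.
    kids = [e["digest"] for e in manifests[digest].get("manifests", [])]
    # An artifact (referrer) is the child of its subject.
    kids += [d for d, m in manifests.items()
             if m.get("subject", {}).get("digest") == digest]
    if implicit_index_subjects:
        # Hypothetical variant: an index that lists `digest` is also
        # treated as a referrer of it, i.e. as one of its children.
        kids += [d for d, m in manifests.items()
                 if any(e["digest"] == digest for e in m.get("manifests", []))]
    return kids


def has_cycle(manifests, implicit_index_subjects=False):
    """Depth-first search with white/gray/black coloring."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {d: WHITE for d in manifests}

    def visit(d):
        color[d] = GRAY  # on the current DFS path
        for c in children(manifests, d, implicit_index_subjects):
            if c not in color:
                continue  # reference to something not in the store
            if color[c] == GRAY:
                return True  # back edge: found a loop
            if color[c] == WHITE and visit(c):
                return True
        color[d] = BLACK  # fully explored
        return False

    return any(color[d] == WHITE and visit(d) for d in manifests)


store = {
    "sha256:idx": {"manifests": [{"digest": "sha256:img"}]},
    "sha256:img": {},
    "sha256:sig": {"subject": {"digest": "sha256:img"}},
}
print(has_cycle(store))                                # False
print(has_cycle(store, implicit_index_subjects=True))  # True
```

With only index-to-entry and subject-to-referrer edges, the graph stays acyclic; the implicit variant adds an image-to-index edge that closes the loop.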

If the subject->artifact link is always parent->child, there is no looping and the client can:

I had it in my head that a subject link can be used for indirection, but I've been thinking about it, and can't really come up with a good use case for that. A tag is for individual indirection and an index is for dispatch indirection, a subject should never be used for indirection. (do you agree?)
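The distinction between the two kinds of indirection can be sketched as a resolver that follows a tag (individual indirection) and then an index entry chosen by platform (dispatch indirection), while never following `subject`. This is an illustrative sketch only: the `tags` and `manifests` dicts are hypothetical stand-ins for registry lookups, the platform is simplified to an `(os, architecture)` pair, and nested indexes are not handled.

```python
INDEX = "application/vnd.oci.image.index.v1+json"


def resolve(tags, manifests, tag, platform):
    """Resolve `tag` to an image manifest digest for `platform` (os, arch).

    Tags provide individual indirection and indexes provide dispatch
    indirection. The `subject` field is never followed while resolving.
    """
    digest = tags[tag]  # individual indirection: tag -> digest
    manifest = manifests[digest]
    if manifest.get("mediaType") != INDEX:
        return digest
    # Dispatch indirection: pick the index entry matching the platform.
    for entry in manifest["manifests"]:
        p = entry.get("platform", {})
        if (p.get("os"), p.get("architecture")) == platform:
            return entry["digest"]
    raise LookupError(f"no entry for platform {platform} in {digest}")


tags = {"v1": "sha256:idx", "plain": "sha256:one"}
manifests = {
    "sha256:idx": {"mediaType": INDEX, "manifests": [
        {"digest": "sha256:amd", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:arm", "platform": {"os": "linux", "architecture": "arm64"}},
    ]},
    "sha256:one": {"mediaType": "application/vnd.oci.image.manifest.v1+json"},
}
print(resolve(tags, manifests, "v1", ("linux", "arm64")))  # sha256:arm
```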

The only reason I was thinking to add index entries as referrers was to make GC easier, so the registry can make a GC decision on a digest without needing to evaluate the whole graph. I can imagine there might be use cases for clients to want to ask the registry "Is this digest a child of anything?", but I can't think of them right now, so the registry will either need to process the graph as a whole for GC, or maintain its own metadata about index references that is not exposed and use that.
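One way a registry might keep that unexposed metadata is a reverse index from each digest to the manifests that reference it, updated on every manifest push and delete. A minimal sketch, using a hypothetical `ReverseIndex` class: index entries, config, layers, and `blobs` (the field used by the drafted artifact manifest) count as children, and a referrer is recorded as a child of its subject, following the parent-to-child direction described earlier in the thread.

```python
from collections import defaultdict


class ReverseIndex:
    """Hypothetical registry-internal map: child digest -> parent digests."""

    def __init__(self):
        self._parents = defaultdict(set)  # child -> set of parent digests
        self._edges = {}                  # stored manifest -> edges it added

    def add_manifest(self, digest, manifest):
        kids = [e["digest"] for e in manifest.get("manifests", [])]
        kids += [b["digest"] for b in manifest.get("layers", []) + manifest.get("blobs", [])]
        if "config" in manifest:
            kids.append(manifest["config"]["digest"])
        edges = [(digest, k) for k in kids]
        if "subject" in manifest:
            # A referrer is a child of its subject, not the other way round.
            edges.append((manifest["subject"]["digest"], digest))
        self._edges[digest] = edges
        for parent, child in edges:
            self._parents[child].add(parent)

    def remove_manifest(self, digest):
        for parent, child in self._edges.pop(digest, []):
            self._parents[child].discard(parent)

    def is_child(self, digest):
        """True if a manifest currently stored is a parent of `digest`."""
        return any(p in self._edges for p in self._parents.get(digest, ()))


ri = ReverseIndex()
ri.add_manifest("sha256:img", {"config": {"digest": "sha256:cfg"}})
ri.add_manifest("sha256:sig", {"subject": {"digest": "sha256:img"}})
print(ri.is_child("sha256:sig"))  # True
```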

Should we consider putting any GC guidelines into the distribution spec? For example:

sudo-bmitch commented 1 year ago

I had it in my head that a subject link can be used for indirection, but I've been thinking about it, and can't really come up with a good use case for that. A tag is for individual indirection and an index is for dispatch indirection, a subject should never be used for indirection. (do you agree?)

I'm not quite sure I follow, but probably.

The only reason I was thinking to add index entries as referrers was to make GC easier, so the registry can make a GC decision on a digest without needing to evaluate the whole graph. I can imagine there might be use cases for clients to want to ask the registry "Is this digest a child of anything?", but I can't think of them right now, so the registry will either need to process the graph as a whole for GC, or maintain its own metadata about index references that is not exposed and use that.

We've avoided adding GC to the spec explicitly, but the general advice is to treat the referrers to a manifest the same way you would treat the child manifests of an index. As long as an index is tagged, many registries keep that index and all of its child manifests. So if a registry is keeping an image manifest, it would also keep all artifacts with a subject field pointing to that image manifest. Conversely, when a manifest is deleted, any untagged manifest with a subject field pointing to the deleted manifest is often safe to remove.
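The "often safe to remove" case can be sketched as a cascading delete: removing a manifest also removes untagged manifests whose subject was just deleted, which in turn handles referrers of referrers. This is one possible policy under stated assumptions, not spec behavior: `manifests` is a hypothetical in-memory store and `tagged` is the set of digests some tag currently points at.

```python
def delete_manifest(manifests, tagged, digest):
    """Delete `digest`, then cascade to untagged referrers of deleted manifests.

    `manifests` maps digest -> parsed manifest (mutated in place) and
    `tagged` is the set of digests some tag currently points at.
    Returns the set of digests that were deleted.
    """
    deleted, pending = set(), [digest]
    while pending:
        d = pending.pop()
        if d in deleted or d not in manifests:
            continue
        del manifests[d]
        deleted.add(d)
        # Untagged manifests whose subject was just deleted become removal
        # candidates; tagged ones are kept.
        pending += [r for r, m in manifests.items()
                    if m.get("subject", {}).get("digest") == d and r not in tagged]
    return deleted


store = {
    "sha256:img": {},
    "sha256:sig": {"subject": {"digest": "sha256:img"}},
    "sha256:sigsig": {"subject": {"digest": "sha256:sig"}},
    "sha256:sbom": {"subject": {"digest": "sha256:img"}},
}
print(sorted(delete_manifest(store, {"sha256:sbom"}, "sha256:img")))
# ['sha256:img', 'sha256:sig', 'sha256:sigsig']
print(sorted(store))  # ['sha256:sbom']
```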

Overly specific guidance is difficult because registries implement GC differently, for different reasons. Some registries retain untagged manifests according to various policies (e.g., time since a manifest was untagged or last pulled, or the previous n values of a tag). Then there are registries like ttl.sh that delete any tagged image after a timeout.

A reverse reference API might be useful for other reasons, but probably not for GC.

For GC, what I've seen described most is a mark-and-sweep method, where a registry marks all manifests to preserve and then recursively marks all child objects (manifests and blobs). With the fallback tag, that model still works, since there's a tagged index pointing to the artifacts. Once the referrers API is added, registries should treat those referrers as child manifests when recursively marking objects to preserve.
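A minimal mark-and-sweep sketch along those lines: tagged manifests are the roots, index entries, config and layer blobs are marked as children, and referrers found through a `subject` lookup are marked as child manifests too. The in-memory `manifests`, `blobs`, and `tags` structures are hypothetical stand-ins for registry storage; real registries add locking, grace periods, and the retention policies mentioned above.

```python
from collections import defaultdict


def mark_and_sweep(manifests, blobs, tags):
    """Return (manifests to delete, blobs to delete) for a simple GC pass."""
    # subject digest -> referrer digests, so referrers are marked as children
    referrers = defaultdict(list)
    for d, m in manifests.items():
        if "subject" in m:
            referrers[m["subject"]["digest"]].append(d)

    marked = set()
    stack = list(tags.values())  # roots: every tagged manifest
    while stack:
        d = stack.pop()
        if d in marked:
            continue
        marked.add(d)
        m = manifests.get(d)
        if m is None:
            continue  # a blob (or dangling reference): nothing to walk
        stack += [e["digest"] for e in m.get("manifests", [])]  # index entries
        stack += [b["digest"] for b in m.get("layers", [])]
        if "config" in m:
            stack.append(m["config"]["digest"])
        stack += referrers.get(d, [])  # referrers treated as child manifests
    # Sweep: anything unmarked is unreachable from a tag.
    return set(manifests) - marked, set(blobs) - marked


manifests = {
    "sha256:img": {"config": {"digest": "sha256:cfg"}, "layers": [{"digest": "sha256:l1"}]},
    "sha256:sig": {"subject": {"digest": "sha256:img"}, "config": {"digest": "sha256:cfg2"},
                   "layers": [{"digest": "sha256:sigblob"}]},
    "sha256:old": {"config": {"digest": "sha256:cfg3"}},
}
blobs = {"sha256:cfg", "sha256:l1", "sha256:cfg2", "sha256:sigblob", "sha256:cfg3"}
print(mark_and_sweep(manifests, blobs, {"latest": "sha256:img"}))
# ({'sha256:old'}, {'sha256:cfg3'})
```

With the fallback tag scheme the `referrers` map is redundant, because the tagged fallback index already makes the artifacts reachable as index entries; the extra edges matter once a registry serves the referrers API natively.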

jdolitsky commented 1 year ago

I think all remaining tasks have been addressed, and the working group is no longer in session.