Closed SteveLasker closed 3 years ago
This first draft is looking good. I have two pieces of high level feedback advocating to keep as much simplicity in the manifest as possible.
Thanks, @dmcgowan, KISS (Keep It Silly Simple) is definitely a key goal. The manifest should provide just enough information to do what needs to be done, enabling registries to work generically over different artifacts, while providing client tools the info they need to work with their specific artifacts.
These could be categorized into two types of references, weak references and strong references.
We've played with the collections a few ways, including a single collection that contained a direction and strictness (weak/strong, hard/soft, loose/tight?).
Again, please don't read too much into the names as we wanted to figure if the structure worked, and we'd figure out the proper names later, but here was an example of the single collection:
{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.notary.v2",
  "config": {
    "mediaType": "application/vnd.cncf.notary.config.v2",
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
    "size": 102
  },
  "references": [
    {
      "mediaType": "application/vnd.cncf.notary.v2.json",
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
      "size": 32654,
      "direction": "child",
      "strictness": "weak|strong"
    }
  ]
}
The problem with a single collection:
By using different collections to represent:
dependencies, dependent-upon, required/s: which are reverse pointers. This avoids a property required for tracking when/if it should be deleted.
references, weak-references, loose-references: which are the things that are good to know, enabling validation, copying and visualization, but don't get tracked for deletion. Although, a client CLI could parse these and ask the registry to delete these if that's the experience wanted.
metadata
There's lots of interesting metadata scenarios. Some are known at the time the artifact is submitted. Others are later. The problem is how do we address metadata added after, as we need a way to add without changing the digest. I'm punting this to the OCI Metadata Service round of discussions.
artifactType/mediaType
Not all artifacts will have config, yet different artifacts may share the same config schema. In ORAS, we had to explicitly enable the scenario of defining a manifest.config.mediaType, without having a config blob. But, this was mostly a concession to avoid having to rev the image-index.manifest to identify what type of artifact the schema represented.
Since we're defining a new manifest, it seemed time to lift the artifactType property to the root. This enables the manifest.config.mediaType to be decoupled from the manifest.artifactType, allowing them to rev, or even be defined independently. I could see a few different artifact types sharing the same config schema, such as different ways to represent images with different compression formats, or even the new IBM z/OS types.
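As a rough illustration of that decoupling, a client could prefer the root artifactType and only fall back to config.mediaType for manifests that predate it. This is a minimal Python sketch, not part of the proposal: the function name and the example config media type are hypothetical.

```python
from typing import Optional


def resolve_artifact_type(manifest: dict) -> Optional[str]:
    """Return the artifact type a client would act on.

    Assumption: the root "artifactType" wins when present; otherwise the
    older convention of reading config.mediaType is used as a fallback.
    """
    artifact_type = manifest.get("artifactType")
    if artifact_type:
        return artifact_type
    config = manifest.get("config") or {}
    return config.get("mediaType")


# New-style manifest: artifactType and config.mediaType vary independently.
NEW_STYLE = {
    "artifactType": "application/vnd.cncf.nydus.v1",
    "config": {"mediaType": "application/vnd.oci.image.manifest.v1.config.json"},
}

# Older style: only config.mediaType identifies the artifact
# (the media type below is a made-up example).
LEGACY = {"config": {"mediaType": "application/vnd.example.config.v1+json"}}

# A manifest with neither field yields None.
EMPTY = {}
```

With this shape, two artifact types can share one config schema without the client confusing them, because the config media type is only consulted when the root property is absent.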
If/when we get a clean new artifact.manifest, I could see all non-container image artifacts moving from using the current oci-image manifest to this artifact.manifest as they'd have more freedom to define references, and clean up other aspects. Who knows, maybe OCI Image manifest v2 might switch as well...
there is no need to treat layers (or "blobs") as special case references. [from dependencies]
You're correct, these are both "hard" references from the artifact manifest perspective. The two main differences noted above are directionality for ref-counting, and whether the registry looks in the blob store for content or the manifest store.
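To make the ref-counting distinction concrete, here is a minimal Python sketch. It assumes the single-collection shape shown earlier in this thread, where each entry carries a strictness value, and it assumes that only strong entries keep their target alive. The function names, the in-memory manifests dict and the shortened digests are all hypothetical.

```python
from collections import Counter


def strong_ref_counts(manifests: dict) -> Counter:
    """Count strong references per target digest; weak entries are skipped."""
    counts = Counter()
    for manifest in manifests.values():
        for ref in manifest.get("references", []):
            if ref.get("strictness") == "strong":
                counts[ref["digest"]] += 1
    return counts


def collectable(manifests: dict, tagged: set) -> set:
    """Single-pass view of what a registry could delete.

    A manifest is collectable when it is untagged and nothing holds a
    strong reference to it. This does not cascade through chains.
    """
    counts = strong_ref_counts(manifests)
    return {d for d in manifests if d not in tagged and counts[d] == 0}


# Hypothetical store: "aaa" strongly references "bbb", weakly references "ccc".
MANIFESTS = {
    "sha256:aaa": {
        "references": [
            {"digest": "sha256:bbb", "strictness": "strong"},
            {"digest": "sha256:ccc", "strictness": "weak"},
        ]
    },
    "sha256:bbb": {},
    "sha256:ccc": {},
}
TAGGED = {"sha256:aaa"}
```

In this example only the weakly referenced manifest is eligible for deletion, which is the behavior the separate collections are meant to express without a per-entry property.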
Will references in an artifact always be local to the current repository? I think Helm breaks that logic with image references, but Helm also breaks a lot of the artifact logic since the image names themselves could be templated to point to a different location by a values.yaml after pulling the chart, so it may be worth excluding Helm image references from any of the chained reference logic. I suspect having everything in the same repo is important for GC, but also useful for portability of manifests and their artifacts.
What are the methods to lookup an artifact? Is it only with a query using a manifest sha, or can tags point to artifacts? If so, do we need to namespace a tag lookup to the type of artifact we are looking for, so that artifacts with tags don't accidentally collide with images or other artifacts in the same repo? That question comes from looking at how TUF could possibly be implemented with artifacts, and they may want e.g. a "snapshot" TUF artifact in a repo, that applies to multiple manifests, and that they can lookup and update at any time.
The Artifact Manifest idea looks great! It helps to attach artifacts to images without affecting existing container images and thus keeps backward compatibility.
What's more, it looks super easy to add new artifact types based on the artifact manifest. For example, a nydus artifact manifest would look like:
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.nydus.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
    "digest": "sha256:9e988712154fcc2ceda5602eb1d98c1f28299ba6fbf0be49d3717c35a2d76674",
    "size": 1102
  },
  "blobs": [
    {
      "mediaType": "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 32654
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 72832
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 928324
    }
  ],
  "references": [
    {
      "artifact": "mysql:8",
      "artifactType": "application/vnd.oci.image.manifest.v1.config.json",
      "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
      "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b",
      "size": 16724
    }
  ]
}
For those who are not familiar with nydus, it is an image acceleration service that greatly reduces the time needed to pull a container image by reading image contents on demand when the container starts. It is currently widely used in both Alibaba and Ant Group. nydus is open source and maintained as part of the CNCF incubator project Dragonfly. That's why I'm suggesting the same application/vnd.cncf prefix as the other artifact types.
The nydus artifact manifest follows the same schema for other artifact types, while a new artifact type and two media types are added:
It also has a references relationship with the original mysql:8 container image. This information would help registries index and show the relationships between different images, and help container runtimes choose whether to launch containers with image acceleration.
Currently we have a nydus image annotation/os.feature hack to hide nydus image details from the registry. However, with the OCI artifact manifest, we can abandon that hack and have the registry support nydus natively and smoothly.
@SteveLasker would it make sense to list nydus as one of the supported artifact types to show how the artifact manifest spec can help other artifact types?
would it make sense to list nydus as one of the supported artifact types to show how the artifact manifest spec can help other artifact types?
Yes, it would be great to add the example, as it's a new example I hadn't yet thought of. But, if it fits, even better. Let me digest a bit more and align with the new-new manifests collection with an annotation for references.
@SteveLasker Thanks a lot! I tried to amend the above nydus artifact manifest example to fit the new-new manifests collection with an annotation for references. Please see if I understand the new schema correctly:
{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.nydus.v1",
  "config": {
    "mediaType": "application/vnd.oci.image.manifest.v1.config.json",
    "digest": "sha256:9e988712154fcc2ceda5602eb1d98c1f28299ba6fbf0be49d3717c35a2d76674",
    "size": 1102
  },
  "blobs": [
    {
      "mediaType": "application/vnd.cncf.nydus.bootstrap.v1.tar+gzip",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 32654
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 72832
    },
    {
      "mediaType": "application/vnd.cncf.nydus.blob.v1",
      "digest": "sha256:f6bb0822fe567c98959bb87aa316a565eb1ae059c46fa8bba65b573b4489b44d",
      "size": 928324
    }
  ],
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:8c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c31",
      "size": 1578,
      "annotations": {
        "oci.distribution.relationship": "references",
        "oci.distribution.artifact": "mysql:8",
        "oci.distribution.artifactType": "application/vnd.oci.image.v1"
      }
    }
  ]
}
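For anyone wanting to try this shape out, a client or registry could pick the linked manifests out of the manifests collection by annotation. The following Python sketch assumes the annotation keys used in the example above, which are still under discussion; the function name is hypothetical and the manifest is trimmed to the fields the sketch reads.

```python
import json

RELATIONSHIP_KEY = "oci.distribution.relationship"


def linked_manifests(manifest: dict, relationship: str) -> list:
    """Return descriptors in "manifests" whose relationship annotation matches."""
    return [
        descriptor
        for descriptor in manifest.get("manifests", [])
        if descriptor.get("annotations", {}).get(RELATIONSHIP_KEY) == relationship
    ]


# Trimmed copy of the nydus example; blobs and config omitted for brevity.
NYDUS_MANIFEST = json.loads("""
{
  "mediaType": "application/vnd.oci.artifact.manifest.v1+json",
  "artifactType": "application/vnd.cncf.nydus.v1",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:8c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c31",
      "size": 1578,
      "annotations": {
        "oci.distribution.relationship": "references",
        "oci.distribution.artifact": "mysql:8",
        "oci.distribution.artifactType": "application/vnd.oci.image.v1"
      }
    }
  ]
}
""")

REFERENCED = linked_manifests(NYDUS_MANIFEST, "references")
```

A runtime deciding whether to use image acceleration could run the same filter in the other direction, looking for nydus manifests that reference the image it is about to start.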
To explain it in more detail:
a new application/vnd.cncf.nydus.v1 artifact type;
two new media types (application/vnd.cncf.nydus.bootstrap.v1.tar+gzip and application/vnd.cncf.nydus.blob.v1) to describe the blobs;
a reference relationship with the original mysql:8 image, which can be persisted either within the same registry or in a different registry.
With such an artifact manifest,
@SteveLasker For some of the artifacts such as CNAB or Helm charts(v3), there are tools to package them as OCI artifacts and store them in OCI registries such as helm v3, so does this PR mean to introduce a break-change to the existing artifacts?
@SteveLasker For some of the artifacts such as CNAB or Helm charts(v3), there are tools to package them as OCI artifacts and store them in OCI registries such as helm v3, so does this PR mean to introduce a break-change to the existing artifacts?
You could think of it as a new version. Let's say a new version of CNAB and/or Helm could use this new manifest, but that's a choice for these communities to make. As with anything that's already shipped, with limitations, it's always a choice whether a change provides enough value. It's my hope that references solve many of the limitations of Artifacts v1, based on the image-manifest, in a way that makes it worth the change, and that it's tooled in such a way that it adds enough value with minimal breaking change implications.
@SteveLasker Overall looks good. My one question/slight concern is around references: With above (unless I'm mistaken) it would be possible to form a chain of dependent manifests as large or as long as clients specify. Do we have any concerns about this proving hard to mirror? The one major benefit of the current manifest list design is that it is (relatively) flat: a tag points to a single list and the list points to a set of manifests, but it cannot go beyond that.
What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?
The one major benefit of the current manifest list design is that it is (relatively) flat: a tag points to a single list and the list points to a set of manifests, but it cannot go beyond that.
This is not the case:
This descriptor property has additional restrictions for manifests. Implementations MUST support at least the following media types:
application/vnd.oci.image.manifest.v1+json
Also, implementations SHOULD support the following media types:
application/vnd.oci.image.index.v1+json (nested index)
Image indexes concerned with portability SHOULD use one of the above media types. Future versions of the spec MAY use a different mediatype (i.e. a new versioned format). An encountered mediaType that is unknown to the implementation MUST be ignored.
And the diagram from https://github.com/opencontainers/image-spec/blob/master/media-types.md#relations
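To illustrate what the quoted text implies for a client, here is a small Python sketch that walks an index, recurses into nested indexes, and skips descriptors with unknown media types. The fetch callable and the in-memory store are hypothetical stand-ins for a registry lookup, and the digests are shortened placeholders.

```python
IMAGE_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
IMAGE_INDEX = "application/vnd.oci.image.index.v1+json"


def flatten_index(index: dict, fetch) -> list:
    """Collect image manifest digests reachable from an index.

    fetch(digest) -> dict is a hypothetical callable that returns the
    parsed index stored under that digest.
    """
    found = []
    for descriptor in index.get("manifests", []):
        media_type = descriptor.get("mediaType")
        if media_type == IMAGE_MANIFEST:
            found.append(descriptor["digest"])
        elif media_type == IMAGE_INDEX:
            # Nested index: recurse into it.
            found.extend(flatten_index(fetch(descriptor["digest"]), fetch))
        # Any other media type is unknown to this client and is ignored.
    return found


# Hypothetical content: a top-level index holding one manifest, one nested
# index, and one descriptor of an unknown type.
STORE = {
    "sha256:nested": {
        "manifests": [{"mediaType": IMAGE_MANIFEST, "digest": "sha256:arm64"}]
    }
}
TOP = {
    "manifests": [
        {"mediaType": IMAGE_MANIFEST, "digest": "sha256:amd64"},
        {"mediaType": IMAGE_INDEX, "digest": "sha256:nested"},
        {"mediaType": "application/vnd.example.unknown", "digest": "sha256:other"},
    ]
}

DIGESTS = flatten_index(TOP, STORE.__getitem__)
```

The recursion has no depth limit here, which is the same open question raised for artifact references.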
@josephschorr
What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?
My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error.
I'd also want to handle the types of artifacts differently (allowing users to filter in/out). For example, they may want to mirror images and notary signatures that point to those images, but may not be interested in mirroring helm charts that point to the image (along with everything else that helm chart points to).
And I'd want some directionality on the references for mirroring, e.g. helm charts may mirror child images, but child images don't mirror parent helm charts.
That assumes it makes sense to link helm charts and images, which I'm still uncertain of (charts can template an image name, and artifacts don't template a reference).
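A rough Python sketch of that kind of client-side copy planning follows. It is only an illustration under stated assumptions: the store layout, the "children" field, the type strings and the DepthExceeded error are all hypothetical, and the walk only follows parent-to-child links.

```python
class DepthExceeded(Exception):
    """Raised when the walk would go deeper than the configured limit."""


def plan_copy(root: str, store: dict, allowed_types: set, max_depth: int) -> list:
    """Return the digests to copy, starting at root and following children only."""
    planned = []
    seen = set()

    def walk(digest: str, depth: int) -> None:
        if digest in seen:
            return
        manifest = store[digest]
        if manifest.get("artifactType") not in allowed_types:
            return  # filtered out by the user's type selection
        if depth > max_depth:
            raise DepthExceeded(digest)
        seen.add(digest)
        planned.append(digest)
        for child in manifest.get("children", []):
            walk(child, depth + 1)

    walk(root, 0)
    return planned


# Hypothetical repository: a chart that points at an image.
STORE = {
    "sha256:chart": {"artifactType": "example/helm-chart", "children": ["sha256:image"]},
    "sha256:image": {"artifactType": "example/image", "children": []},
}
ALL_TYPES = {"example/helm-chart", "example/image"}

FROM_CHART = plan_copy("sha256:chart", STORE, ALL_TYPES, max_depth=1)
FROM_IMAGE = plan_copy("sha256:image", STORE, ALL_TYPES, max_depth=1)
CHART_ONLY = plan_copy("sha256:chart", STORE, {"example/helm-chart"}, max_depth=1)
```

Starting from the chart pulls in the child image, while starting from the image does not pull in the parent chart. Whether hitting the limit should be an error or a warning is left to the caller, which can catch DepthExceeded and decide.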
This is not the case:
Right... I forgot that it was intended that manifest lists could reference other lists. We just never did so because they were (practically speaking) never used.
My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error.
Perhaps we should formalize this a bit then? I could see a scenario where someone pushes a large-chain of manifests to a repository and then, when the repository is mirrored, it fails at that point. Unsure if it would be better to fail at push time, though.
My own intention is to support recursive copies with a user configurable max depth on the recursion. And if the max depth is hit, send back a warning or error.
Perhaps we should formalize this a bit then? I could see a scenario where someone pushes a large-chain of manifests to a repository and then, when the repository is mirrored, it fails at that point. Unsure if it would be better to fail at push time, though.
The tooling I'm working with is strictly client side, so a server decision to fail would be separate. I could see this being enforced by the registry similar to how user namespaces as the first part of the repository path is enforced by many registries, which is separate from the spec. I'd be interested in the http response codes when the server refuses to accept an artifact for reasons like this.
I've taken a ton of great feedback that we need more bake-time on the references collection and scenarios, including the registry/repo mapping conversation.
I'll be closing this PR, once I revert a few things. I ask folks to please focus on #29 for feedback, as it has the core link-list of manifests required for Notary v2, SBoM and other linked artifact scenarios like IBM and Google signing solutions.
What is the expected direction for tooling that will need to mirror the current manifests? Walk the manifests to determine the full set to replicate?
I think we have to first define "mirror". Is a mirror at a registry/repo level? Meaning, whatever is in a given repo is mirrored?
Or, are we talking about gated mirrors, where the user opts-into specific content. Likely at the :tag level?
If at the repo, I suspect the client would pull all content in that repo and keep it current. New events or polling a list API, hopefully with a changedSince type parameter would work.
If at the tag, then it could walk the references and the artifacts that have the target tag referenced in the manifests collection.
Since manifests references must be in the same repo, it's less of a concern, as the repo, or the dependencies can be walked.
Perhaps we should formalize this a bit then?
I worry about the formalization of a dependency count. In some cases, it makes sense, like the 256 registry/namespace character limit. But npm and other package managers have kinda dealt with this. I see this as a client configuration scenario as it just seems hard to know upfront how the dependencies are either circular and closed or endless. I suspect a registry throttling scenario would solve this, but I'd have to think more. With references on-hold, I'm saving my brain cycles for how to best process this till later.
The tooling I'm working with is strictly client side, so a server decision to fail would be separate. I could see this being enforced by the registry similar to how user namespaces as the first part of the repository path is enforced by many registries, which is separate from the spec. I'd be interested in the http response codes when the server refuses to accept an artifact for reasons like this.
This ^ sounds like something to consider as we revisit this scenario. How do all package managers like Pypi, npm, ... manage these types of scenarios?
I'm closing this one as it's no longer the active conversation. See #29 for the more focused, iterative approach. Happy to continue the background conversation here if folks want to keep thinking about it.
I've reverted the changes to the latest thinking on a references collection for weak references, supporting a dependency graph.
The OCI artifact manifest provides a means to define a wide range of artifacts, including a chain of dependencies of related artifacts. It provides a means to define multiple collections of types, including blobs, dependent artifacts and referenced artifacts, expanding on the work done around OCI Artifacts based on oci.image.manifest, addressing the challenges attempted with image index.
This is an initial PR for discussion.