opencontainers / artifacts

OCI Artifacts
https://opencontainers.org
Apache License 2.0

definitions: add what is an artifact #50

Closed Silvanoc closed 2 years ago

Silvanoc commented 2 years ago

The whole artifacts specification relies on the concept of an "artifact" without defining it. This patch tries to fill the gap.

Signed-off-by: Silvano Cirujano Cuesta silvano.cirujano-cuesta@siemens.com

Fixes #32

SteveLasker commented 2 years ago

Thanks @Silvanoc, this is a great start.

Silvanoc commented 2 years ago

@SteveLasker I'm neither happy with the definition, nor convinced that I got it right...

In this ORAS artifacts example (let's assume that it's an OCI artifacts example), what would you call an artifact?

Is net-monitor:v1 an artifact? If not, what's its name? Is it a repository?

Is net-monitor:v1 providing/hosting/referring four individual artifacts?

  1. OCI container image
  2. notary v2 signature
  3. sample sbom
  4. nydus image

I've written the definition of my proposal assuming that net-monitor:v1 would be called "an artifact", and that it's the set of the four "items" above together that make up the artifact. Meaning that in that implementation of the artifacts specification the user would expect the application to ensure the cohesion of all 4 items (one artifact instance is a fixed combination of them).

In other words, in my understanding of what an artifact is (which inspired my definition), it's the unequivocal combination of all 4 items that makes up an artifact instance.

Am I somehow getting it wrong?

SteveLasker commented 2 years ago

Thanks @Silvanoc, good questions to keep clarifying till we land on something solid:

what would you call an artifact?

  • Is net-monitor:v1 an artifact? If not, what's its name? Is it a repository?

I'd suggest net-monitor:v1 is an artifact, that lives in the net-monitor repo. It's an artifact as it's something the user interacts with and has an expectation around lifecycle management. The user can push, discover, pull, promote, delete it.

  • Is net-monitor:v1 providing/hosting/referring four individual artifacts?

The notary v2 signature and the sbom are artifacts that also support push, discover, pull, promote, delete. They typically don't have individual lifecycle management, as they're considered extensions (need a better term) to the net-monitor:v1 image. For instance, we've seen most customers want these references deleted when the net-monitor:v1 image is deleted, as it's not super interesting to have a signature for something that doesn't exist.

The nydus image layout is another great example, as it expresses a way to store the net-monitor:v1 image disk layout in an expanded form. Would there be any value in storing the expanded layers of the net-monitor:v1 image if the image artifact was deleted? The reason the nydus layers are stored as a reference type is that a user may wish to delete the nydus optimizations when they archive the net-monitor:v1 image.

The subtle question you are asking is whether the net-monitor:v1 image is defined, in total, together with its references. This is the new concept of reference types and reverse indexes.

The main innovation here is that these are separable artifacts that allow individual lifecycle and interaction. You can pull the signature and validate it before you pull the image. The same goes for the SBOM and its signature. You may also delete the nydus image optimization when it's no longer needed, but still keep the original tar.gz version for optimized archival.

I would try it this way: The net-monitor:v1 is a container image artifact. It has additional referenced artifacts that you can discover through the same-named reference. You can interact with the graph of artifacts, such as promoting the entire graph, or promoting a filtered set of the graph. For instance, you may only want the signatures to be promoted, and leave the SBOMs and scan results behind when promoting to a production environment.
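The graph idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Artifact` class, the type labels (`"signature"`, `"sbom"`, ...), the placeholder digests, and the `select_for_promotion` helper are all hypothetical and are not part of any OCI or ORAS API. The sketch treats each reference as a `subject` pointer and walks the reverse index from the image outward, descending only through artifacts that pass the filter so that nothing is promoted without its subject.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Artifact:
    digest: str                    # placeholder digest, for illustration only
    artifact_type: str             # illustrative label, not a registered media type
    subject: Optional[str] = None  # digest of the artifact this one refers to
    tag: Optional[str] = None


def referrers(store: Iterable[Artifact], subject_digest: str) -> List[Artifact]:
    """Reverse lookup: every artifact whose subject is the given digest."""
    return [a for a in store if a.subject == subject_digest]


def select_for_promotion(
    store: List[Artifact],
    root_digest: str,
    keep_types: Optional[Set[str]] = None,
) -> List[Artifact]:
    """Return the root plus the referring artifacts chosen for promotion.

    keep_types=None selects the whole graph. Otherwise only referrers whose
    type is in keep_types are selected, and the walk does not descend below
    a referrer that was filtered out.
    """
    by_digest = {a.digest: a for a in store}
    selected = [by_digest[root_digest]]
    pending = [root_digest]
    while pending:
        current = pending.pop()
        for ref in referrers(store, current):
            if keep_types is None or ref.artifact_type in keep_types:
                selected.append(ref)
                pending.append(ref.digest)
    return selected


STORE = [
    Artifact("sha256:image", "container-image", tag="net-monitor:v1"),
    Artifact("sha256:sig", "signature", subject="sha256:image"),
    Artifact("sha256:sbom", "sbom", subject="sha256:image"),
    Artifact("sha256:sbomsig", "signature", subject="sha256:sbom"),
    Artifact("sha256:nydus", "nydus-image", subject="sha256:image"),
]

# Promote the image with its signatures only, leaving the SBOM behind.
production = select_for_promotion(STORE, "sha256:image", {"signature"})
```

With the `{"signature"}` filter, the signature of the SBOM is left behind along with the SBOM itself, because the sketch never promotes an artifact whose subject was filtered out. Whether a real registry or client should behave that way is a design choice this sketch does not settle.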

So, I think we're saying an artifact is something users interact with. They can push, discover, pull, promote, delete. On the other side, blobs are implementation details of the artifacts the user interacts with. For example, many registries de-dupe blobs. This isn't something a user should be interacting with, it's an internal detail the registry can manage on behalf of the user.
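To illustrate why blobs can stay an internal detail, here is a minimal sketch of content-addressed storage, assuming a hypothetical in-memory `BlobStore` class (not a real registry API). Because a blob is keyed by the SHA-256 digest of its bytes, pushing the same content twice stores it once, and the user never has to know.

```python
import hashlib
from typing import Dict


class BlobStore:
    """Hypothetical in-memory blob store keyed by content digest."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical bytes
        # always map to the same entry.
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def count(self) -> int:
        return len(self._blobs)


store = BlobStore()
d1 = store.put(b"shared base layer")   # pushed as part of one artifact
d2 = store.put(b"shared base layer")   # pushed again as part of another
d3 = store.put(b"a different layer")
```

Here `d1` and `d2` are the same digest and the store holds two blobs, not three.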

Is that helping?

Silvanoc commented 2 years ago

First things last :wink:

Is that helping?

Yes, a lot.

I think that your proposal, a definition based on the user's perspective, overlaps with my definition. Therefore I find it good :slightly_smiling_face:

I have one last question before writing a new version of the definition: is it thinkable/foreseen/possible to have "abstract" artifacts? These are artifacts that are only the target of other artifacts' "subject" references, but don't provide any data themselves.

sudo-bmitch commented 2 years ago

To me, an artifact is "an output of a build pipeline that is used by those downstream to build dependent applications and deploy applications". E.g. a container image, binary/library, SBoM, signature, and helm chart. How we store the artifact (flat file, directory, tar, pushed to a tool like artifactory, or pushed to an OCI registry) seems like an implementation detail that's not important for understanding what an artifact is. It's a bit like defining a web page by describing TLS: the transport isn't important for this definition. And calling it a "piece of data" is too abstract for me, since logs, metrics, credentials, and a lot of other things would also be considered data. Linking to the wiki may make more sense than maintaining our own definition: https://en.wikipedia.org/wiki/Artifact_(software_development)

Silvanoc commented 2 years ago

Wow, @sudo-bmitch just shot in a completely different direction from what the distro spec is providing as a definition. See this comment on the originating issue: https://github.com/opencontainers/artifacts/issues/32#issuecomment-959341232

IMO the distro spec definition is pretty good. It's perfectly fine to define something by its composing parts, but in that definition probably the use is missing.

See this definition of a knife from Merriam-Webster: a cutting instrument (what it is for) consisting of a sharp blade fastened to a handle (what it is composed of)

Silvanoc commented 2 years ago

I'm going with a reference to the definition provided in the OCI distribution specification, or one derived from it. Therefore closing this.

SteveLasker commented 2 years ago

First, thanks for the deep engagement @Silvanoc. Particularly as you started this saying English isn't your first language, yet that's often a great way to test if the text meets a broader audience.

Distribution reference. Thanks @mikebrow.

I forgot these were recently rewritten to help with clarity. I'd say the definition is largely correct, with a few minor points that have added to the complexity. For instance:

The above points are largely at the center of the OCI Artifacts and ORAS Artifacts work to generalize content in a registry, without being limited to what the runtime container image spec states. These conversations will likely be part of the Reference Type working group.

I have one last question before writing a new version of the definition: is it thinkable/foreseen/possible to have "abstract" artifacts? These are artifacts that are only the target of other artifacts' "subject" references, but don't provide any data themselves.

This is something a number of us have been trying to clarify.

Question: Is a manifest that has a subject somehow a different type of artifact?

The subject property states:

An OPTIONAL reference to any existing manifest within the repository. When specified, the artifact is said to be dependent upon the referenced subject.

The Lifecycle Management section of the oras artifacts-spec states:

In such registries, when the subject is deleted or marked for garbage collection, the defined artifact is subject to deletion as well, unless the artifact is tagged.

What this means is an Artifact may have a subject reference, but it can still be pushed, discovered, pulled independently. It can also have a tag, which gives it independent lifecycle management.
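The lifecycle rule quoted above can be sketched as follows. This is a toy model under stated assumptions, not the behavior of any particular registry: the `Entry` class, the `delete_subject` helper, and the placeholder digests and tags are all hypothetical. Deleting a subject removes the untagged artifacts that refer to it, and repeats the check for anything that referred to those, while a tagged artifact keeps its own lifecycle.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Entry:
    subject: Optional[str] = None                  # digest this artifact refers to
    tags: List[str] = field(default_factory=list)  # a tag gives independent lifecycle


def delete_subject(registry: Dict[str, Entry], digest: str) -> Set[str]:
    """Delete `digest` and cascade to untagged artifacts that refer to it.

    Returns the set of digests that were removed. Tagged referrers survive,
    even though their subject no longer exists.
    """
    removed: Set[str] = set()
    pending = [digest]
    while pending:
        current = pending.pop()
        if current not in registry:
            continue
        del registry[current]
        removed.add(current)
        for other_digest, entry in registry.items():
            if entry.subject == current and not entry.tags:
                pending.append(other_digest)
    return removed


registry = {
    "sha256:image": Entry(tags=["v1"]),
    "sha256:sig": Entry(subject="sha256:image"),
    "sha256:sbom": Entry(subject="sha256:image", tags=["sbom-v1"]),
    "sha256:sbomsig": Entry(subject="sha256:sbom"),
}

removed = delete_subject(registry, "sha256:image")
```

In this example the untagged signature goes away with the image, while the tagged SBOM and the signature that refers to it remain.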

I wish I had a better way to define these. In lieu of that, I'd simply say the distribution spec reference is close:

Artifact: one conceptual piece of content stored as Blobs with an accompanying Manifest containing a Config
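As a rough illustration of that definition, the sketch below builds a manifest shaped like an OCI image manifest: a `config` descriptor plus a list of blob descriptors, each carrying `mediaType`, `digest`, and `size`. The `descriptor` and `build_manifest` helpers and the `application/vnd.example.*` media types are made up for this example and are not registered types.

```python
import hashlib
from typing import Dict, List, Tuple


def descriptor(media_type: str, data: bytes) -> Dict[str, object]:
    """Describe a blob by media type, content digest, and size in bytes."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def build_manifest(
    config_type: str,
    config_bytes: bytes,
    blobs: List[Tuple[str, bytes]],
) -> Dict[str, object]:
    """One conceptual piece of content: blobs plus a manifest with a config."""
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": descriptor(config_type, config_bytes),
        "layers": [descriptor(media_type, data) for media_type, data in blobs],
    }


sbom_bytes = b'{"packages": []}'  # stand-in content for the example
manifest = build_manifest(
    "application/vnd.example.sbom.config.v1+json",    # hypothetical config type
    b"{}",
    [("application/vnd.example.sbom.v1+json", sbom_bytes)],  # hypothetical blob type
)
```

In this sketch the config media type is what tells an SBOM apart from a container image, since everything else about the manifest has the same shape.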

sudo-bmitch commented 2 years ago

Wow, @sudo-bmitch just shot in a completely different direction from what the distro spec is providing as a definition. See this comment on the originating issue: #32 (comment)

Apologies, there is the question of whether we're talking about a general software Artifact, an OCI Artifact, or an ORAS Artifact. I was defining the former, thinking that was where the question was originating, and Mike did a good job defining the second (OCI Artifact).