open-component-model / ocm

Open Component Model (Software Bill of Delivery) Toolset
https://ocm.software
Apache License 2.0

'hint' as part of the artifact specification #935

Open fabianburth opened 1 month ago

fabianburth commented 1 month ago

Add hint to the artifact specification

Uploaders

The ocm library has a concept of uploaders (also called blobhandlers). These uploaders upload a blob that is described as an artifact (that is, as a source or resource) of a component to a technology specific storage.

The uploaders are an integral part of the ocm transport process. During an ocm transfer, uploaders can be configured to upload artifacts to a technology specific repository.

Mechanism

The mechanism behind the uploaders is explained by answering the following questions.

How does the ocm decide which uploader(s) are called for each particular artifact?
There is a registry of uploaders where the technology specific uploaders can be registered to be called for the set of (or a subset of) the following properties: artifact type, mime type, and implementation repository type.
The implementation repository type describes the repository technology on which the ocm repository is implemented (also referred to as storage backend mapping in the ocm spec). The most common repository technology is an OCI registry.

The implementation repository type allows for the implementation of default uploaders for ocm repository types. For example, if the implementation repository type is oci, an oci uploader attempts to upload all artifacts of artifact type ociArtifact as individual oci artifacts (without this uploader, the artifacts would only be available as a blob which is described by a layer of the oci artifact representing the respective ocm component).
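The lookup described above can be sketched as follows. This is a minimal illustration, not the ocm library's API: the `Registration` class, `select_uploaders`, the uploader names, and the `npmPackage` and `other` values are all hypothetical. A registration constrains any subset of the three properties, and an unset property matches everything.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Registration:
    """Hypothetical registry entry; None means 'match any value'."""
    uploader: str
    artifact_type: Optional[str] = None
    mime_type: Optional[str] = None
    repo_type: Optional[str] = None

    def matches(self, artifact_type: str, mime_type: str, repo_type: str) -> bool:
        # Every constraint that is set must equal the artifact's property.
        pairs = (
            (self.artifact_type, artifact_type),
            (self.mime_type, mime_type),
            (self.repo_type, repo_type),
        )
        return all(want is None or want == got for want, got in pairs)


def select_uploaders(registry: List[Registration], artifact_type: str,
                     mime_type: str, repo_type: str) -> List[str]:
    """Return the names of all registered uploaders whose constraints match."""
    return [r.uploader for r in registry
            if r.matches(artifact_type, mime_type, repo_type)]


# Illustrative registry: a default OCI uploader bound to the repository type,
# plus an uploader registered for an artifact type only.
registry = [
    Registration("oci-default", artifact_type="ociArtifact", repo_type="oci"),
    Registration("npm", artifact_type="npmPackage"),
]
```

With this registry, an `ociArtifact` transferred into an `oci` based repository selects `oci-default`, while the same artifact transferred into a repository of another type selects nothing.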

NOTE: In the command line use case, the uploaders can be configured based on the ocm configuration (see here).

How does each uploader know where to upload the artifact to?
To upload the blob, each uploader receives the blob itself, the artifact type, the mime type, the implementation repository type (so, the information it may be registered for), and a hint. The uploader is supposed to use the artifact type, mime type, and implementation repository type to ensure that the artifact is suitable to be uploaded to the corresponding technology specific repository.
The hint, however, is supposed to contain any further information that might be needed by the uploader to correctly upload the blob.
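To make the split between the two kinds of input concrete, here is a small sketch. The `UploadRequest` type, the `plan_oci_upload` function, and the set of accepted media types are assumptions for illustration only. The typed properties are used purely as a suitability check, and the hint is the only input that says where the blob should go.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadRequest:
    """Hypothetical bundle of everything an uploader is handed."""
    blob: bytes
    artifact_type: str
    mime_type: str
    repo_type: str
    hint: str


# Assumed set of media types this illustrative OCI uploader accepts.
ACCEPTED_MIME_TYPES = {"application/vnd.oci.image.manifest.v1+json"}


def plan_oci_upload(req: UploadRequest) -> Optional[str]:
    """Return the hint to use as upload location, or None if unsuitable."""
    if req.artifact_type != "ociArtifact":
        return None  # wrong kind of artifact for this uploader
    if req.mime_type not in ACCEPTED_MIME_TYPES:
        return None  # blob format the uploader cannot handle
    return req.hint
```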

Example

name: image
relation: external
type: ociArtifact
version: v1.0.0
access: 
  type: maven
  repoUrl: https://maven.central.org
  coordinates: ocm/software/1.0.0/image.tar.gz
  hint: ocm.software/ocm-cli:v1.0.0

Assume we configured exactly one uploader with the following configuration:

type: uploader.ocm.config.ocm.software
handlers:
- name: ocm/ociArtifacts
  artifactType: ociArtifact
  config:
    ociRef: https://ghcr.io/open-component-model/maven

This registers an OCI uploader for artifact type ociArtifact. Consequently, as described above, the OCI uploader is called during transfer for this artifact (and for all other artifacts of type ociArtifact in our hypothetical component). As a result, the image.tar.gz file in the maven package would be uploaded to https://ghcr.io/open-component-model/maven/ocm.software/ocm-cli:v1.0.0.
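The resulting location is simply the configured base reference joined with the hint. A minimal sketch of that composition, assuming plain string concatenation (the function name `target_reference` is hypothetical, and the real uploader may normalize references differently):

```python
def target_reference(oci_ref: str, hint: str) -> str:
    """Join the uploader's configured base reference with the artifact's hint."""
    # Strip slashes at the seam so the two parts are joined by exactly one.
    return oci_ref.rstrip("/") + "/" + hint.lstrip("/")


base = "https://ghcr.io/open-component-model/maven"  # ociRef of the uploader
hint = "ocm.software/ocm-cli:v1.0.0"                 # hint from the access spec
print(target_reference(base, hint))
```

For the values of the example, this prints the upload location named above.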

ISSUE 1:
The current oci uploader would attempt to check the media type of the blob described by the maven access. Since a GAVCE can always match multiple files (e.g. in the above example, the maven package might contain a file ocm-cli-image.tar.gz and another file ocm-website-image.tar.gz), the maven access method currently returns all blobs as tar.gz with the corresponding media type application/x-tar. If the GAVCE matches multiple files, it is necessary to create a tar archive. But if our intention was to specify one particular file with the GAVCE, as in our above example, where we want to specify a particular oci artifact, it is rather inconvenient that the file is tar'ed, since the current implementation of the oci upload handler cannot deal with a tar.gz.tar.gz file. Of course, we could provide a special upload handler that knows how to do this, but this is inconvenient.
If we assume that the above problem is resolved and single files are returned exactly as they are, we would still have to know the media type of the file specified by the GAVCE. Thus, an additional property mediaType is needed within the maven input and access spec.
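The behaviour asked for in ISSUE 1 could look roughly like the sketch below. It is not the maven access method's implementation: `resolve_blob`, both media type constants, and the in-memory `matches` mapping are assumptions. A single matching file is returned unchanged together with the declared mediaType, and only multiple matches are packed into a tar.gz.

```python
import io
import tarfile
from typing import Dict, Optional, Tuple

ARCHIVE_MEDIA_TYPE = "application/x-tgz"          # assumed value for the archive case
FALLBACK_MEDIA_TYPE = "application/octet-stream"  # assumed default if none is declared


def resolve_blob(matches: Dict[str, bytes],
                 media_type: Optional[str] = None) -> Tuple[bytes, str]:
    """Return (blob, media type) for the files matched by the coordinates."""
    if not matches:
        raise ValueError("coordinates matched no files")
    if len(matches) == 1:
        # Exactly one file: hand it out as is, no extra tar.gz layer.
        (data,) = matches.values()
        return data, media_type or FALLBACK_MEDIA_TYPE
    # Several files: pack them into one gzip-compressed tar archive.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        for name in sorted(matches):
            info = tarfile.TarInfo(name=name)
            info.size = len(matches[name])
            archive.addfile(info, io.BytesIO(matches[name]))
    return buf.getvalue(), ARCHIVE_MEDIA_TYPE
```

The design point is that the media type of a single file cannot be derived reliably from the coordinates alone, which is why the sketch takes it as an explicit parameter.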

ISSUE 2:
Currently, the hint is provided by the access method, because the current access method might be able to provide one. For example, assume you have an artifact with an ociArtifact access type. If you transfer the component including its resources to another registry without having uploaders registered, the access type will be changed to localBlob.

So, it will be converted from:

name: image
relation: external
type: ociArtifact
version: v1.0.0
access: 
  type: ociArtifact
  imageReference: https://ghcr.io/open-component-model/ocm/ocm-cli@<image-digest>

to:

name: image
relation: external
type: ociArtifact
version: v1.0.0
access: 
  type: localBlob
  localReference: <image-digest>
  mediaType: application/vnd.oci.image.manifest.v1+json
  referenceName: ocm/ocm-cli

Thereby, the ociArtifact access method provides the hint, which is then stored in the referenceName of the localBlob access spec. This makes it possible to upload the artifact to a similar location in another registry if the blob is transferred again with an oci uploader registered. The localBlob access method would then provide the referenceName as the hint during that upload.
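The conversion above can be sketched as a small function. How the library actually derives the referenceName is not stated in this thread, so the `registry_base` prefix that gets stripped is an assumption chosen only to reproduce the example, and `to_local_blob` is a hypothetical name.

```python
def to_local_blob(access: dict, registry_base: str, media_type: str) -> dict:
    """Convert an ociArtifact access spec into a localBlob access spec."""
    # Split "<repository>@<digest>" into its two parts.
    repository, _, digest = access["imageReference"].partition("@")
    prefix = registry_base.rstrip("/") + "/"
    if not repository.startswith(prefix):
        raise ValueError("image reference is not below the given registry base")
    return {
        "type": "localBlob",
        "localReference": digest,
        "mediaType": media_type,
        # The hint survives the conversion as referenceName.
        "referenceName": repository[len(prefix):],
    }


oci_access = {
    "type": "ociArtifact",
    "imageReference": "https://ghcr.io/open-component-model/ocm/ocm-cli@sha256:0123",
}
local_access = to_local_blob(
    oci_access,
    registry_base="https://ghcr.io/open-component-model",  # assumed prefix
    media_type="application/vnd.oci.image.manifest.v1+json",
)
```

Here `sha256:0123` is a placeholder digest, and `local_access["referenceName"]` ends up as `ocm/ocm-cli`, the value a later oci uploader would receive as its hint.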

Suggestion 1

Since the concept of uploaders is more generic than this oci use case and since the hint is independent of the access specification of the artifact, I suggest that we extend the ocm specification to make the hint an additional optional property of artifacts (thus, parallel to the type of an artifact).
Hints are specific to the type of repository the artifact should be uploaded to, or rather to the type of uploader (e.g. the hint for an oci uploader likely looks different from the hint for an npm uploader). Moreover, a hint might even be specific to a certain uploader. Therefore, I suggest that the hint property should have a structure similar to labels, consisting of a name (string), value (any), and version (string).
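A rough model of the proposed structure, to show how an uploader could pick the hint meant for it. Only the three fields name, value, and version come from the suggestion; the `Hint` and `Artifact` classes, the `hints` list, the `hint_for` lookup, the default version `v1`, and the hint names `oci` and `npm` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Hint:
    """Label-like entry: name (string), value (any), version (string)."""
    name: str
    value: Any
    version: str = "v1"  # assumed default, not part of the proposal


@dataclass
class Artifact:
    name: str
    type: str
    version: str
    # Hints sit next to the artifact type, independent of the access spec.
    hints: List[Hint] = field(default_factory=list)

    def hint_for(self, name: str) -> Optional[Hint]:
        """Return the first hint with the given name, or None."""
        return next((h for h in self.hints if h.name == name), None)


image = Artifact(
    name="image",
    type="ociArtifact",
    version="v1.0.0",
    hints=[
        Hint("oci", "ocm.software/ocm-cli:v1.0.0"),
        Hint("npm", {"package": "ocm-cli"}),
    ],
)
```

An oci uploader would then call `image.hint_for("oci")` and ignore entries addressed to other uploaders, regardless of which access method currently describes the blob.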

To preempt the question of why I would not add a particular label for this hint instead of adjusting the ocm specification: as mentioned above, the concept of uploaders (or blobhandlers) is an integral part of the transport process.

Suggestion 2

Since the concept described above is primarily necessary if we allow cross-consumption (so to download a blob from one type of repository, here maven, and upload it to a different type of repository, here oci), I suggest we decide not to support cross-consumption at all (at least for now). In that case, we can also omit the hint from the maven spec!

An optional mediaType property in the maven access spec and downloading particular maven files without packing them into a tar.gz would still be desirable!

Skarlso commented 1 month ago

Just for my understanding:

I suggest we decide not to support cross-consumption at all (at least for now). In that case, we can also omit the hint from the maven spec!

This would mean, that I can't download from a github repository and upload it for an OCI repository as a tarred up content, right?

Suggestion 1

I would even go as far as renaming it to something that provides a better context by name. Something like uploaderInfo or context or description or what about metadata that is generally used for additional information style data?

hint in my mind is something that would just textually be displayed in some manner providing description of some source. Not an actual instruction to be used by the uploader.

Otherwise, I'm in full support of this 👍 Nice writeup!

fabianburth commented 1 month ago

This would mean, that I can't download from a github repository and upload it for an OCI repository as a tarred up content, right?

Right!

I would even go as far as renaming it to something that provides a better context by name. Something like uploaderInfo or context or description or what about metadata that is generally used for additional information style data?

Yeah, I agree - if we were to go through with suggestion 1, we should also rename it. There is already a field for general additional information style data, the labels. I also thought about whether it would make sense to implement essentially what is suggestion 1 as label. But - as also stated above - since the transport process is such an integral part of the ocm, I thought uploaders should have a dedicated field within the spec.

Skarlso commented 1 month ago

Yah, I definitely agree to not have it as yet another label that could easily be missed.

Skarlso commented 1 month ago

Note:

Scenario 2 is already working. In addition, we will allow the media type to be overridden by the MediaType of the AccessSpec if it is set. Otherwise, the mime type will still be used as it is now:

crane manifest ghcr.io/skarlso/maven-test-3/component-descriptors/ocm.software/demo/test-2:1.0.2
{"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json","config":{"mediaType":"application/vnd.ocm.software.component.config.v1+json","digest":"sha256:5c2e73f0ece5566b7280af541eb0f752d4978165682f6bcd41aa59460fb148e4","size":201},"layers":[{"mediaType":"application/vnd.ocm.software.component-descriptor.v2+yaml+tar","digest":"sha256:06ecd47db00216f6ce73d34665dcab631b83be8b17d4c151c6d88c0d6b623b29","size":2560},{"mediaType":"application/x-tgz","digest":"sha256:7a9cdf674fc1703d6382f5f330b3d110ea1b512b51f1652846d9e4e8a588d766","size":9102945,"annotations":{"ocm-artifact":"[{\"kind\":\"resource\",\"identity\":{\"name\":\"mavengav\"}}]"}}]}

In the above, the layer's mediaType is "application/x-tgz". OCM derived this mime type from the extension of the file.
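The precedence described in this note can be summarized in a few lines. This is a sketch under assumptions, not OCM's code: the extension table, the fallback value, and the function name `effective_media_type` are made up for illustration. A media type set in the access spec wins, otherwise the type is derived from the file extension.

```python
from typing import Optional

# Assumed extension table; the real mapping used by OCM may differ.
EXTENSION_MEDIA_TYPES = {
    ".tar.gz": "application/x-tgz",
    ".tgz": "application/x-tgz",
    ".tar": "application/x-tar",
}
FALLBACK_MEDIA_TYPE = "application/octet-stream"  # assumed fallback


def effective_media_type(filename: str, declared: Optional[str] = None) -> str:
    """Prefer the declared media type, else derive one from the extension."""
    if declared:
        return declared
    for extension, media_type in EXTENSION_MEDIA_TYPES.items():
        if filename.endswith(extension):
            return media_type
    return FALLBACK_MEDIA_TYPE
```

For example, `effective_media_type("image.tar.gz")` yields `application/x-tgz` as in the manifest above, while passing a declared media type returns that value unchanged.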