open-component-model / ocm

Open Component Model (Software Bill of Delivery) Toolset
https://ocm.software
Apache License 2.0

Callback hooks for customisations for ocm-transfer #902

Open ccwienk opened 2 months ago

ccwienk commented 2 months ago

Context / Motivation

There are cases where customisations to the semantics of OCM-CLI's ocm transfer are needed that are likely too special to be handled directly by OCM-CLI itself. Callback hooks could cover such use cases with limited implementation effort for OCM-CLI.

Use-Cases + Rationale

1. filtering of contents from OCI-Images referenced as resources

When consuming OCI images built by third parties (e.g. from an open-source project), those images may contain contents that are not actually required for the intended purpose (e.g. a shell for debugging). Removing such content can help both to reduce image size and to reduce noise from vulnerability scanners and the like.

Performance Consideration

As subprocesses come with an overhead, a second callback might be considered, through which the callback and OCM-CLI negotiate which resources to pass and which to skip.

2. custom rewriting of access-URLs

When migrating from one repository to another (perhaps motivated by deprecations such as the recent one of GCR), OCM already makes such a migration quite easy, as it offers a means to change access-URLs. However, for large (k8s-)deployments, it may be desirable to have more fine-grained control and to change access-URLs only incrementally, over time.

Another scenario might be replication of (OCI-)resources into different target-registries.

Implementation Proposal

OCI-Resource filtering

Offer a callback-hook via ARGV (e.g. --filter-resource-callback). If passed, the callback is expected to be an executable that will be run as a subprocess:

The subprocess accepts the layer-blob's octet-sequence via stdin and outputs the (potentially modified) blob via stdout. OCM-CLI then recalculates the layer-digest, patches the OCI-Manifest, and uploads the modified layer-blob. If the resource is referenced via digest in the component-descriptor, OCM-CLI updates the image-reference(s) accordingly.
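A minimal sketch of what such a filter callback could look like, written in Python. It assumes the layer-blob is an uncompressed tar archive (real layers are often gzip-compressed, which is not handled here). The names filter_layer, oci_digest, and main, and the dropped path bin/sh, are purely illustrative and not part of OCM-CLI. The digest helper only shows the recalculation OCM-CLI would have to perform on the returned blob.

```python
import hashlib
import io
import sys
import tarfile


def _is_dropped(name: str, drop_paths: tuple) -> bool:
    # An entry is dropped if it is one of the paths or lies below one of them.
    return any(name == p or name.startswith(p + "/") for p in drop_paths)


def filter_layer(layer: bytes, drop_paths: tuple) -> bytes:
    """Return a copy of an uncompressed tar layer without the given paths."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src:
            name = member.name.removeprefix("./").lstrip("/")
            if _is_dropped(name, drop_paths):
                continue
            # Only regular files carry a payload that must be copied.
            payload = src.extractfile(member) if member.isreg() else None
            dst.addfile(member, payload)
    return out.getvalue()


def oci_digest(blob: bytes) -> str:
    """Digest in the 'sha256:<hex>' form used by OCI manifests."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def main(stdin=None, stdout=None) -> None:
    # Callback contract from the proposal: blob in via stdin, blob out via stdout.
    stdin = stdin or sys.stdin.buffer
    stdout = stdout or sys.stdout.buffer
    stdout.write(filter_layer(stdin.read(), drop_paths=("bin/sh",)))

# When installed as the callback executable, the script would call main().
```

Since the filtered blob differs from the original, its digest differs as well, which is why the manifest and any digest-based image-references would need patching afterwards.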

As an alternative, OCM-CLI might fully delegate replication to the callback. In this case, the callback might return the new access via stdout, which might already cover the second use-case. This variant would have the benefit of working for any resource-type (even types not supported by OCM-CLI at all).

custom rewriting of accesses (or entire component-descriptor)

Offer a callback-hook via ARGV (e.g. --component-descriptor-callback). This callback is called before OCM-CLI starts to actually replicate resources, but after OCM-CLI has created the target-component-descriptor to be uploaded (i.e. the target-component-descriptor should already have patched accesses, if the transfer-command will have an effect on them).

If passed, the given path is expected to point to an executable, which the OCM-CLI will execute as a subprocess:

The callback can choose to alter the component-descriptor (for example to change upload-targets by altering accesses). The OCM-CLI will honour the returned component-descriptor, and not apply any subsequent modifications.
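A minimal sketch of such a component-descriptor callback, written in Python. The proposal does not specify how the descriptor is handed over; this sketch assumes it arrives as JSON and is returned as JSON. It also assumes resources carry an access of type ociArtifact with an imageReference attribute. The function name and the registry prefixes are hypothetical.

```python
import copy
import json


def rewrite_image_references(descriptor: dict, old_prefix: str, new_prefix: str) -> dict:
    """Return a copy of the descriptor with matching image references moved
    from old_prefix to new_prefix; all other accesses stay untouched."""
    result = copy.deepcopy(descriptor)
    for resource in result.get("component", {}).get("resources", []):
        access = resource.get("access", {})
        ref = access.get("imageReference", "")
        if access.get("type") == "ociArtifact" and ref.startswith(old_prefix):
            access["imageReference"] = new_prefix + ref[len(old_prefix):]
    return result


# Hypothetical descriptor fragment, as the callback might receive it.
incoming = json.loads("""
{"component": {"name": "example.org/demo", "version": "1.0.0", "resources": [
  {"name": "app", "access": {"type": "ociArtifact",
                             "imageReference": "old.example.com/team/app:1.0.0"}},
  {"name": "docs", "access": {"type": "localBlob"}}
]}}
""")

outgoing = rewrite_image_references(incoming, "old.example.com/", "new.example.com/")
print(json.dumps(outgoing, indent=2))
```

Restricting the rewrite to a prefix is what would allow the incremental migration described above: only references under the chosen prefix change, everything else keeps its original access.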

In both cases, corner-cases such as signature invalidation are to be dealt with by the user specifying the mentioned callbacks.

morri-son commented 2 months ago

adding @mandelsoft and @fabianburth

mandelsoft commented 1 month ago

You describe an OCI-specific callback. OCM does not know anything about OCI; therefore, this kind of special extension is not possible.

The main feature of OCM is to reliably describe and transport content and to guarantee its integrity. The proposed behaviour violates those guarantees. The transport cannot change the transferred data; it can only change its technical representation and its location. So changing the content is only possible when composing a new component version from some externally represented resources.

OCM provides a plugin concept. It especially supports the

A plugin is a simple CLI tool with dedicated commands for the various supported extension points.

mandelsoft commented 1 month ago

Modification of content should basically only be possible when creating a new component version. The CLI supports various kinds of inputs here. Inputs provide blobs from some externally accessed artifact (for example, a file, a directory tree, or images stored in a docker daemon). Such an input can do whatever it wants with the artifact before returning the blob finally intended to be stored in a component version.

So, when importing some externally provided content into the OCM world as a resource in a component version, this is the right place to do the desired modifications. Unfortunately, inputs are only an internal interface in the CLI; this interface is not supported by the plugin model so far.

So, we should also extend the plugin model to support inputs. Then it would be possible to provide an input type via a plugin, which is able to access an external resource, modify it, and return the final blob to the composed component version. By applying uploaders at the same time, this blob would not be stored as a localBlob, but uploaded immediately to a technology-specific repository again.

As for access methods, the input type extension must also be capable of composing the input spec via command line flags. Therefore, the plugin API must consist of three CLI commands compose, validate and get:

Most of the implementation is similar to the access method commands.
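To make the shape of such an input plugin more concrete, here is a purely hypothetical sketch in Python of a CLI with the three commands named above. Everything beyond the command names is an assumption for illustration only: the input type name filteredFile, the --path flag, the JSON spec layout, the validate output, and the line-ending normalisation used as the example modification. It does not reflect the actual OCM plugin protocol.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="input-plugin-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    # compose: build an input spec from command line flags (flag is hypothetical).
    sub.add_parser("compose").add_argument("--path", required=True)
    # validate and get both take the input spec as a JSON argument.
    sub.add_parser("validate").add_argument("spec")
    sub.add_parser("get").add_argument("spec")
    return parser


def run(argv: list) -> bytes:
    """Execute one plugin command and return what would be written to stdout."""
    args = build_parser().parse_args(argv)
    if args.command == "compose":
        return json.dumps({"type": "filteredFile", "path": args.path}).encode()

    spec = json.loads(args.spec)
    if spec.get("type") != "filteredFile" or not isinstance(spec.get("path"), str):
        raise ValueError("invalid input spec")
    if args.command == "validate":
        return json.dumps({"description": "file " + spec["path"]}).encode()

    # get: access the external artifact, modify it, and return the final blob.
    with open(spec["path"], "rb") as source:
        return source.read().replace(b"\r\n", b"\n")
```

For example, run(["compose", "--path", "notes.txt"]) would yield a spec that can then be passed to validate and get; the modification happens only in get, before the blob enters the component version.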