open-component-model / ocm-project

OCM Project Backlog
Apache License 2.0

Layer Length Restriction #12

Open mandelsoft opened 8 months ago

mandelsoft commented 8 months ago

In OCI-based OCM repositories, local blobs are stored as layers of the OCI artifact used to represent the OCM component version.

Typically, there are size limitations for layer blobs (10 GB for Google). Because images that are transported via local artifacts are represented as tar archives, they contain

This could hit the size limitation.

A solution would be to extend the local access method definition of OCI-based OCM repository implementations to support splitting the blob into multiple layers.

For example:

related to #36

fabianburth commented 6 months ago

Gardener is planning to put all variants of Kubernetes Node Images into a single OCI index artifact. This will most likely hit this limit as the number of flavours handled this way increases.

Therefore, we should bump up the priority of this issue.

mandelsoft commented 1 day ago

The basic idea would be to allow a list of blob digests in the localReference field of the localBlob access method implementation for the OCI repository type.

The AddBlob implementation of the OCI-based repository implementation delivers an Access Specification and could therefore be extended to split large blobs and store them in multiple layers. The list is then returned as part of the Access Specification.

Therefore, the access method implementation has to be adapted accordingly to yield and accept a list of blob hashes. Additionally, the layer cleanup functionality has to be adjusted to take the list into account.

The problem is the BlobAccess interface and the fact that the blob size may not be known in advance. Because a blob access is a factory for multiple readers, and it can be kept and copied, a purely sequential handling (for example, with a chunked reader) is not possible.

Therefore, we have to use local caching of content:

  1. The blob size is known and smaller than the chunk size -> use the original blob; otherwise -> 2.
  2. The split is always done sequentially; therefore, cached blob accesses can be created from a chunked reader, one after the other. Once an access has been created (and filled), it can be used to create the layer. If the chunked reader is not yet done, the next chunk is processed the same way.