opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0

standardization around compression params #1145

Closed rchincha closed 6 months ago

rchincha commented 8 months ago

For a given source tar, compression parameters matter even when the same compression algorithm is used. Layers are often produced as tar.gz, but different tools set buffer sizes, compression levels, and so on differently, so the compressed layer (and consequently its sha256sum) ends up different even though the source tar is the same.

As a result, reproducibility and deduplication suffer. Either clients use identical tooling end to end (which is unrealistic), or the standard evolves to encode this variability in the spec.
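The effect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `payload` is a hypothetical stand-in for a layer tarball, and Python's `gzip` module stands in for whatever tool actually builds the layer.

```python
import gzip
import hashlib

# Hypothetical stand-in for a layer tarball; any compressible bytes will do.
payload = b"example layer content\n" * 4096


def compressed_digest(data: bytes, level: int) -> str:
    """Return the sha256 hex digest of `data` after gzip compression."""
    # mtime=0 pins the gzip header timestamp so that only the level varies.
    blob = gzip.compress(data, compresslevel=level, mtime=0)
    return hashlib.sha256(blob).hexdigest()


fast = compressed_digest(payload, 1)
best = compressed_digest(payload, 9)

# Same source bytes, same algorithm, different parameters.
print("level 1:", fast)
print("level 9:", best)
print("identical:", fast == best)
```

The two digests differ even though the input bytes are the same, which is why two registries or build tools may fail to deduplicate what is logically the same layer.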

rchincha commented 8 months ago

https://datatracker.ietf.org/doc/html/rfc7231#section-3.1.1.1

"application/vnd.oci.image.layer.v1.tar+gzip"

-->

"application/vnd.oci.image.layer.v1.tar+gzip; param1=x; param2=y" etc?
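To make the proposal concrete, here is a minimal sketch of how a client might split such a media type into its base type and parameters. The helper `parse_media_type` is hypothetical, `param1` and `param2` are the placeholders from the example above (the OCI spec does not define any such parameters), and the naive split on `;` does not handle quoted values that contain a semicolon.

```python
def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split 'type/subtype; k=v; ...' into (base type, parameter dict)."""
    base, *raw_params = [part.strip() for part in value.split(";")]
    params: dict[str, str] = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, val = raw.partition("=")
        # Parameter names are case-insensitive; strip optional quotes.
        params[name.strip().lower()] = val.strip().strip('"')
    return base.lower(), params


base, params = parse_media_type(
    "application/vnd.oci.image.layer.v1.tar+gzip; param1=x; param2=y"
)
print(base)
print(params)
```

A client that only understands the base type could ignore the parameters, while a reproducibility-aware tool could read them back to pick matching compression settings.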

jonjohnsonjr commented 8 months ago

There is not one standard algorithm for encoding DEFLATE. Even with the same parameters, different gzip implementations will produce different archives. If you want to record exactly how a layer was produced, the place for that would be in something like an SBOM, not in the mediaType.

sudo-bmitch commented 8 months ago

This might be another case for pushing uncompressed content in the descriptor, and compressing at the transport level.
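A rough sketch of that idea, assuming the descriptor digest is computed over the uncompressed bytes and gzip is applied only on the wire. The names `layer_tar`, `descriptor_digest`, and `verify_after_transport` are hypothetical and not part of any OCI API.

```python
import gzip
import hashlib

# Hypothetical stand-in for the uncompressed layer tarball.
layer_tar = b"example layer content\n" * 4096

# Digest over the uncompressed content, as it would appear in a descriptor.
descriptor_digest = "sha256:" + hashlib.sha256(layer_tar).hexdigest()


def verify_after_transport(wire_bytes: bytes, expected: str) -> bool:
    """Decompress what arrived on the wire and check it against the digest."""
    restored = gzip.decompress(wire_bytes)
    return "sha256:" + hashlib.sha256(restored).hexdigest() == expected


# Two senders compress the same content with different settings.
wire_a = gzip.compress(layer_tar, compresslevel=1)
wire_b = gzip.compress(layer_tar, compresslevel=9)

print(verify_after_transport(wire_a, descriptor_digest))
print(verify_after_transport(wire_b, descriptor_digest))
```

Both transfers verify against the same digest, because the compression settings no longer influence the content that is hashed.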

rchincha commented 8 months ago

> There is not one standard algorithm for encoding DEFLATE

Yup.

Also, requiring yet another optional thing (an SBOM) for reproducibility?

rchincha commented 6 months ago

Closing this due to the following:

1) OCI image-spec is a spec and a standard, but there are limits

2) The spec itself is largely JSON, and there is no canonical JSON to begin with

3) The output of compression is affected by the algorithm and its parameters, so you really have to pick one tool to produce the OCI layers (tar and tar.gz) and stick with it. Also make sure you record the tool and version used so that the bits can be reproduced.
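Points 2 and 3 can be illustrated together with a small sketch. The record below is hypothetical (its field names are made up for illustration and are not defined by the OCI spec). It captures the compression tool and version, then shows that the same JSON data serialized two ways hashes differently.

```python
import hashlib
import json
import zlib

# Hypothetical provenance record; the field names are illustrative only.
record = {
    "tool": "python-zlib",
    "zlib_version": zlib.ZLIB_RUNTIME_VERSION,
    "compression_level": 9,
}

# Two valid serializations of the same data.
compact = json.dumps(record, separators=(",", ":"), sort_keys=True).encode()
pretty = json.dumps(record, indent=2).encode()

same_data = json.loads(compact) == json.loads(pretty)
same_digest = hashlib.sha256(compact).digest() == hashlib.sha256(pretty).digest()

print("same data:", same_data)
print("same digest:", same_digest)
```

The data is equal but the digests are not, so anything addressed by digest depends on the exact bytes a tool emits, for JSON as much as for compressed layers.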