opencontainers / image-spec

OCI Image Format
https://www.opencontainers.org/
Apache License 2.0

Consider adding Blake3 to registered types #819

Open sargun opened 3 years ago

sargun commented 3 years ago

I think we should consider adding blake3 to the registered types (https://github.com/opencontainers/image-spec/blob/79b036d80240ae530a8de15e1d21c7ab9292c693/descriptor.md#registered-algorithms). I propose the prefix b3-256.

cyphar commented 3 years ago

We can add support for it (though b3 has many configurable hash lengths -- how do we pick the best one?), but I do think we also need to work on making sure that tooling which generates images supports mixed-digest images (and maybe that should be done first). It's not just a matter of being able to handle different hashing algorithms, it's that you need to have a way of either regenerating all hashes in an image tree with different digest algorithms or otherwise determining what is the best hash algorithm to use given an existing image that you're creating a modified version of.

For instance, given a descriptor tree full of SHA-256 digests you probably want to use SHA-256, but if there's a SHA-512 in there which should you use for new digests when creating a new image? Maybe you want to opt for the newest algorithm in use by an image digest tree? Either way I think most tools already don't handle SHA-512 in the most ideal way, so we need to work on that.
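One possible policy for the question above, as a minimal sketch rather than anything the spec defines: walk the descriptor tree, collect the algorithm prefix of every digest, and reuse the most preferred algorithm already present. The `PREFERENCE` order and both function names are hypothetical.

```python
# Hypothetical preference order, least preferred first. The spec does not
# define any such ranking; this is an assumption for illustration.
PREFERENCE = ["sha256", "sha512"]


def collect_algorithms(node):
    """Yield the algorithm prefix of every "digest" field in a nested tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "digest" and isinstance(value, str) and ":" in value:
                yield value.split(":", 1)[0]
            else:
                yield from collect_algorithms(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_algorithms(item)


def pick_algorithm(tree, default="sha256"):
    """Return the most preferred known algorithm already used in the tree."""
    used = set(collect_algorithms(tree))
    known = [name for name in PREFERENCE if name in used]
    return known[-1] if known else default


manifest = {
    "config": {"digest": "sha256:" + "a" * 64},
    "layers": [{"digest": "sha512:" + "b" * 128}],
}
print(pick_algorithm(manifest))
```

A tool could just as reasonably pick the most common algorithm in the tree, or always regenerate every digest; the point is only that the choice has to be made explicitly.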

And finally it would be nice if we had a way of not having to duplicate the same underlying data if the hashing algorithm referring to it is different (layers being the most important example of this). But then again, tools can also be sufficiently clever about this (store an out-of-spec lookup table which knows the hash of each object under multiple hashing algorithms).
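The out-of-spec lookup table could look roughly like the sketch below: each blob is stored once under a canonical digest, and digests under other algorithms are recorded as aliases pointing to it. The class and method names are hypothetical, an in-memory dict stands in for a real blob store, and SHA-256/SHA-512 from the Python standard library are used because `hashlib` has no blake3.

```python
import hashlib


class MultiDigestIndex:
    """Store each blob once and resolve it by any of its known digests."""

    def __init__(self, algorithms=("sha256", "sha512")):
        self.algorithms = tuple(algorithms)
        self.canonical = self.algorithms[0]
        self._aliases = {}  # any digest string -> canonical digest string
        self._blobs = {}    # canonical digest string -> blob bytes

    def add(self, data, chunk_size=1 << 20):
        """Hash the data once per algorithm in a single pass and index it."""
        hashers = {name: hashlib.new(name) for name in self.algorithms}
        view = memoryview(data)
        for offset in range(0, len(view), chunk_size):
            chunk = view[offset:offset + chunk_size]
            for hasher in hashers.values():
                hasher.update(chunk)
        digests = {n: f"{n}:{h.hexdigest()}" for n, h in hashers.items()}
        canonical = digests[self.canonical]
        self._blobs.setdefault(canonical, bytes(data))
        for digest in digests.values():
            self._aliases[digest] = canonical
        return digests

    def get(self, digest):
        """Return the blob for a digest under any indexed algorithm."""
        return self._blobs[self._aliases[digest]]


index = MultiDigestIndex()
digests = index.add(b"{}")
print(digests["sha256"])
```

Looking up either `digests["sha256"]` or `digests["sha512"]` returns the same stored bytes, so the layer data is not duplicated per algorithm.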

cyphar commented 3 years ago

Also, this will probably need a https://github.com/opencontainers/go-digest PR first?

sargun commented 3 years ago

There seems to be a circular dependency between go-digest and image-spec. Mostly I'm starting this issue so I can get blake3 added to go-digest, but the process for getting new hashes added to go-digest is to first get them registered here.

I also wanted to start the conversation of what the prefix should be. I suggested b3-256 in my initial issue description, because b3 = blake3, 256-bit, where 256 bits is the default b3sum output.
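For what it's worth, the digest grammar in descriptor.md builds algorithm names from lowercase alphanumeric components joined by `+`, `.`, `_`, or `-`, so `b3-256` would be a syntactically valid prefix. A rough check, assuming that grammar (the regex and the `split_digest` helper are illustrative only, not part of go-digest):

```python
import re

# algorithm ::= component (separator component)*, component ::= [a-z0-9]+,
# separator ::= [+._-], encoded ::= [a-zA-Z0-9=_-]+
DIGEST_RE = re.compile(r"([a-z0-9]+(?:[+._-][a-z0-9]+)*):([a-zA-Z0-9=_-]+)")


def split_digest(digest):
    """Split "algorithm:encoded" into its two parts, or raise ValueError."""
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"not a valid digest string: {digest!r}")
    return match.group(1), match.group(2)


algorithm, encoded = split_digest("b3-256:" + "0" * 64)
print(algorithm, len(encoded))
```

This only checks the syntax of the string; whether an implementation accepts the algorithm is a separate question.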

sargun commented 3 years ago

I added a proposal to add blake3, where the default hash length is 256-bit as suggested in the spec.

rchincha commented 5 months ago

Previously, the motivation to add blake3 was not clear. blake3 is very fast even on generic hardware, which should definitely help with producing and verifying large artifacts.

rchincha commented 4 months ago

quick test 20 vCPU (Intel Xeon Gold 5218)

blake3 (parallel, so uses all available cores)

$ time b3sum test.img    # test.img is 100GiB
253da5f3c5802f7a2c30b16a29ae4aa3830be8d5be57e31778087ea060f7def9 test.img
real 0m48.493s

sha256sum (can only use single core)

$ time sha256sum test.img
f0b14a8da7f1c48a0846647a078b97956edd8df451a62fc4b466879aa24d4fd7 test.img
real 10m49.152s
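Converting the two `real` times above into throughput makes the gap easier to read. This is just arithmetic on the reported numbers, assuming the 100GiB file size noted in the test; the `real_seconds` helper is a hypothetical name.

```python
import re


def real_seconds(text):
    """Convert a `time` value such as "10m49.152s" into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


SIZE_GIB = 100  # file size as stated in the test above

b3_seconds = real_seconds("0m48.493s")
sha256_seconds = real_seconds("10m49.152s")

print(f"b3sum:     {SIZE_GIB / b3_seconds:.2f} GiB/s")
print(f"sha256sum: {SIZE_GIB / sha256_seconds:.2f} GiB/s")
print(f"ratio:     {sha256_seconds / b3_seconds:.1f}x")
```

On these numbers the wall-clock difference is roughly 13x, with b3sum using all cores and sha256sum using one.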

rchincha commented 4 months ago

https://github.com/opencontainers/go-digest/issues/104

shizhMSFT commented 3 months ago

Switching to blake3 is indeed faster. I built a PoC of the oras tool that uses blake3 by default (the PoC release v1.2.0-blake3 is available for download).

Here's a test run of oras pushing a 10G file to a local folder in OCI image layout.

$ truncate -s 10G 10G.bin
$ time oras push --oci-layout test-sha256:10G 10G.bin
✓ Uploaded  10G.bin                                                                              10/10 GB 100.00%    36s
  └─ sha256:732377e7f4a2abdc13ddfa1eb4c9c497fd2a2b294674d056cf51581b47dd586d
✓ Uploaded  application/vnd.oci.empty.v1+json                                                      2/2  B 100.00%    2ms
  └─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded  application/vnd.oci.image.manifest.v1+json                                         595/595  B 100.00%    4ms
  └─ sha256:58f03d65ab562e6905c10e26cc9c48b8c95ac8d6db3b3ceb3d860fc2321f5848
Pushed [oci-layout] test-sha256:10G
ArtifactType: application/vnd.unknown.artifact.v1
Digest: sha256:58f03d65ab562e6905c10e26cc9c48b8c95ac8d6db3b3ceb3d860fc2321f5848

real    0m59.275s
user    0m44.369s
sys     0m20.767s
$ time oras_blake3 push --oci-layout test-blake3:10G 10G.bin
✓ Uploaded  10G.bin                                                                              10/10 GB 100.00%     9s
  └─ blake3:28960eef7d587ab6d1627b7efe30c7a07ce2dce4871d339fdfb607cb0776e064
✓ Uploaded  application/vnd.oci.empty.v1+json                                                      2/2  B 100.00%    2ms
  └─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded  application/vnd.oci.image.manifest.v1+json                                         595/595  B 100.00%     0s
  └─ blake3:cbee086b764e6912688269c2fdf2db8a454e0e07dd39c5601a7db1a79bd247a4
Pushed [oci-layout] test-blake3:10G
ArtifactType: application/vnd.unknown.artifact.v1
Digest: blake3:cbee086b764e6912688269c2fdf2db8a454e0e07dd39c5601a7db1a79bd247a4

real    0m12.392s
user    0m5.798s
sys     0m8.748s

As you can observe, blake3 is roughly 4x to 5x faster than sha256 in this test (36s vs. 9s for the blob upload, 59.3s vs. 12.4s wall-clock overall).

shizhMSFT commented 3 months ago

Although blake3 is faster, it is not a NIST-approved algorithm. I therefore have concerns about using blake3 in FIPS scenarios as well as signing scenarios.

shizhMSFT commented 3 months ago

Note that the above test with the modified oras tool uses the blake3 implementation referenced in opencontainers/go-digest. That is, zeebo/blake3.

However, unlike the upstream Rust implementation, the zeebo/blake3 implementation does not support multi-threading.

rchincha commented 3 months ago

@shizhMSFT thanks for the experiment. This is another great data point.