opencontainers / go-digest

Common digest package used across the container ecosystem
https://www.opencontainers.org/
Other
185 stars 60 forks source link

Using multihash for digests #45

Open hinshun opened 5 years ago

hinshun commented 5 years ago

The descriptors spec mentions multihash+base58 as an example of an algorithm that is not registered. I'm interested to use multihashes in application/vnd.oci.image.manifest.v1+json, and to pull tags by multihash digest from distribution.

Multihashes encode the algorithm in the digest, so multihash+base58 itself isn't sufficient to generate digests (FromReader, FromBytes, FromString) or to return the size of the hash. Perhaps multihash+base58:encoded are transformed to its respective algorithm:encoded for validation / verifying purposes.

I'm curious to hear thoughts about supporting multihash (in or out of package) in a way that projects like containerd and distribution can consume them.

AkihiroSuda commented 4 years ago

Do you already have implementation that uses multihash+base58 with OCI descriptors?

hinshun commented 4 years ago

Hi @AkihiroSuda, in my proof of concept that transfers layers over IPFS we continue to use sha256:xyz OCI descriptors. Here's the problem with multihash+base58:

In IPCS we continue to use OCI descriptors: https://github.com/hinshun/ipcs/blob/master/converter.go#L45-L48

However the digest isn't referring to the layer digest, but rather the wrapper object (known as IPLD) around the layer. The reason why its needed its because in IPFS, large objects needs to chunked into a DAG. The wrapper objects are the overhead of storing the structure of the DAG as well as the digest of each chunk.

thaJeztah commented 4 years ago

@hinshun just a heads-up: we're preparing tagging of v1.0.0 (https://github.com/opencontainers/go-digest/pull/56). Given that no PR was opened yet, I assume this is ok to be implemented as a minor release after v1.0.0?

hinshun commented 4 years ago

@thaJeztah Yes, I think we'll need to find another approach after v1.0.0. Multihash is an alternative to go-digest, rather than an algorithm that can be added. For example, combining things like +base58 is in the spec for multibase.

Perhaps the best way to implement this is just as annotations.

thaJeztah commented 4 years ago

Ah, thanks; though I'd ping you just in case there was a blocker for the release 👍

stevvooe commented 4 years ago

The example referenced in the spec is just an example of an unregistered algorithm. Typically, the first part is the algorithm and the second part is some modifier describing the details of the encoding. There's no reason we couldn't store a multihash in a go-digest string. The only constraint is that the encoding has to be URL safe (https://github.com/opencontainers/go-digest/blob/master/digest.go#L63).

Seems like multibase would be sufficient as a prefix, since the rest of the decoding information is embedded in the string.

One thing to consider here: it is not necessarily better to support a whole slew of algorithms, since it makes sets of content mutually incompatible without more intelligent backend support. Ideally, you choose one algorithm that is "canonical" then have conversions to/from that main one. Annotations, as you mentioned earlier, might be sufficient for packing images that were built by docker then pushed into the ipcs context.

One other thing we might consider: the use of Algorithm.Size() assumes a fixed size output encoding. I can't remember why this is there, other than being compatible with Go's crypto package, which assumes fixed size hash outputs, so if that is an issue, we might be able to deprecate it.

hinshun commented 4 years ago

The example referenced in the spec is just an example of an unregistered algorithm. Typically, the first part is the algorithm and the second part is some modifier describing the details of the encoding. There's no reason we couldn't store a multihash in a go-digest string. The only constraint is that the encoding has to be URL safe (https://github.com/opencontainers/go-digest/blob/master/digest.go#L63).

Yeah, I think the spec doesn't need to be changed. The existing implementation of go-digest makes a few assumptions like fixed size issue you pointed out. We'll need to fix those before we can add algorithms that have decoding information.

Seems like multibase would be sufficient as a prefix, since the rest of the decoding information is embedded in the string.

If we're going this route, cidv1 seems to be best prefix. I think there is a few more bits of information that IPFS uses than just the base-encoding. It also allows IPFS to upgrade (like going from cidv0 to cidv1) but still maintain backwards compat.

One thing to consider here: it is not necessarily better to support a whole slew of algorithms, since it makes sets of content mutually incompatible without more intelligent backend support. Ideally, you choose one algorithm that is "canonical" then have conversions to/from that main one.

Good point. The IPFS team actually is moving away from storing content using cid to multihash so that the base encoding doesn't make sets of content incompatible. Though they will still suffer from duplicating content that were stored using different algorithms.