multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

"dynamic multihash" #269

Closed i-norden closed 2 years ago

i-norden commented 2 years ago

This is a little out there. It was born out of an attempt to generalize the type of "content-prefix-ignoring multihash function" described at the bottom of typed_protobuf.md, which is required for supporting CosmosSDK chains.

It occurred to me that we may:

  1. Want to use a different base hash function than sha2-256
  2. Want to ignore prefixes of a length other than 32 bytes
  3. Want to ignore suffix bytes instead of prefix bytes

Currently this would require registering a new multicodec type for each of these multihash variations, but with the multihash proposed here it is possible to describe all of these variations with a single multihash type.
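As a rough illustration of the three variations above, they can be treated as parameters of one hashing routine. This is only a sketch: the function name `trimmed_digest` and its parameters are hypothetical, and it uses Python's `hashlib` algorithm names rather than multicodec codes.

```python
import hashlib


def trimmed_digest(content: bytes, algorithm: str = "sha256",
                   trim_len: int = 32, from_back: bool = False) -> bytes:
    """Hash `content` after ignoring `trim_len` bytes at the front or back.

    Hypothetical helper: `algorithm` is a hashlib name such as "sha256"
    or "sha224", not a multicodec code.
    """
    if trim_len < 0 or trim_len > len(content):
        raise ValueError("trim_len must be between 0 and len(content)")
    if from_back:
        # Ignore the last trim_len bytes (suffix).
        kept = content[:len(content) - trim_len]
    else:
        # Ignore the first trim_len bytes (prefix).
        kept = content[trim_len:]
    return hashlib.new(algorithm, kept).digest()
```

With the defaults this behaves like the existing prefix-ignoring case (sha2-256, 32 bytes off the front); changing any one argument gives one of the variations listed above.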

The scheme proposed here is to map the new "dynamic-multihash" type to a multihash algorithm whose resulting multihash takes the form below:

{dynamic-multihash-prefix}-{varuint for the multihash code for the used multihash function}-{single byte that indicates whether to trim bytes from front or back of the content}-{varuint for the length of content to trim before hashing it}-{hash of the referenced content whereby the hash function ignored X bytes from either the front or back of the byte string}

A more concrete example looks like:

0xc101-0x02-0x1013-0x00-0x01-0x20-{hash of referenced content whereby the hash function ignored the first 32 bytes of the content}

The resulting multihash above tells us that the referenced content was hashed using the sha2-224 function and that 32 bytes were trimmed off the front of the binary before hashing it.
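A minimal sketch of how such a value could be assembled, following the template fields in order. Several things here are assumptions rather than part of the proposal: the prefix is written verbatim as the two bytes 0xc1 0x01, the direction byte 0x01 is taken to mean "trim from the back" (only 0x00 for the front appears in the example), and varuints are encoded as unsigned LEB128 varints as elsewhere in multiformats. The concrete example also appears to carry two extra single bytes (0x02 and 0x01) that the template does not describe; this sketch follows the template and omits them.

```python
import hashlib

# Assumption: the dynamic-multihash prefix is serialized as these two raw bytes.
DYNAMIC_PREFIX = bytes([0xC1, 0x01])

# Multicodec codes for the inner hash functions, mapped to hashlib names.
INNER_HASHES = {0x12: "sha256", 0x1013: "sha224"}

TRIM_FRONT = 0x00  # from the example above
TRIM_BACK = 0x01   # assumption: the proposal does not fix this value


def encode_varuint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varuint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def dynamic_multihash(content: bytes, code: int, direction: int, trim_len: int) -> bytes:
    """Build prefix + inner code + direction byte + trim length + digest."""
    if code not in INNER_HASHES:
        raise ValueError("unsupported inner hash code")
    if direction not in (TRIM_FRONT, TRIM_BACK):
        raise ValueError("unknown direction byte")
    if trim_len < 0 or trim_len > len(content):
        raise ValueError("trim_len must be between 0 and len(content)")
    if direction == TRIM_FRONT:
        kept = content[trim_len:]
    else:
        kept = content[:len(content) - trim_len]
    digest = hashlib.new(INNER_HASHES[code], kept).digest()
    return (DYNAMIC_PREFIX + encode_varuint(code) + bytes([direction])
            + encode_varuint(trim_len) + digest)
```

For example, `dynamic_multihash(data, 0x1013, TRIM_FRONT, 32)` hashes everything after the first 32 bytes of `data` with sha2-224 and prepends the descriptive fields.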

One obvious issue, although I'm not sure how problematic it would be in the greater system, is that you can't tell by looking at the proposed multihash prefix, 0xc101, how long the prefixed content hash will be.
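To make the length issue concrete: a standard multihash carries an explicit digest-length varint right after the hash code, whereas a reader of the layout proposed here would first have to parse the inner hash code and then look up its digest size in a table. The sketch below shows that lookup. The table `DIGEST_SIZES`, the function names, and the raw prefix bytes 0xc1 0x01 are assumptions for illustration only.

```python
# Digest sizes in bytes, keyed by multicodec code (sha2-256, sha2-224).
# A real reader would need an entry for every supported inner hash.
DIGEST_SIZES = {0x12: 32, 0x1013: 28}


def read_varuint(buf: bytes, pos: int):
    """Decode an unsigned LEB128 varint at `pos`; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varuint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_dynamic_multihash(buf: bytes, prefix: bytes = b"\xc1\x01"):
    """Return (code, direction, trim_len, digest, end_pos) for `buf`."""
    if not buf.startswith(prefix):
        raise ValueError("not a dynamic multihash")
    pos = len(prefix)
    code, pos = read_varuint(buf, pos)
    if pos >= len(buf):
        raise ValueError("missing direction byte")
    direction = buf[pos]
    pos += 1
    trim_len, pos = read_varuint(buf, pos)
    # The digest length is not in the prefix; it has to come from the inner code.
    if code not in DIGEST_SIZES:
        raise ValueError("unknown inner hash code, digest length undetermined")
    size = DIGEST_SIZES[code]
    digest = buf[pos:pos + size]
    if len(digest) != size:
        raise ValueError("truncated digest")
    return code, direction, trim_len, digest, pos + size
```

A reader that does not recognize the inner code cannot tell where the digest ends, which is the practical consequence of the issue described above.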

i-norden commented 2 years ago

Closing this; it deviates too much. Perhaps worthy of a discussion at some point?