"dynamic multihash" - Githubissues

This is a little out there. It was born out of an attempt to generalize the type of "content-prefix-ignoring multihash function" described at the bottom of typed_protobuf.md, which is required for supporting CosmosSDK chains.

It occurred to me that we may:

Want to use a different base hash function than sha2-256
Want to ignore prefixes of a length other than 32 bytes
Want to ignore suffix bytes instead of prefix bytes

Currently this would require registering a new multicodec type for each of these multihash variations, but with the multihash proposed here it is possible to describe all of these variations with a single multihash type.
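The three variations listed above amount to one parameterized function. The following is only a minimal sketch of that idea, not code from typed_protobuf.md; the name `trimmed_digest` and its parameters are hypothetical, and the defaults mirror the existing case (sha2-256, ignoring a 32-byte prefix).

```python
import hashlib


def trimmed_digest(content: bytes, hash_name: str = "sha256",
                   trim_len: int = 32, from_end: bool = False) -> bytes:
    """Hash `content` after dropping `trim_len` bytes from one end.

    Hypothetical helper: `hash_name` is any algorithm name hashlib accepts.
    """
    if trim_len > len(content):
        raise ValueError("cannot trim more bytes than the content holds")
    if from_end:
        kept = content[:len(content) - trim_len]  # drop a suffix
    else:
        kept = content[trim_len:]                 # drop a prefix
    return hashlib.new(hash_name, kept).digest()
```

Two byte strings that differ only in the ignored region produce the same digest, which is the property the CosmosSDK case relies on.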

The scheme proposed here is to map the new "dynamic-multihash" type to a multihash algorithm whose resulting multihash takes the form below:

{dynamic-multihash-prefix}-{varuint for the multihash code for the used multihash function}-{single byte that indicates whether to trim bytes from front or back of the content}-{varuint for the length of content to trim before hashing it}-{hash of the referenced content whereby the hash function ignored X bytes from either the front or back of the byte string}

A more concrete example looks like:

0xc101-0x02-0x1013-0x00-0x01-0x20-{hash of referenced content whereby the hash function ignored the first 32 bytes of the content}

0xc101 == the proposed multihash type byte prefix for "dynamic-multihash"
0x02 == single byte stipulating how many subsequent bytes are reserved for the multihash type for the base hash function used
0x1013 == multihash type for the hash function used, in this case sha2-224
0x00 == single byte, 0 or 1, that identifies whether bytes should be trimmed from the front (0) or end (1) of the content before hashing
0x01 == single byte stipulating how many subsequent bytes encode the number of bytes to trim from the referenced content before hashing
0x20 == number of bytes to trim from the referenced content before hashing, in this case 32 bytes
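To make the field layout concrete, here is a rough sketch of a parser for the example above. It follows the concrete example, where a single length byte precedes each multi-byte field. It also assumes those fields are read big-endian exactly as written (so the bytes 0x10 0x13 give 0x1013), which the proposal does not spell out. The names `DynamicMultihash` and `parse_dynamic_multihash` are hypothetical.

```python
from typing import NamedTuple

# Proposed (not registered) prefix bytes from the example above.
DYNAMIC_PREFIX = bytes.fromhex("c101")


class DynamicMultihash(NamedTuple):
    hash_code: int       # e.g. 0x1013 for sha2-224
    trim_from_end: bool  # False = trim front (0x00), True = trim end (0x01)
    trim_len: int        # number of bytes to drop before hashing
    digest: bytes        # everything that follows the header


def parse_dynamic_multihash(raw: bytes) -> DynamicMultihash:
    if not raw.startswith(DYNAMIC_PREFIX):
        raise ValueError("missing dynamic-multihash prefix")
    pos = len(DYNAMIC_PREFIX)

    def take(n: int) -> bytes:
        # Consume exactly n bytes, failing loudly on truncated input.
        nonlocal pos
        chunk = raw[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated dynamic-multihash header")
        pos += n
        return chunk

    code_len = take(1)[0]
    # Assumption: multi-byte fields are big-endian, as written in the example.
    hash_code = int.from_bytes(take(code_len), "big")
    direction = take(1)[0]
    if direction not in (0, 1):
        raise ValueError("direction byte must be 0 or 1")
    len_len = take(1)[0]
    trim_len = int.from_bytes(take(len_len), "big")
    return DynamicMultihash(hash_code, direction == 1, trim_len, raw[pos:])
```

Feeding it the example header followed by a 28-byte digest yields hash code 0x1013, trimming from the front, and a trim length of 32.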

The resulting multihash tells us that the referenced content was hashed with the sha2-224 function after 32 bytes were trimmed off the front of the content.
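As an illustration, the example multihash could be produced along these lines. This is a sketch under the same assumptions as the example (header bytes copied verbatim from it, 0xc101 being only the proposed prefix); the function name `example_dynamic_multihash` is hypothetical.

```python
import hashlib

# Header from the example: prefix, code length, sha2-224 code,
# trim-from-front flag, trim-length length, trim length (32).
EXAMPLE_HEADER = bytes.fromhex("c101" "02" "1013" "00" "01" "20")


def example_dynamic_multihash(content: bytes) -> bytes:
    """Build the example multihash: sha2-224 over content minus its first 32 bytes."""
    if len(content) < 32:
        raise ValueError("content is shorter than the 32 bytes to be trimmed")
    digest = hashlib.sha224(content[32:]).digest()  # 28-byte digest
    return EXAMPLE_HEADER + digest
```

Because the first 32 bytes never reach the hash function, two contents that differ only in that prefix map to the same multihash.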

One obvious issue, although I'm not sure how problematic it would be in the greater system, is that you can't tell from the proposed multihash prefix, 0xc101, alone how long the content hash that follows it will be.
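One possible way around this, sketched here purely as an assumption, is for a reader to derive the digest length from the inner hash function code once the header has been parsed. The table below lists only a few sha2 codes from the multicodec table, and the names `DIGEST_SIZES` and `expected_digest_size` are hypothetical.

```python
# Digest sizes in bytes, keyed by multihash code (small illustrative subset).
DIGEST_SIZES = {
    0x12: 32,    # sha2-256
    0x13: 64,    # sha2-512
    0x1013: 28,  # sha2-224
}


def expected_digest_size(hash_code: int) -> int:
    """Return the digest length implied by the inner hash function code."""
    try:
        return DIGEST_SIZES[hash_code]
    except KeyError:
        raise ValueError(f"unknown hash code: {hash_code:#x}") from None
```

The cost of this approach is that a reader who does not recognize the inner hash code cannot tell where the digest ends.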

multiformats / multicodec

"dynamic multihash" #269