multiformats / multicodec

Compact self-describing codecs. Save space by using predefined multicodec tables.
MIT License

Add ISCC to multicodecs table #252

Closed titusz closed 2 years ago

titusz commented 2 years ago

The ISCC is a similarity-preserving identifier for digital content. An ISCC is derived algorithmically from the digital content itself, just like a cryptographic hash. However, instead of using a single cryptographic hash function that only identifies the data, the ISCC uses a variety of algorithms to create a composite identifier that exhibits similarity-preserving properties (softhash). For an implementation of code 0xcc01 see: https://core.iscc.codes/#quick-start

rvagg commented 2 years ago

Thanks @titusz, you'll have to be patient with us over the Christmas period with some of us taking time off, and this not being a straightforward entry to consider. Feel free to ping us again here if there's been no movement by the new year though.

vmx commented 2 years ago

I had a quick look. I've seen that you already use Multibase, which is pretty cool. From the architecture diagram it looks like your code has some internal structure, which reminds me of a Multihash.

CIDs are for content addressing, which isn't the case here, but rather the opposite. You don't want the identifier to change if the underlying data changes slightly. Multihashes are meant for cryptographic hashes (though perhaps we can stretch that a bit). So perhaps this is a new Multiformat on its own? How have you intended to use the 0xcc01 identifier?

titusz commented 2 years ago

@vmx I was not sure what tag/category the ISCC would fit. I was tempted to use multihash as category. But there are two problems with that:

  1. ISCC is not a cryptographic hash but a similarity-preserving hash.
  2. ISCC requires its own header format with varnibble-encoded type, subtype, version, and length fields.

So I went for a new category that I called softhash. It could also be called fingerprint. You are right that multiformat might also fit as ISCC has its own typing system.

My idea was to make the ISCC multiformats compatible by wrapping its full byte digest with multibase, multicodec:

<multibase><multicodec><iscc-digest-including-header>
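
A minimal Python sketch of that wrapping, under stated assumptions: the code 0xcc01 is serialized as an unsigned varint (the usual multicodec convention), the multibase is base32 lower-case without padding (prefix `b`), and the ISCC digest is treated as opaque bytes. The helper names `wrap` and `unwrap` and the sample digest are hypothetical, and whether existing ISCC tooling emits exactly these prefix bytes is not confirmed in this thread.

```python
import base64
from typing import Tuple

ISCC_CODE = 0xCC01  # code proposed in this issue


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    if value < 0:
        raise ValueError("varint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    value = 0
    shift = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, index + 1
        shift += 7
    raise ValueError("truncated varint")


def wrap(iscc_digest: bytes, code: int = ISCC_CODE) -> str:
    """Build <multibase><multicodec><iscc-digest-including-header>."""
    payload = encode_varint(code) + iscc_digest
    body = base64.b32encode(payload).decode("ascii").lower().rstrip("=")
    return "b" + body  # 'b' = multibase prefix for base32 lower, no padding


def unwrap(text: str) -> Tuple[int, bytes]:
    """Split a wrapped string back into (code, digest bytes)."""
    if not text.startswith("b"):
        raise ValueError("only base32 (prefix 'b') is handled in this sketch")
    body = text[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    payload = base64.b32decode(body)
    code, consumed = decode_varint(payload)
    return code, payload[consumed:]


# Placeholder bytes standing in for a real ISCC header + body.
sample_digest = bytes.fromhex("0001020304050607")
wrapped = wrap(sample_digest)
assert unwrap(wrapped) == (ISCC_CODE, sample_digest)
```

Keeping the ISCC header inside the payload means a decoder only needs the multicodec prefix to recognize the value as an ISCC; the type, subtype, version, and length fields are then parsed by ISCC-aware code.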

Any suggestions on how to approach this are welcome.

vmx commented 2 years ago

> My idea was to make the ISCC multiformats compatible by wrapping its full byte digest with multibase, multicodec:
>
> <multibase><multicodec><iscc-digest-including-header>

That sounds reasonable to me. Let's see what others say :)

sposth commented 2 years ago

Hello, is there a timeline for deciding on this issue?