multiformats / js-multiformats

Multiformats interface (multihash, multicodec, multibase and CID)

compressed cid set implementation #239

Open Gozala opened 1 year ago

Gozala commented 1 year ago

I believe @mikeal has a compressed CID set representation and implementation. CID sets come up all the time, and we always resort to some hacky version, like maps with CID keys or sorted lists, to represent them.

I think it would make a lot of sense to just standardize a compressed CID set format and implementation. I would suggest:

  1. Implementing a codec for the compressed CID set representation and adding it here.
  2. Defining a codec format and allocating a code in the multicodec table.

rvagg commented 1 year ago

A format specifically for this would certainly qualify as a new codec, if it uses a novel encoding method. IIRC this is novel.

The qualification is something like: "an IPLD codec is something that potentially yields links and whose decoding method isn't already covered by an existing codec".

I guess the data model form would always be something like [&Any] - but we'd want details about the edges of that - is it nullable, is zero-length acceptable, is there a maximum, can they repeat (does it matter), etc. But a quick spec doc about that in the codecs list would be enough.
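To make those edge questions concrete, here is a minimal sketch of how they could be written down as explicit policy options. Everything in it is an assumption for illustration: `checkLinkList` and its option names are hypothetical and not part of multiformats, links are stood in for by raw `Uint8Array` values rather than real CID objects, and the defaults are placeholders, not a proposal for what the spec should say.

```javascript
// Hypothetical validator: each option corresponds to one open question above.
// Links are represented as raw byte arrays (an assumption, not the CID class).
function checkLinkList(links, options = {}) {
  const {
    allowEmpty = true,        // is zero-length acceptable?
    maxLength = Infinity,     // is there a maximum?
    allowDuplicates = false   // can entries repeat?
  } = options

  // Covers the "is it nullable" question: null/undefined is rejected here.
  if (!Array.isArray(links)) return { ok: false, reason: 'not a list' }
  if (links.length === 0 && !allowEmpty) return { ok: false, reason: 'empty' }
  if (links.length > maxLength) return { ok: false, reason: 'too long' }

  if (!allowDuplicates) {
    const seen = new Set()
    for (const bytes of links) {
      // Hex-encode the bytes so equal byte sequences compare equal as strings.
      const key = Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('')
      if (seen.has(key)) return { ok: false, reason: 'duplicate' }
      seen.add(key)
    }
  }
  return { ok: true }
}
```

A spec doc would only need to pin down the value of each option; the code above is just one way to read the questions.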

Gozala commented 1 year ago

> I guess the data model form would always be something like [&Any]

I would argue we do need sets at the data model layer as well. It really is painful that one doesn't exist already, precisely because [&Any] does not convey set semantics. In fact, we find ourselves using { String: &Any } instead for that exact reason. That isn't ideal either, because how you key them becomes another question, which can lead to less deterministic outcomes.
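As a rough illustration of the gap, here is a sketch of what it takes to get set semantics out of a plain list today: sort the entries byte-wise and drop repeats, so that two lists with the same members always produce the same result. This is not the compressed representation discussed above and not an API in multiformats; `compareBytes` and `toCanonicalSet` are hypothetical helpers, and links are stood in for by raw `Uint8Array` values.

```javascript
// Compare two byte arrays lexicographically (a shorter prefix sorts first).
function compareBytes(a, b) {
  const n = Math.min(a.length, b.length)
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i]
  }
  return a.length - b.length
}

// Turn a list of links into a canonical form: sorted, with duplicates removed.
// The input array is not modified.
function toCanonicalSet(links) {
  const sorted = [...links].sort(compareBytes)
  return sorted.filter((bytes, i) => i === 0 || compareBytes(sorted[i - 1], bytes) !== 0)
}

// Two lists with the same members in different order, with repeats.
const first = Uint8Array.of(1, 2)
const second = Uint8Array.of(1, 3)
const fromOneOrder = toCanonicalSet([second, first, second])
const fromAnotherOrder = toCanonicalSet([first, second, first])
```

Both results contain the same two entries in the same order, which is the determinism a list or a string-keyed map does not give on its own. Every producer and consumer has to agree on this convention out of band, which is the argument for expressing it in the data model or in a dedicated codec.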