storacha / ucanto

šŸ§ UCAN RPC

Integrate IPLD schema toolchain #107

Open Gozala opened 2 years ago

Gozala commented 2 years ago

We have an IPLD schema language and JS tooling around it. We should find a way to integrate it into this library. Specifically, here are some ideas:

  1. A tool that can generate capability defs from a schema definition.
    • It could generate derive functions that just fail right away.
  2. Ability to generate an IPLD schema from the capability definitions in a serialized format.
  3. Ability to generate an IPLD schema in its JSON representation.
Gozala commented 1 year ago

I did some exploration by adding a toIPLDSchema method to the Schema interface:

https://github.com/web3-storage/ucanto/blob/582c4c504d2e75feee2c5d298ea08b2f01ff1c5e/packages/validator/src/schema/type.ts#L15-L31

However, I ran into a problem: IPLD schemas do not support inline structs or unions, which means a schema like the following cannot be expressed as a single definition:

Schema.struct({
   root: Schema.string(),
   shards: Schema.dict({ value: Schema.string() })
})

In fact, it needs to generate two types, one for the outer struct and one for the shards field, and give each of them a name.
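A minimal TypeScript sketch of that hoisting step. The `SchemaNode` AST, the `toIPLDSchema` function and the `nameFor` callback are hypothetical (they are not ucanto's Schema API); the sketch simply emits every struct or dict as its own named IPLD type and refers to it by name. Names are derived from the field path purely as a placeholder, since choosing stable names is exactly the open question.

```typescript
// Hypothetical, minimal schema AST (not ucanto's actual Schema types).
type SchemaNode =
  | { kind: "string" }
  | { kind: "struct"; fields: Record<string, SchemaNode> }
  | { kind: "dict"; value: SchemaNode };

// Emits IPLD schema DSL, giving every struct or dict its own named type
// definition instead of inlining it. `nameFor` picks names for hoisted
// types from their field path; the root type is called `rootName`.
function toIPLDSchema(
  rootName: string,
  root: SchemaNode,
  nameFor: (path: string[]) => string,
): string {
  const defs: string[] = [];

  // Returns the type expression used where `node` is referenced, emitting
  // a named definition first when the node is composite.
  function ref(node: SchemaNode, path: string[]): string {
    if (node.kind === "string") return "String";
    const name = path.length === 0 ? rootName : nameFor(path);
    define(name, node, path);
    return name;
  }

  function define(name: string, node: SchemaNode, path: string[]): void {
    if (node.kind === "struct") {
      const fields = node.fields;
      const lines = Object.keys(fields).map(
        (field) => `  ${field} ${ref(fields[field], [...path, field])}`,
      );
      defs.push(`type ${name} struct {\n${lines.join("\n")}\n}`);
    } else if (node.kind === "dict") {
      defs.push(`type ${name} {String:${ref(node.value, [...path, "value"])}}`);
    }
  }

  ref(root, []);
  return defs.join("\n\n");
}

// The Schema.struct example from above, expressed in this hypothetical AST.
const example: SchemaNode = {
  kind: "struct",
  fields: {
    root: { kind: "string" },
    shards: { kind: "dict", value: { kind: "string" } },
  },
};

// Placeholder naming: capitalize and join the field path.
const dsl = toIPLDSchema("Root", example, (path) =>
  path.map((p) => p.charAt(0).toUpperCase() + p.slice(1)).join(""),
);
console.log(dsl);
```

For the example this prints a `Shards` map type followed by a `Root` struct whose `shards` field refers to `Shards` by name.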

I do not want to use UUIDs or other random identifiers. Instead, I would like to use the CID of each definition, so that the same schema always ends up with the same name. However, generating sha256 is an async operation, and introducing asynchrony here is not a good idea.
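To illustrate the async problem, here is a sketch of content-derived naming using the WebCrypto digest API as exposed by Node's `node:crypto` module (`webcrypto.subtle.digest`). The `typeNameFor` helper and the `T`-prefixed, 16-hex-digit name format are hypothetical; a real implementation would presumably encode a CID rather than a truncated hex digest. Because `subtle.digest` returns a Promise, anything that calls it, such as a `toIPLDSchema` method, has to become async too.

```typescript
import { webcrypto } from "node:crypto";

// Hypothetical helper: name a type after the sha256 digest of its canonical
// definition text, so identical definitions always get identical names.
// WebCrypto's digest is Promise-based, so the name is only available
// asynchronously and every caller has to await it.
async function typeNameFor(definition: string): Promise<string> {
  const bytes = new TextEncoder().encode(definition);
  const digest = await webcrypto.subtle.digest("SHA-256", bytes);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  // Truncated hex is a stand-in here; the thread proposes using a CID.
  return `T${hex.slice(0, 16)}`;
}

typeNameFor("type Shards {String:String}").then((name) => console.log(name));
```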

Instead, I have been considering using the murmur3 hash (in this context we don't need cryptographic hashes); however, I'm not sure what the hash collision rate would be.
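murmur3 is small enough to implement synchronously in plain JS. Below is a sketch of the 32-bit x86 variant of MurmurHash3 over UTF-8 bytes, plus a hypothetical `typeName` helper that turns the hash into a name. Only 32 bits of output are used here; a wider variant (such as the 128-bit one) would lower the odds of a collision.

```typescript
// MurmurHash3, x86 32-bit variant, over the UTF-8 bytes of `input`.
// Synchronous and non-cryptographic: fine for deriving names, not for security.
function murmur3_32(input: string, seed = 0): number {
  const data = new TextEncoder().encode(input);
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  const nblocks = data.length >>> 2;
  let h = seed | 0;

  // Scramble one 32-bit chunk before it is mixed into the state.
  const scramble = (k: number): number => {
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    return Math.imul(k, c2);
  };

  // Body: each 4-byte little-endian block is scrambled and mixed in.
  for (let i = 0; i < nblocks; i++) {
    const j = i * 4;
    const k =
      data[j] | (data[j + 1] << 8) | (data[j + 2] << 16) | (data[j + 3] << 24);
    h ^= scramble(k);
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: the remaining 1 to 3 bytes, if any.
  const tail = nblocks * 4;
  let k = 0;
  switch (data.length & 3) {
    case 3:
      k ^= data[tail + 2] << 16;
    // falls through
    case 2:
      k ^= data[tail + 1] << 8;
    // falls through
    case 1:
      k ^= data[tail];
      h ^= scramble(k);
  }

  // Finalization: fold in the length, then avalanche the bits.
  h ^= data.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Hypothetical naming helper: "T" followed by 8 hex digits of the hash.
const typeName = (definition: string): string =>
  `T${murmur3_32(definition).toString(16).padStart(8, "0")}`;

console.log(typeName("type Shards {String:String}"));
```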

An alternative approach might be to emit non-standard IPLD schemas that simply inline structs. In theory, we could still map those to standard IPLD schemas asynchronously by generating sha256 CIDs.

Gozala commented 1 year ago

Created an issue to lift the restriction on inline types: https://github.com/ipld/ipld/issues/262

gobengo commented 1 year ago

However, generating sha256 is an async operation, and introducing asynchrony here is not a good idea.

Is this true in general, or do you just mean the WebCrypto API? For example, I think Node's crypto module or js-sha256 can hash synchronously.
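The synchronous option can be sketched with Node's built-in `crypto` module, whose `createHash` API computes the digest without a Promise. The `typeNameFor` helper and the truncated-hex name format are hypothetical. Note that `node:crypto` is Node-only, so browser builds would need a pure-JS implementation such as the js-sha256 package mentioned above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable type name from the sha256 digest of
// a definition's text. createHash is synchronous, so no Promise is involved.
function typeNameFor(definition: string): string {
  const hex = createHash("sha256").update(definition, "utf8").digest("hex");
  return `T${hex.slice(0, 16)}`;
}

console.log(typeNameFor("")); // Te3b0c44298fc1c14
```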

gobengo commented 1 year ago

wrt murmur: if there are no other options, I'd say if it's good enough for UnixFS, it's good enough for this?

Gozala commented 1 year ago

wrt murmur: if there are no other options, I'd say if it's good enough for UnixFS, it's good enough for this?

UnixFS hashes only directory entry names. I suspect smaller payloads are less prone to collisions, but I'm not confident this assumption is accurate. More importantly, there's logic in place to handle hash collisions when they occur, and I can think of a way to deal with them in this context.
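One way to put a number on the collision question: if the hash is assumed to behave like a uniform random function, the standard birthday-bound approximation gives the chance that at least two of n distinct definitions share a b-bit hash as p ≈ 1 − exp(−n(n−1) / 2^(b+1)). Under that idealized assumption the estimate depends on how many distinct definitions are hashed and on the output width. The `collisionProbability` function below is a hypothetical helper for evaluating it.

```typescript
// Birthday-bound estimate of the chance that at least two of `n` distinct
// definitions collide, assuming the hash behaves like a uniform random
// function onto `bits`-bit outputs: p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
function collisionProbability(n: number, bits: number): number {
  const pairs = (n * (n - 1)) / 2;
  // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
  return -Math.expm1(-pairs / Math.pow(2, bits));
}

for (const n of [100, 1000, 10000]) {
  console.log(n, collisionProbability(n, 32), collisionProbability(n, 64));
}
```

With a 32-bit hash and about 10,000 distinct definitions this estimates roughly a 1.2% chance of at least one collision; a 64-bit or wider output makes that vanishingly small.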