ipld / js-ipld

The JavaScript Implementation of IPLD
https://ipld.io
MIT License

Provide API to just encode / decode data without publishing a block #194

Open Gozala opened 5 years ago

Gozala commented 5 years ago

Type: Enhancement

Severity: Other

Description: Provide an API to encode data without publishing a block

This is related to https://github.com/ipld/ipld/issues/64. More specifically, I wish to encode content with any supported IPLD format, but then encrypt that buffer before publishing it. However, the API available right now doesn't provide a way to accomplish that, because put first encodes the data to a buffer and then publishes it. It is also impossible to retrieve the buffer for that CID, as it is automatically decoded with the corresponding format.

If the public API were further extended with encode(data, option): Buffer, that would allow one to do multiple encoding passes before a block is published. The reverse operation would also be required, to provide a get that can unwrap multiple layers.
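
A minimal sketch of the multi-pass idea described above, assuming Node.js. None of this is js-ipld or IPFS API: JSON stands in for a real IPLD format such as dag-cbor, and encode, decode, encrypt and decrypt are hypothetical helpers built on Node's built-in crypto module. The point is only that encoding and publishing are separate steps, so an encryption pass can sit between them.

```javascript
const crypto = require('crypto')

// Stand-in codec (assumption): JSON instead of a real IPLD format such as dag-cbor.
const encode = (data) => Buffer.from(JSON.stringify(data))
const decode = (buffer) => JSON.parse(buffer.toString('utf8'))

// AES-256-GCM. Output layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext.
const encrypt = (key, plaintext) => {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()])
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext])
}

// Reverse of encrypt: split the payload, verify the auth tag, return the plaintext.
const decrypt = (key, payload) => {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, payload.subarray(0, 12))
  decipher.setAuthTag(payload.subarray(12, 28))
  return Buffer.concat([decipher.update(payload.subarray(28)), decipher.final()])
}

const secretKey = crypto.randomBytes(32)
const data = { hello: 'world', n: 1 }

// Pass 1: encode. Pass 2: encrypt. Only `encrypted` would ever be published.
const encrypted = encrypt(secretKey, encode(data))

// Reverse: decrypt first, then decode.
const roundTripped = decode(decrypt(secretKey, encrypted))
```

Here the published bytes are opaque to anyone without secretKey, and the reader has to know out of band that the inner layer is JSON.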

Gozala commented 5 years ago

Since submitting, I discovered that I can do the following:

const cid = await ipfs.dag.put(buffer, {onlyHash:true})
const {data} = await ipfs.block.get(cid)

However, that is awkward, and I think something like the following would make a lot more sense:

const buffer = await ipfs.dag.encode(data)
const encrypted = await crypto.encrypt(secretKey, buffer)
const cid = await ipfs.dag.put(encrypted)

Then doing the reverse could be something like:

const encrypted = await ipfs.dag.get(cid)
const buffer = await crypto.decrypt(secretKey, encrypted)
const data = await ipfs.dag.decode(buffer)

vmx commented 5 years ago

As you've probably seen there's a huge API rewrite, which will hopefully be merged soon.

Let me explain my current ideas about abstractions in IPLD. I see js-ipld as a library where you don't really get in touch with the binary representation. You start with your own data, hand it over to js-ipld to do some serialisation, but you don't actually care about what it does internally. You never see the binary data. When you retrieve the data, you get the already deserialised data back. So js-ipld is about structured data and CIDs.

At a lower level there are the IPLD Formats (which, btw, will also see an API rewrite soon). There it's about structured data and its serialisation/binary encoding.

So currently (as you found out), you would use the Block API from IPFS to work on a block/binary level.

I need to put more thought into this. But I think it would be great if we could solve this on an IPLD Format level.

So it might look like this (based on the new API):

const cid = await ipld.put([data], multicodec.encrypted, { key: secretKey })
// the reverse
const data = await ipld.get([cid], { key: secretKey })

And the actual encryption would be handled within its own special IPLD Format.
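
To make the "encryption as its own format" idea concrete, here is a rough sketch assuming Node.js. It is not the IPLD Formats API: jsonFormat, createEncryptedFormat and the { key } option are all hypothetical names, and JSON again stands in for a real format. It only shows that a format-like object can wrap another format and handle the key internally.

```javascript
const crypto = require('crypto')

// Hypothetical inner format with a serialize/deserialize pair (JSON stand-in).
const jsonFormat = {
  serialize: (node) => Buffer.from(JSON.stringify(node)),
  deserialize: (bytes) => JSON.parse(bytes.toString('utf8'))
}

// Hypothetical wrapper: returns a format that encrypts whatever `inner` serialises.
// Byte layout: iv (12 bytes) | auth tag (16 bytes) | ciphertext.
const createEncryptedFormat = (inner) => ({
  serialize (node, { key }) {
    const iv = crypto.randomBytes(12)
    const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
    const body = Buffer.concat([cipher.update(inner.serialize(node)), cipher.final()])
    return Buffer.concat([iv, cipher.getAuthTag(), body])
  },
  deserialize (bytes, { key }) {
    const decipher = crypto.createDecipheriv('aes-256-gcm', key, bytes.subarray(0, 12))
    decipher.setAuthTag(bytes.subarray(12, 28))
    const plain = Buffer.concat([decipher.update(bytes.subarray(28)), decipher.final()])
    return inner.deserialize(plain)
  }
})

const encryptedJson = createEncryptedFormat(jsonFormat)
const key = crypto.randomBytes(32)

const block = encryptedJson.serialize({ secret: true }, { key })
const node = encryptedJson.deserialize(block, { key })
```

With this shape the caller never touches the intermediate buffer, which matches the "you never see the binary data" abstraction, at the cost of fixing the inner format when the wrapper is created.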

vmx commented 5 years ago

Funnily enough I was working on something today that also needed the serialised data without storing it. After a quick chat with @mikeal I think that we might need some layer between what IPLD Formats and js-ipld are today. js-ipld would then use that layer to store the blocks.

@Gozala give me a bit of time on this (I'm not sure I'll find the time this week, and next week I'll be at a conference). I want to think more about the layers we need to also support https://github.com/ipld/ipld/issues/64. I like the idea of the encode/decode() step.

Gozala commented 5 years ago

@vmx I have posted feedback on the new API in this post: https://gozala.hashbase.io/posts/Constraints%20of%20an%20API%20design/

Gozala commented 5 years ago

I need to put more thought into this. But I think it would be great if we could solve this on an IPLD Format level.

So it might look like this (based on the new API):

const cid = await ipld.put([data], multicodec.encrypted, { key: secretKey })
// the reverse
const data = await ipld.get([cid], { key: secretKey })

And the actual encryption would be handled within its own special IPLD Format.

I think the solution should be somewhat more general. What I mean is that there might be several layers of encoding, e.g. JSON -> dag-cbor -> symmetric encryption -> asymmetric encryption ..., and somehow each pass needs to encode information + metadata about the codec so that when you do the reverse you can decode each layer.

The problem I'm running into right now (unless I'm missing something) shows up, for example, when I encode with dag-cbor and then encrypt that with a secret key. The knowledge that the encoding is 'dag-cbor' is stored in the CID, not in the buffer itself, which means that after the decryption step I no longer know which codec / format to use for the next decode step.

Which in some ways suggests there is a need for some canonical registry for codecs.
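
One way to picture the layering problem is to make every layer self-describing. The sketch below assumes Node.js and uses a hypothetical in-memory registry keyed by a one-byte tag; the tags 0x01 and 0x02 are invented for this example and are not real multicodec codes, and wrap / unwrapAll are illustrative helpers rather than any existing API.

```javascript
const crypto = require('crypto')

// Hypothetical registry: one-byte tag -> codec. Tags are made up for this sketch.
const registry = new Map()

registry.set(0x01, {
  name: 'json', // stand-in for a structured format such as dag-cbor
  encode: (value) => Buffer.from(JSON.stringify(value)),
  decode: (bytes) => JSON.parse(bytes.toString('utf8'))
})

const secretKey = crypto.randomBytes(32)
registry.set(0x02, {
  name: 'aes-256-gcm', // payload layout: iv (12) | auth tag (16) | ciphertext
  encode: (bytes) => {
    const iv = crypto.randomBytes(12)
    const cipher = crypto.createCipheriv('aes-256-gcm', secretKey, iv)
    const body = Buffer.concat([cipher.update(bytes), cipher.final()])
    return Buffer.concat([iv, cipher.getAuthTag(), body])
  },
  decode: (bytes) => {
    const decipher = crypto.createDecipheriv('aes-256-gcm', secretKey, bytes.subarray(0, 12))
    decipher.setAuthTag(bytes.subarray(12, 28))
    return Buffer.concat([decipher.update(bytes.subarray(28)), decipher.final()])
  }
})

// Encode one layer and prefix it with its tag, so the bytes describe themselves.
const wrap = (tag, value) =>
  Buffer.concat([Buffer.from([tag]), registry.get(tag).encode(value)])

// Peel layers until a codec returns something that is not a Buffer.
const unwrapAll = (bytes) => {
  let current = bytes
  while (Buffer.isBuffer(current)) {
    const codec = registry.get(current[0])
    if (!codec) throw new Error(`unknown codec tag: ${current[0]}`)
    current = codec.decode(current.subarray(1))
  }
  return current
}

const layered = wrap(0x02, wrap(0x01, { hello: 'world' }))
const original = unwrapAll(layered)
```

Because the tag travels inside the encrypted payload, the decoder learns the inner codec only after decryption, which is exactly the information that is lost when the codec lives only in the CID. A limitation of this sketch is that the innermost value cannot itself be a raw Buffer, since that would be treated as another layer.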

Gozala commented 5 years ago

OK, ignore all of the above; clearly that's already being considered: https://github.com/multiformats/multicodec

mikeal commented 5 years ago

This should be resolved once we migrate to https://github.com/ipld/js-ipld-stack, as you can create all the block data lazily without publishing it, and it still does all the codec lookup for you.
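
As an illustration of what "creating block data lazily without publishing it" could mean, here is a small sketch assuming Node.js. It is not the js-ipld-stack API: LazyBlock, its methods and countingJsonCodec are hypothetical, JSON stands in for a real codec, and the digest is a bare SHA-256 hex string rather than a real CID.

```javascript
const crypto = require('crypto')

// Hypothetical codec that counts how often it is asked to encode.
let encodeCalls = 0
const countingJsonCodec = {
  name: 'json',
  encode: (value) => {
    encodeCalls++
    return Buffer.from(JSON.stringify(value))
  }
}

// Hypothetical lazy block: nothing is encoded until the bytes are requested,
// and nothing is ever written to a store.
class LazyBlock {
  constructor (source, codec) {
    this.source = source
    this.codec = codec
    this._bytes = null
  }

  // Encode on first use and cache the result.
  encode () {
    if (this._bytes === null) this._bytes = this.codec.encode(this.source)
    return this._bytes
  }

  // SHA-256 of the encoded bytes, as hex. A real CID would also carry codec and hash metadata.
  digest () {
    return crypto.createHash('sha256').update(this.encode()).digest('hex')
  }
}

const block = new LazyBlock({ hello: 'world' }, countingJsonCodec)
const callsBefore = encodeCalls
const bytes = block.encode()
const digest = block.digest()
const callsAfter = encodeCalls
```

The caller can take bytes, encrypt them, and publish the result through whatever store it likes, while the block still remembers which codec produced it.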