Encryption layer for IPLD

Gozala commented 5 years ago

This is somewhat related to #63, as it could be an alternative, or one could be enabled by the other. At the moment with IPLD all the links are public even if the content they link to isn't. However, as I pointed out in #63, a case could be made that one might want to conceal links and make them available only to selected participants (with whom the corresponding keys were shared).

I think it is important to consider this in relation to GraphSync and IPLD Selectors, as it would be a shame if peers participating in an exchange that happen to have the shared key for concealed links were required to do multiple round-trips for data exchange; that would defeat the benefit of GraphSync.

mikeal commented 5 years ago

I’ve thought about this a bit and a few quick things to note:

1. You don’t want to entirely conceal the links. If you don’t provide the links in an un-encrypted way, you would have to share the key with any system or provider that you’d want to have store the graph, because a replicator is handed a root node to replicate (and possibly pin) and needs to be able to walk the graph in order to store every block. Instead, what you want to do is provide a list of links that exist in the encrypted block but without any other information about them (like the map key names). The hashes alone don’t provide you any sensitive information if what they are linking to is also encrypted.
   - This does leave open the possibility that you could link to data that is not encrypted and that would share sensitive information you may not have wanted exposed about the encrypted block. It will have to be up to the encryption programs to solve this by parsing through an entire graph and encrypting each node if it does not want to expose this information.
2. In the future, I’d like to see us just link to the implementation of the decryptor (in WebAssembly). In the interim, we’ll want encrypted nodes to be self-describing but in the future I’d like to see them be self-implementing.
3. A graph could have nodes encoded with a variety of different keys and even encryption programs. So, not just for selectors but for basic path traversal, we’ll need the ability to dynamically acquire keys for encrypted blocks.

So, an encryption program would do something like:

1. Parse out all the links.
2. Encrypt the block data.
3. Create a new block with: the encrypted data, an un-encrypted list of the links, and information about the encryption settings.

In the short term, this would be something like:

{ type: 'encrypted',
  crypto: { toPublicKey, fromPublicKey, algorithm, settings },
  links: [ CID ], /* optional, some blocks will not contain links */
  data: Buffer /* original block data after encryption */
}

This is just a sketch, there’s probably something a bit more elegant we can do with the schema stuff @warpfork has done. But the place I’d like to get in the future once we can take advantage of WebAssembly is something like this:

{ crypto: [ CID /* link to the WebAssembly program */, [ toPublicKey, fromPublicKey ]],
   data: Buffer,
   links: [] /* optional */
}
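For illustration, the three steps listed above (parse out the links, encrypt the block data, wrap both in a new block) could be sketched roughly as follows. This is only a sketch under stated assumptions, not an agreed format: links are represented as `{ '/': '<cid string>' }` objects, JSON stands in for a real codec, AES-256-GCM with a shared symmetric key stands in for the public-key settings in the `crypto` field, and the names `collectLinks`, `encryptNode` and `decryptNode` are made up for this example.

```javascript
const crypto = require('crypto')

// Assumption: a link is an object of the form { '/': '<cid string>' }.
const isLink = (value) =>
  value !== null && typeof value === 'object' &&
  Object.keys(value).length === 1 && typeof value['/'] === 'string'

// Step 1: walk the node and collect every link target, dropping map key names.
const collectLinks = (node, found = []) => {
  if (isLink(node)) {
    found.push(node['/'])
  } else if (node !== null && typeof node === 'object') {
    for (const value of Object.values(node)) collectLinks(value, found)
  }
  return found
}

// Steps 2 and 3: encrypt the serialized node, then wrap the ciphertext together
// with the plain-text link list and the encryption settings.
const encryptNode = (node, key) => {
  const links = collectLinks(node)
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const data = Buffer.concat([cipher.update(JSON.stringify(node), 'utf8'), cipher.final()])
  return {
    type: 'encrypted',
    crypto: { algorithm: 'aes-256-gcm', iv, tag: cipher.getAuthTag() },
    links,
    data
  }
}

// Reverse of encryptNode for a holder of the key.
const decryptNode = (block, key) => {
  const decipher = crypto.createDecipheriv(block.crypto.algorithm, key, block.crypto.iv)
  decipher.setAuthTag(block.crypto.tag)
  const plain = Buffer.concat([decipher.update(block.data), decipher.final()])
  return JSON.parse(plain.toString('utf8'))
}
```

A replicator without the key can still read `links` and fetch every referenced block, while the key names and values stay inside `data`.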

Gozala commented 5 years ago

You don’t want to entirely conceal the links. If you don’t provide the links in a un-encrypted way, you would have to share the key with any system or provider that you’d want to have store the graph, because a replicator is handed a root node to replicate (and possibly pin) and needs to be able to walk the graph in order to store every block.

That is actually the goal. I want two layers of encryption / access:

1. A first key used to encrypt the actual data, meant for recipients.
2. A second key used to encrypt the links of the graph, meant for replicators.

This way you could elect a specific replicator to replicate the data without giving it access to the data itself, and without having to build a second graph of data blocks.

Instead, what you want to do is provide a list of links that exist in the encrypted block but without any other information about them (like the map key names). The hashes alone don’t provide you any sensitive information if what they are linking to is also encrypted.

I can see only one advantage of doing it this way - which is it would not reveal order in which data blocks were added but I'm not sure that in itself provides enough benefit to deal with the fact that it would require syncing graph with all the list.

2. In the future, I’d like to see us just link to the implementation of the decryptor (in WebAssembly). In the interim, we’ll want encrypted nodes to be self-describing but in the future I’d like to see them be self-implementing.

👍 That sounds great!

3. A graph could have nodes encoded with a variety of different keys and even encryption programs. So, not just for selectors but for basic path traversal, we’ll need the ability to dynamically acquire keys for encrypted blocks.

What about the keys needed to do actual decryption ?

Gozala commented 5 years ago

What about the keys needed to do actual decryption ?

Never mind. In my head I was still thinking of two layers of encryption which is not what you're suggesting so this is probably irrelevant.

Gozala commented 5 years ago

Instead, what you want to do is provide a list of links that exist in the encrypted block but without any other information about them (like the map key names). The hashes alone don’t provide you any sensitive information if what they are linking to is also encrypted.

I can see only one advantage of doing it this way - which is it would not reveal order in which data blocks were added but I'm not sure that in itself provides enough benefit to deal with the fact that it would require syncing graph with all the list.

I'm also realizing here that I'm biased towards the use case I've been thinking of, that is a linked data feed, which is more of a linked list than a tree. That is why I'm not concerned with link names: they always just point to the tail of the list. If you do consider a graph, then concealing link names starts to matter.

mikeal commented 5 years ago

I'm also realizing here that I'm biased towards the use case I've been thinking of, that is a linked data feed, which is more of a linked list than a tree. That is why I'm not concerned with link names: they always just point to the tail of the list. If you do consider a graph, then concealing link names starts to matter.

Yup. Also, keep in mind that node decryption is atomic, the decryptor is only ever concerned with a single block. This means that, even with a single layer of encryption, the links (both plain text and encrypted) are the same and they link to, presumably, blocks that are also encrypted, but the traverser doesn’t even know they are encrypted until it hits the next block. In other words, there will be no references in either encrypted or unencrypted data to the original unencrypted CID’s.

The only thing this method allows someone to see without decryption keys is the shape of the graph. With enough modeling you could actually start to make assertions about the data just from the shape. However, this is easily overcome if we continue to do everything in IPLD in a block-agnostic way (using only paths and selectors), because an encryption program could take a graph and produce a new graph at the block layer with identical graph information as far as IPLD paths and selectors are concerned, effectively hiding the shape of the data from replicators, who only see the shape of the visible graph.

Gozala commented 5 years ago

I made some progress on my encrypted data feeds that attempts to incorporate the suggestions made here. There are a few things I learned in the process that I would like to share / get feedback on:

We talked about an encryptor / decryptor here, but in the process I've found that it's more generic, more like an encoder / decoder. Here is the interface I ended up with:
```
export type Head<a> = {
  signature: Signature<Encoded<Block<a>, ReplicatorKey>, AuthorPrivateKey>,
  block: Encoded<Block<a>, ReplicatorKey>
}
export type Block<a> = {
  links: CID<Head<a>>[],
  message: Encoded<Message<a>, SubscriberKey>
}
export type Message<a> = {
  previous: CID<Head<a>>,
  size: number,
  content: a
}
```
Here is what's going on:
- The content of a message in the feed, a, is wrapped (but not published) as Message<a>.
- Message<a> is then encoded to a buffer and prefixed, via multicodec, with the codec / format name corresponding to the encoding used.
- The encoded & prefixed Message<a> is then encrypted according to access policies.
- The encoded, prefixed and encrypted blob is used as message in the Block<a> Dag node.
- Block<a> is then encoded to a buffer and prefixed, via multicodec, with the codec / format name corresponding to the encoding used.
- The encoded & prefixed Block<a> is then encrypted according to access policies.
- The encoded, prefixed and encrypted blob is used as block in the Head<a> Dag node.
- Head<a> is then published and the IPNS record is updated to point to it.
All this is to say that a block goes through multiple encoding phases, and only a few of them are encryption. This suggests to me that there needs to be a generic approach for:
- multilevel encoding
- capturing codec information (otherwise the decoder needs to know the composition of codecs in advance, which is only possible with monomorphic data; polymorphic data requires additional hints)
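The encode, prefix, encrypt pipeline described above can be sketched in a few lines. This is a rough illustration rather than the actual feed implementation: a one-byte codec table stands in for multicodec, JSON stands in for dag-cbor, AES-256-GCM with symmetric keys stands in for whatever the access policies require, the signature step is left out, and all function names are hypothetical.

```javascript
const crypto = require('crypto')

// Hypothetical one-byte codec table standing in for real multicodec prefixes.
const CODECS = {
  1: {
    encode: (value) => Buffer.from(JSON.stringify(value), 'utf8'),
    decode: (bytes) => JSON.parse(bytes.toString('utf8'))
  }
}

// Encode a value and prefix the bytes with the code of the codec used.
const encodePrefixed = (value, code) =>
  Buffer.concat([Buffer.from([code]), CODECS[code].encode(value)])

// Read the prefix to pick the decoder, then decode the remaining bytes.
const decodePrefixed = (bytes) => CODECS[bytes[0]].decode(bytes.subarray(1))

// Encrypt bytes; the output layout is iv (12 bytes) | auth tag (16 bytes) | ciphertext.
const seal = (bytes, key) => {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(bytes), cipher.final()])
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext])
}

// Decrypt a box produced by seal; throws if the key is wrong.
const open = (box, key) => {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, box.subarray(0, 12))
  decipher.setAuthTag(box.subarray(12, 28))
  return Buffer.concat([decipher.update(box.subarray(28)), decipher.final()])
}

// Inner layer for subscribers, outer layer for replicators.
const buildHead = (content, previous, subscriberKey, replicatorKey) => {
  const message = seal(encodePrefixed({ previous, content }, 1), subscriberKey)
  const block = { links: previous ? [previous] : [], message: message.toString('base64') }
  return seal(encodePrefixed(block, 1), replicatorKey)
}

// A replicator peels one layer: it sees links plus an opaque message.
const readAsReplicator = (head, replicatorKey) => decodePrefixed(open(head, replicatorKey))

// A subscriber peels both layers and reads the message itself.
const readAsSubscriber = (head, replicatorKey, subscriberKey) => {
  const block = readAsReplicator(head, replicatorKey)
  return decodePrefixed(open(Buffer.from(block.message, 'base64'), subscriberKey))
}
```

Because the codec prefix travels inside each encrypted layer, the decoder learns the format only after decrypting, which is the "codec information needs to be captured" point in the list above.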

The reason I ended up needing to wrap the message in a block is to have multiple roles for access control: a replicator can walk the chain, and a subscriber can additionally read messages. In fact there is yet another layer, where a can be a public message or a secret message to specific recipients:

export type PrivateMessage<a> = {
  type: "private",
  head: SecretPublicKey,
  // Scalar multiplication is used to derive a shared secret for each recipient,
  // which is then used to encrypt a `BodyKey` for each recipient.
  // Each recipient will attempt to decode `BodyKey` by deriving the shared secret
  // using `SecretPublicKey` (in the head attribute) and its own private key. If
  // successful, the recipient can decrypt the content with it.
  // -----
  // Unlike SSB this doesn't actually attempt to conceal the number of recipients,
  // which is not impossible, just easier to do with raw buffers than with DAGs.
  secrets: Encoded<BodyKey, SecretKey<SecretPrivateKey, RecepientKey>>[],
  content: Encoded<a, BodyKey>
}

export type PublicMessage<a> = {
  type: "public",
  body: a
}
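The key wrapping described in the PrivateMessage comments can be illustrated with Node's built-in X25519 support. This is a sketch under assumptions, not the scheme actually used in the feed: the shared secret is hashed with SHA-256 to get a symmetric key, AES-256-GCM is used for both the body key and the content, JSON stands in for the content codec, and `encryptPrivate` / `decryptPrivate` are hypothetical names.

```javascript
const crypto = require('crypto')

// Encrypt bytes; the output layout is iv (12 bytes) | auth tag (16 bytes) | ciphertext.
const seal = (bytes, key) => {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv)
  const ciphertext = Buffer.concat([cipher.update(bytes), cipher.final()])
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext])
}

// Decrypt a box produced by seal; throws if the key is wrong.
const open = (box, key) => {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, box.subarray(0, 12))
  decipher.setAuthTag(box.subarray(12, 28))
  return Buffer.concat([decipher.update(box.subarray(28)), decipher.final()])
}

// Derive a symmetric key from an X25519 shared secret (scalar multiplication).
const sharedKey = (privateKey, publicKey) =>
  crypto.createHash('sha256')
    .update(crypto.diffieHellman({ privateKey, publicKey }))
    .digest()

const encryptPrivate = (content, recipientPublicKeys) => {
  // One-off key pair per message; its public half plays the role of `head`.
  const ephemeral = crypto.generateKeyPairSync('x25519')
  const bodyKey = crypto.randomBytes(32)
  return {
    type: 'private',
    head: ephemeral.publicKey.export({ type: 'spki', format: 'der' }),
    // The body key is wrapped once per recipient.
    secrets: recipientPublicKeys.map((publicKey) =>
      seal(bodyKey, sharedKey(ephemeral.privateKey, publicKey))),
    content: seal(Buffer.from(JSON.stringify(content), 'utf8'), bodyKey)
  }
}

const decryptPrivate = (message, recipientPrivateKey) => {
  const head = crypto.createPublicKey({ key: message.head, type: 'spki', format: 'der' })
  const key = sharedKey(recipientPrivateKey, head)
  for (const secret of message.secrets) {
    try {
      const bodyKey = open(secret, key)
      return JSON.parse(open(message.content, bodyKey).toString('utf8'))
    } catch (error) {
      // This slot was wrapped for someone else; try the next one.
    }
  }
  return null // not addressed to this recipient
}
```

As in the type above, the length of `secrets` reveals the number of recipients.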

With that, I'm inclined to think that ideally IPLD should be able to accept some token as a parameter to .get and return a differently shaped node based on the access policies that token provides. That way:

- a replicator could just get the set of links to replicate.
- a subscriber could get only public messages.
- a specific recipient could get public messages and messages addressed to it.

Privacy vs. accountability: as per the suggestions here I attempted to conceal the shape of the feed. However, in the case of a data feed it doesn't actually provide any more privacy, as the shape is pretty obvious given that it's always just a linked list (maybe adding some noise in the form of unnecessary nodes could help a bit, but I'm not sure it really buys much). On the flip side, I would like to make it impossible to publish a new head to IPNS that does not contain the previous head in the chain, which is only possible if the chain isn't concealed. In that regard concealing the shape is actually counterproductive.
I ended up hacking my way around the fact that there is no way to just encode / decode data without publishing / fetching it. I think it would make far more sense to change the API so that encode / decode and get / put take / return an encoded node. Mostly because, as I've tried to illustrate, a node can go through multiple phases of encoding, and the assumption that you want to fully decode a node doesn't necessarily hold, even if all the metadata that would allow that were included in the blob.

/cc @vmx

rvagg commented 5 years ago

@Gozala the double encryption here is so that the data is completely obscured to the public but a replicator can access the links that need to be replicated but can't access the unencrypted data, right? What purpose is the signature serving here?

Gozala commented 5 years ago

@Gozala the double encryption here is so that the data is completely obscured to the public but a replicator can access the links that need to be replicated but can't access the unencrypted data, right?

Exactly!

What purpose is the signature serving here?

Signatures allow consumers to verify that the feed is updated by the author (the owner of the feed's private key) and that the feed is linear (does not fork). This is important in the context where the feed represents the ops of a CRDT (which is how I intend to use it with https://github.com/automerge/hypermerge).
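A minimal sketch of the two checks mentioned here, authorship and linearity, using Node's built-in Ed25519 support. It is an illustration only: it signs a plain JSON payload rather than the encrypted block from the Head type, a SHA-256 hex digest stands in for a CID, and `append` / `verifyFeed` are hypothetical names.

```javascript
const crypto = require('crypto')

// Stand-in for a CID: hex SHA-256 of the bytes.
const hashOf = (bytes) => crypto.createHash('sha256').update(bytes).digest('hex')

// Append an entry whose payload records the id of the previous head.
const append = (feed, content, privateKey) => {
  const previous = feed.length ? feed[feed.length - 1].id : null
  const payload = Buffer.from(JSON.stringify({ previous, content }), 'utf8')
  const signature = crypto.sign(null, payload, privateKey)
  const id = hashOf(Buffer.concat([payload, signature]))
  return [...feed, { id, payload, signature }]
}

// Check that every entry is signed by the author and links to the entry before it.
const verifyFeed = (feed, publicKey) => {
  let previous = null
  for (const { id, payload, signature } of feed) {
    if (!crypto.verify(null, payload, publicKey, signature)) return false
    if (hashOf(Buffer.concat([payload, signature])) !== id) return false
    if (JSON.parse(payload.toString('utf8')).previous !== previous) return false
    previous = id
  }
  return true
}
```

A consumer that ever sees two validly signed entries with the same `previous` value has evidence of a fork; detecting that requires remembering earlier heads, which this sketch does not do.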

Gozala commented 5 years ago

A few more thoughts:

IPLD encodes codec info into the CID; when a Dag node points to it, that gives a resolver enough info, as it can figure out from the link what codec to use. In my use case, however, I don't want data to be available in non-encrypted form, therefore I need to encode codec info into the Dag node itself. I am starting to think about this as an inline node. An encrypted message can be represented with a "dag-secretbox" Dag node which contains a link to an inline node representing the message encoded in e.g. "dag-cbor" encoding.

I like the idea of inline nodes, as it would make content that is encoded / encrypted several times explorable through IPLD Explorer.
It would make sense to have a standardized way of expressing a parameterized IPLD path, such that it can be naturally supported by IPLD Explorer. So in the case of an encrypted message represented with dag-secretbox, it should be possible to express a path to the decoded content that cuts through the encryption layers. It should also be possible for a node to require multiple parameters (e.g. nonce & secretKey). To make this possible I propose the idea of "query links", that is, extending the IPLD resolver spec to allow a resolver to return links that represent parameters that need to be provided. This would not only allow passing parameters in an IPLD path but also allow IPLD Explorer to generate input fields for the required parameters.

In the example below one could access last message of the feed through a following path:

/${headCID}/block/${replicator}/message/${subscriber}/content

// Assume a promise-based API instead of a callback-based one.
// Assumed imports: tweetnacl, cids and multihashing-async.
const nacl = require('tweetnacl')
const CID = require('cids')
const multihashing = require('multihashing-async')

const Secretbox = {
  multicodec: "dag-secretbox",
  util: {
    async serialize({message, nonce, key}) {
      return nacl.secretbox(message, nonce, key)
    },
    async deserialize(box, [nonce, key]) {
      return {
        message: nacl.secretbox.open(box, nonce, key),
        nonce,
        key
      }
    },
    async cid(node, options = {}) {
      // `resolver.defaultHashAlg` is assumed to be defined elsewhere
      const hashAlg = options.hashAlg || resolver.defaultHashAlg
      const version = typeof options.version === 'undefined' ? 1 : options.version
      const box = await Secretbox.util.serialize(node)
      const hash = await multihashing(box, hashAlg)
      return new CID(version, Secretbox.multicodec, hash)
    }
  },
  resolver: {
    async resolve(blob, path) {
      // A path like "/<nonce>/<key>" splits into an empty root plus the parameters.
      const [root, ...params] = path.split("/")
      switch (params.length) {
        case 0:
          // "query links": parameters that still need to be provided
          return ["/nonce?/key?"]
        case 1:
          return ["/key?"]
        case 2: {
          // Parameters arrive as strings; converting them to bytes is left out here.
          const [nonce, key] = params
          return {
            value: await Secretbox.util.deserialize(blob, [nonce, key]),
            remainderPath: ""
          }
        }
        default:
          throw new Error('path out of scope')
      }
    },
    async tree(blob) {
      return ["/nonce?/key?"]
    }
  }
}

ipld.support.add(Secretbox.multicodec, Secretbox.resolver, Secretbox.util)

const publish = async (feed, data) => {
  // dag.inline (proposed) encodes a node with the given codec and prefixes it with codec info
  const inlineMessage = await dag.inline({
    previous: feed.head,
    size: feed.size + 1,
    content: data
  }, "dag-cbor")

  // Inner layer: the message, encrypted for subscribers
  const message = await dag.put({
    nonce: feed.subscriber.nonce,
    key: feed.subscriber.secretKey,
    message: inlineMessage
  }, "dag-secretbox")

  const inlineBlock = await dag.inline({
    links: [feed.head, message],
    message
  }, "dag-cbor")

  // Outer layer: links plus the opaque message, encrypted for replicators
  const block = await dag.put({
    nonce: feed.replicator.nonce,
    key: feed.replicator.secretKey,
    message: inlineBlock
  }, "dag-secretbox")
  const signature = feed.author.sign(block)

  const head = await dag.put({ block, signature }, "dag-cbor")

  return {...feed, head, size: feed.size + 1 }
}

// `n` is unused in this sketch; only the most recent message is fetched
const last = async (feed, n) => {
  // Each encrypted layer takes "<nonce>/<key>" as path parameters
  const replicator = `${feed.replicator.nonce}/${feed.replicator.secretKey}`
  const subscriber = `${feed.subscriber.nonce}/${feed.subscriber.secretKey}`
  const path = `/block/${replicator}/message/${subscriber}/content`
  return await dag.get(feed.head, path)
}

rvagg commented 5 years ago

Nice, it might make sense to step back from the current selector conceptualisation and use the IR-style that's developing @ https://github.com/ipld/specs/pull/95. It's got enough expressiveness to build in the kinds of parameters needed to transparently traverse encrypted blocks, including IVs/nonces and whatever else might be needed for a given encryption scheme.

{
  "cidRootedSelector": {
    "root": "cidabcdef",
    "selectors": [
      {"selectPath": "message", "key": "replicatorKeyHere"},
      {"selectPath": "content", "key": "subscriberKeyHere", "iv": "nonce"}
    ]
  }
}

Traversal involving encryption boxed blocks would just skip through them transparently. Whether or not there is a need to have a human-readable form of this and what that would look like could be deferred till later.

Gozala commented 5 years ago

Whether or not there is a need to have a human-readable form of this and what that would look like could be deferred till later.

I have not had a chance to look at the spec yet, but generally you can’t always defer human-readability: without it as a design constraint you may end up with a solution that doesn’t necessarily permit it, or where it feels like a clunky afterthought.

I’ll read through spec when I get a chance and provide more constructive feedback afterwards.

mikeal commented 5 years ago

IPLD encodes codec info into the CID; when a Dag node points to it, that gives a resolver enough info, as it can figure out from the link what codec to use. In my use case, however, I don't want data to be available in non-encrypted form, therefore I need to encode codec info into the Dag node itself. I am starting to think about this as an inline node. An encrypted message can be represented with a "dag-secretbox" Dag node which contains a link to an inline node representing the message encoded in e.g. "dag-cbor" encoding.

I would argue that you do still want to encode the data with a specific codec. You want to put enough information in the block that a decryption program can figure out what key it needs to decrypt it. There isn’t enough information in the CID to do this.

One of the principles in IPLD is to be “self describing.” By this, we mean that data should carry all the information necessary to interpret it without outside knowledge. If you had a block without a codec, effectively a raw block, then no IPLD code will know what to do with it unless you specifically say “oh, I happen to know that this is an encrypted block.”

Let me try saying this another way, in terms of layers.

The “Block” is basically the lowest layer in the stack. It’s just a chunk of binary data, a matching hash, and a reference to a codec in order to interpret it. It’s important to note that even at the lowest layer we’ve encoded enough information in the Block to interpret it up to a point. If there is more information we need in order to further interpret the data then it should live in that decoded data.

As you go a layer up the stack, for this encryption case I’d say we should just move directly to the IPLD Data Model, we have a set of types we support when decoding the block using a given codec. This is where I would implement encryption, and this is also where I think you need to make sure that enough information is encoded in plain text to know:

This is an encrypted block.
The information you would need in order to lookup decryption keys.

From there, you can build a self-describing encryption format on the IPLD Data Model rather than at the Block layer.


Gozala commented 5 years ago

@mikeal I think you may be misunderstanding what I was trying to say in the quoted message. I do agree with the proposed layering, and I agree that dag-secretbox should encode the info it needs to decode the message. What I think you're missing from my message is the following:

There will be codecs that are more like "transcoders", if you will. Such a codec takes data in some format, say encoded in "dag-cbor", and encrypts it. The problem is there is no standard way to pass in encoded data without losing information about the format. Sure, you can do it out of band, meaning my "dag-secretbox" may take nodes like {data, format, nonce, key} so that during encode it can first encode data in the given format and then on decode use that format info to do the reverse. However, my argument is that it's better not to constrain a "transcoder" like "dag-secretbox" that way (that is, forcing it to do encode(data, format) / decode(data, format)), but rather to pass something like an "inline link", which is like a blob: URL for CIDs. That way you would allow a block to link to other blocks that are either inlined or not (in the latter case they have CIDs).

mikeal commented 5 years ago

There will be codecs that are more like "transcoders", if you will. Such a codec takes data in some format, say encoded in "dag-cbor", and encrypts it. The problem is there is no standard way to pass in encoded data without losing information about the format.

Why not just require that it be the same decoder?

The reason the CID has all this information is so that you can link from one block to another and know how to interpret it. If the data is already in the block then just require it be using the same encoder, it’s not as though you’re pointing to an external reference.

If a block is encoded in dag-cbor and has information that tells us “I’m an encrypted node” then we will decrypt the binary data and interpret it as dag-cbor.

let container = {
  _encryption: { nonce, publicKey, algo },
  // encode the secret node first, then encrypt the resulting bytes
  _data: encrypt(dagCbor.encode({ foo: "i'm secret encrypted data" }), nonce, algo, privateKey)
}
let buffer = dagCbor.encode(container)
let block = Block.from(buffer, 'dag-cbor') // or something, we are still debating this API

An implementation of a path traversal would have code in it that looked like this

const decryptNode = async (node, format) => {
   let decrypt = findDecryptor(node._encryption.algo)
   let key = findPrivateKey(node._encryption.publicKey)
   let decode = findDecode(format)
   return decode(await decrypt(node._data, key, node._encryption.nonce))
}
const resolve = async (path, block) => {
  if (!Array.isArray(path)) path = path.split('/').filter(x => x)
  let node = await block.decode() // still discussing this API, but the more I look at it the more i like it
  if (node._encryption) {
     node = await decryptNode(node, block.format)
  }
  while (path.length) {
     // consume one path segment per iteration
     let p = path.shift()
     if (node[p] === undefined) throw new Error('Not Found')
     node = node[p]
     // stop at a link and hand back the rest of the path
     if (CID.isCID(node)) return {value: node, remaining: path.join('/')}
  }
  return {value: node}
}

Gozala commented 5 years ago

There will be codecs that are more like "transcoders", if you will. Such a codec takes data in some format, say encoded in "dag-cbor", and encrypts it. The problem is there is no standard way to pass in encoded data without losing information about the format.

Why not just require that it be the same decoder?

Because then I need dag-cbor-secretbox, dag-pb-secretbox, etc...

Gozala commented 5 years ago

let container = {
  _encryption: { nonce, publicKey, algo },
  // encode the secret node first, then encrypt the resulting bytes
  _data: encrypt(dagCbor.encode({ foo: "i'm secret encrypted data" }), nonce, algo, privateKey)
}
let buffer = dagCbor.encode(container)
let block = Block.from(buffer, 'dag-cbor') // or something, we are still debating this API

OK, so you're creating a requirement that the wrapper be encoded in the same encoding as the data that was encrypted. You could do that, but I think that is a bad requirement to have. What if the message at hand is a git object or something even more exotic? It seems strange to force the wrapper to have the same encoding.

Gozala commented 5 years ago

It's not that it's not doable, I'm already doing it by using multicodec and prefixing encoded bytes before encryption (which also hides format that your proposed solution doesn't) and on decode I find corresponding decoder to decode decrypted bytes. However that introduces incidental complexity - that is dag-secretbox needs to know the format of the message, hence my argument it would be better if it did not have to. Which would be trivial to do by allowing inline links and all the codecs will become free of that concern.

Additional benefit would be it would allow freedom of data layout in the block, so you could actually represent things like this in IPLD block

Where messages can be in arbitrary format.

Gozala commented 5 years ago

@mikeal also worth mentioning that your proposed solution works with one layer of encryption, but if you have multiple layers then you have a problem.

mikeal commented 5 years ago

Because then I need dag-cbor-secretbox, dag-pb-secretbox, etc...

Why? They are just normal dag nodes with the “secretbox” information encoded into them.

My point is, anything that is a valid dag-cbor block should always just have a dag-cbor codec. We can put information inside the node that tells us about the encryption and contains the payload. We can even do this in a codec agnostic way because if we rely on the IPLD Data Model this will work on any codec that supports the Data Model. CID codecs are not mime types, and I actually tried to make them that when I first started learning this stack, and it wasn’t until we locked in the Data Model that I saw how we could build anything that would require new mime types in order to self-describe how to interpret them.

We only need to know special information about the encrypted payload when we read the data in the block, and that happens at a layer above the Block level. We can modify the Selector and Path specifications to be aware of information we encode at the data model layer. We already have to do this for hamt and other collections, because a single namespace is actually spread out over many blocks in a more advanced data structure, so the path will not have a one-to-one mapping with node properties. The only additional layer of difficulty encryption poses is that we have to find a way to look up the decryption keys, which I’m not yet proposing a solution to (I think you mentioned putting them in the selector at some point, I’ve been assuming some sort of key-store we attach to a selector engine for lookups, but either is fine and these aren’t mutually exclusive).

I think this is hard to see right now because of the current state of IPLD. We have a lot of working code at the Block level and for very basic path resolution but we’ve just built a basic selector engine and haven’t implemented any of the dynamic support for collections I’m mentioning above, this is all just planned. So, I can see why you’d want to do this at the Block layer in order to get something working in the short term.

also worth mentioning that your proposed solution works with one layer of encryption, but if you have multiple layers then you have a problem.

ok, then:

while (node._encryption) {
   node = await decryptNode(node, block.format)
}
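To make the loop above concrete, here is a rough, self-contained sketch of unwrapping several encryption layers with keys found through a key store. Everything in it is an assumption made for illustration: the `keyId` field, the in-memory `Map` used as a key store, JSON as the codec, AES-256-GCM as the cipher, and the helper names `wrap`, `unwrapOnce` and `unwrapAll`.

```javascript
const crypto = require('crypto')

// Hypothetical key store: maps a key identifier stored in plain text to a key.
const keyStore = new Map()

// Wrap a node in one encryption layer; binary fields are base64 so that
// wrapped nodes can themselves be serialized and wrapped again.
const wrap = (node, keyId) => {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', keyStore.get(keyId), iv)
  const data = Buffer.concat([cipher.update(JSON.stringify(node), 'utf8'), cipher.final()])
  return {
    _encryption: {
      keyId,
      algo: 'aes-256-gcm',
      iv: iv.toString('base64'),
      tag: cipher.getAuthTag().toString('base64')
    },
    _data: data.toString('base64')
  }
}

// Remove a single layer, looking the key up by the plain-text identifier.
const unwrapOnce = (node) => {
  const { keyId, algo, iv, tag } = node._encryption
  const key = keyStore.get(keyId)
  if (!key) throw new Error(`No key available for ${keyId}`)
  const decipher = crypto.createDecipheriv(algo, key, Buffer.from(iv, 'base64'))
  decipher.setAuthTag(Buffer.from(tag, 'base64'))
  const plain = Buffer.concat([
    decipher.update(Buffer.from(node._data, 'base64')),
    decipher.final()
  ])
  return JSON.parse(plain.toString('utf8'))
}

// Keep unwrapping while the decoded node still describes itself as encrypted.
const unwrapAll = (node) => {
  while (node && node._encryption) node = unwrapOnce(node)
  return node
}
```

A reader holding only some of the keys fails at the first layer it cannot open, which is where a real traversal would have to stop or ask for a key.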

mikeal commented 5 years ago

OK, so you're creating a requirement that the wrapper be encoded in the same encoding as the data that was encrypted. You could do that, but I think that is a bad requirement to have. What if the message at hand is a git object or something even more exotic? It seems strange to force the wrapper to have the same encoding.

True, I guess this exposes a flaw in our mental model when it comes to supporting content addressed data that doesn’t support the Data Model. We have been assuming that when linking to systems that already exist we would have to use a reference that is publicly available in order to potentially do content discovery in another system. It hadn’t really occurred to me that you would take data from another system, encrypt it in an IPLD system and then move that data around in the IPLD system. It also doesn’t help that most of our use cases for this have been blockchains where any encryption of the underlying data is already done underneath the data we’re getting a reference to.

Let’s explore this a little further. Is the fact that you’re encoding git data sensitive as well? In other words, if we were to encode a CID, would we also have to encrypt the CID?

mikeal commented 5 years ago

Also, if the solution to this ends up being “we encrypt another CID for the encrypted block” then we need to rope in an encryption expert because some of the bytes are going to be rather predictable.

Gozala commented 5 years ago

Let’s explore this a little further. Is the fact that you’re encoding git data sensitive as well? In other words, if we were to encode a CID, would we also have to encrypt the CID?

There is a more detailed elaboration elsewhere, but here is a summary:

I want to provide a generic secure message feed library, meaning application code decides what the messages are (and what the corresponding format for them is). Furthermore, the feed attempts to have several layers of access:

- Followers: get keys through an invite, can subscribe to the feed and access all the messages.
- Replicators: are also invited, with different invite keys, and can subscribe to the feed to traverse an arbitrary graph of encrypted blobs to keep the feed available. Unlike followers, replicators can't tell whether it's a feed or a graph, or what the nodes in it are.
- Everyone else: if they come across the feed head, they can't make any sense of it.

To accomplish this there are multiple layers of encryption:

- Each message is encrypted for followers.
- A node for the replicator is created that just links to the previous head and the encrypted message; this node is then encrypted for the replicator.
- The IPNS name is updated to point to the new node.

Note that at the feed implementation layer I do not want to know what the messages are or what the format is, I just want them to be Blocks. Also I do not want those messages / corresponding blocks to have CIDs, as that might leak unencrypted messages.

Furthermore, it implements another codec, like SSB private-box, so that a message in the feed can be directed at specific friends (meaning arbitrary followers can't read it, or know who it is for or how many recipients the message has; the image in the previous comment is a visualization of that). Also worth noting that a private-box message should ideally also be in an arbitrary format.

This all works out really nicely with the idea of "inline links", because you preserve the same linked-data model, except that the linked data doesn't need to be stored in a separate block but rather gets inlined into the target block; that is, format+encodedbytes are added.

Gozala commented 5 years ago

Let’s explore this a little further. Is the fact that you’re encoding git data sensitive as well? In other words, if we were to encode a CID, would we also have to encrypt the CID?

Not sure if I fully understand this, but assuming I do, that is what the feed abstraction does (Textile does the same thing, BTW): CIDs of the encrypted blocks are concealed up to the topmost layer, so an adversary can't traverse the graph.

mikeal commented 5 years ago

that is format+encodedbytes are added

So, the format is not encrypted? Or at least, not encrypted at layer these encodedBytes are stored, but it may be inside another encrypted container.

Gozala commented 5 years ago

So, the format is not encrypted? Or at least, not encrypted at layer these encodedBytes are stored, but it may be inside another encrypted container.

It is, this is exactly what I'm doing today:

https://github.com/Gozala/ipdf/blob/499fce4b048bb6a5d39a2060bd27792dab496e74/src/feed.js#L242-L256

Having to know the format, encoding, prefixing is all incidental complexity. Ideally there would be something like encrypt(await Dag.encode({secret: block}, "dag-cbor"), nonce, secret) and on the decode side it would be Dag.decode(await decrypt(bytes, nonce, secret), "dag-cbor").secret
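A minimal sketch of that shape, under loud assumptions: `Dag` here is a hypothetical stand-in that uses JSON rather than dag-cbor, `encrypt` / `decrypt` are illustrative helpers built on AES-256-GCM from Node's `crypto` module, and everything is synchronous for brevity. It is not the code linked above.

```javascript
const crypto = require("node:crypto");

// Hypothetical stand-in for Dag.encode / Dag.decode (JSON instead of dag-cbor).
const Dag = {
  encode: (value) => Buffer.from(JSON.stringify(value), "utf8"),
  decode: (bytes) => JSON.parse(bytes.toString("utf8")),
};

// Encrypts bytes with AES-256-GCM and appends the 16-byte auth tag.
function encrypt(bytes, nonce, secret) {
  const cipher = crypto.createCipheriv("aes-256-gcm", secret, nonce);
  return Buffer.concat([cipher.update(bytes), cipher.final(), cipher.getAuthTag()]);
}

// Splits off the trailing auth tag, then decrypts and verifies.
function decrypt(bytes, nonce, secret) {
  const tag = bytes.subarray(bytes.length - 16);
  const body = bytes.subarray(0, bytes.length - 16);
  const decipher = crypto.createDecipheriv("aes-256-gcm", secret, nonce);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

const key = crypto.randomBytes(32);   // shared symmetric key
const nonce = crypto.randomBytes(12); // must be unique per message
const block = { text: "hello" };      // the value to conceal

const sealed = encrypt(Dag.encode({ secret: block }), nonce, key);
const opened = Dag.decode(decrypt(sealed, nonce, key)).secret;
```

The caller never handles a format prefix: the envelope `{ secret: block }` is encoded as a whole, so whatever the codec knows about `block` travels inside the ciphertext.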

Gozala commented 5 years ago

It is worth mentioning that if, in the above case, block were a CID, it would have worked exactly as desired (well, except secret won't be a secret, or will be inaccessible), as secret would have contained info about the format and decode would do the right thing. That is why I'm saying I want an inline link, so it would just inline the block and act all the same as if it had a CID.

Gozala commented 5 years ago

It is also worth pointing out that this would enable encrypting not only a single message in a single format but also, say, multiple messages in different formats (just like you can link to multiple blocks encoded in different formats), transparently and without introducing further complexity. Without inline links you'd have to encrypt individual messages and then pack them together from the outside. That's not great, because you'll end up either revealing the number of messages or having to encrypt yet again, not to mention that it would constrain the structure of your nodes. Inline links address all that in a way that fits naturally (at least to me) into the existing IPLD model.
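To make the structure concrete, here is a small sketch of a node that carries two inline blocks in different formats and is serialized to a single byte string. The `{ format, bytes }` shape, the `codecs` table, and the `inline` / `resolve` helpers are all hypothetical, JSON stands in for a real IPLD codec, and the encryption step itself is left out.

```javascript
// Hypothetical codec table; real multicodec formats would plug in here.
const codecs = {
  json: {
    encode: (value) => Buffer.from(JSON.stringify(value), "utf8"),
    decode: (bytes) => JSON.parse(bytes.toString("utf8")),
  },
  utf8: {
    encode: (value) => Buffer.from(value, "utf8"),
    decode: (bytes) => bytes.toString("utf8"),
  },
};

// An inline block keeps its format next to its encoded bytes,
// much as a CID keeps a codec next to a hash.
const inline = (format, value) => ({
  format,
  bytes: codecs[format].encode(value).toString("base64"),
});

// Decoding dispatches on the format carried by the inline block.
const resolve = ({ format, bytes }) =>
  codecs[format].decode(Buffer.from(bytes, "base64"));

const node = {
  messages: [
    inline("json", { type: "post", text: "hi" }),
    inline("utf8", "plain note"),
  ],
};

// One byte string for the whole node; this is what would be encrypted once.
const packed = Buffer.from(JSON.stringify(node), "utf8");
const unpacked = JSON.parse(packed.toString("utf8")).messages.map(resolve);
```

Because the packing happens before encryption, the number of messages and their formats sit inside the single payload instead of being visible from the outside.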

mikeal commented 5 years ago

That is why I'm saying I want an inline link, so it would just inline the block and act all the same as if it had a CID.

I get that, I think the thing I didn’t quite understand until today was that the format may be different. We’ve actually been working hard to remove the distinction between a link and an inline value as far as reads go. Specs like unixfs-v2 no longer specify when something is a link, or make reference to block boundaries, so an entire file system could be one Block or thousands and it would be read the same way. That’s why I was trying so hard to figure out a way to avoid nesting of “Blocks” within each other, since without a CID it’s just a special in-block value, but the requirement that the nested value be encoded into another format means we probably do need something along these lines.

mikeal commented 5 years ago

Inline links address all that in a way that fits naturally (at least to me) into the existing IPLD model.

If it doesn’t have a CID, and the entire thing exists inside another block, I don’t think we should call it a “link.” I’m not even sure if “inline Block” is the right term, it makes sense to me now, but I worry about confusing new developers. We can bike shed the terminology later, I think I understand the use case enough now.

I’m going to think on this a little more and then write up a larger new issue that can hopefully cover all the places this touches.

The impetus for a lot of this seems to be leveraging the same multicodec parsing engine, which makes me really wish WebAssembly was a little farther along. If we could implement the decoder in WebAssembly then we could just reference it directly by a link rather than a multicodec reference. That would expand this out of the “inline Block” metaphor, because we wouldn’t have to leverage the same decoding engine and it would become a much more robust parameterized envelope.

Gozala commented 5 years ago

Specs like unixfs-v2 no longer specify when something is a link, or make reference to block boundaries, so an entire file system could be one Block or thousands and it would be read the same way. That’s why I was trying so hard to figure out a way to avoid nesting of “Blocks” within each other, since without a CID it’s just a special in-block value, but the requirement that the nested value be encoded into another format means we probably do need something along these lines.

I might be missing context here, but it appears to me that what I'm suggesting is aligned with that. In fact, I also want to be able to remove the distinction between linked blocks and nested blocks, and to have the freedom to choose how blocks are arranged in memory (single blob vs many linked blobs). It's just that the graph in your use case seems homogeneous (a single format throughout) while mine is polymorphic.

It appears to me that we share the same goal & are just stuck on the metaphors we use to describe it.

mikeal commented 5 years ago

@Gozala I didn’t mean to suggest these were out of alignment, I was just iterating through my own process in understanding this use case and I was less inclined early on to extend the Block concept to it, but it all makes sense now.

ehsan6sha commented 2 years ago

Hi. What is the latest on this?

RangerMauve commented 2 years ago

@matheus23 mind if we resume the convo here?

A big question I still have with that is: What is the developer experience for working with this? Is this just building a small library that makes it easy for you to en/decrypt dag-cbor encoded byte strings? Is it a library that allows you to easily move from encrypted DAG <-> decrypted DAG? Or is it just one concrete use case that is something like rs-wnfs which is still very "concrete" in where it applies these patterns.

Personally, I've been thinking about this from the perspective of IPLD ADLs. One could use a node builder to construct an encrypted DAG, then use something like an IPLD URL with the decryption key in it, or using the new Tagged Pointers spec once it comes out.

matheus23 commented 2 years ago

@RangerMauve

One could use a node builder to construct an encrypted DAG

I'm not familiar with node builders. Maybe I can put this more generally: I haven't worked with go-ipld-prime at all so far. I guess I'm missing out on a bunch of IPLD ideas because of that. My practical IPLD experience is based on working with JS and rust libraries mostly.

use something like an IPLD URL with the decryption key in it

I don't really like that idea. It would be bad if it's a query param, since then the decryption key would be sent to gateways. This concern is invalid if you're running your own gateway locally, but - as a pattern - I think it can be harmful. If it's possible, people will send their decryption keys to gateways that wouldn't actually want it.

On the other hand, if you don't send the decryption key to the gateway, how is your data decrypted? Well, ideally in the frontend. For that, the server would need to serve some HTML with some script that automatically decrypts the block you're looking at and knows how to look for further blocks & how to piece them together. This would need good IPLD (unixfs?) libraries for browsers. Then we could technically put the key into the URL fragment, but even then I'm not a big fan of that, since URL fragments aren't meant to store confidential data. There's little protecting you from accidentally showing them.
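As a small illustration of the fragment option only, the sketch below uses the standard `URL` class to show which parts of a URL make up the request target and which stay on the client. The gateway host, the `example-cid` path, and the demo key are made up. The caveat above still applies: the key remains visible to anyone who sees the full URL.

```javascript
// Demo key only (32 ASCII bytes); a real key would be random.
const key = Buffer.from("0123456789abcdef0123456789abcdef", "utf8");

// Hypothetical gateway URL with the key carried in the fragment.
const url = new URL("https://gateway.example/ipfs/example-cid");
url.hash = key.toString("base64url");

// Only the path and query form the HTTP request target; the fragment is
// kept by the client and is not part of the request.
const requestTarget = url.pathname + url.search;

// A script running in the page can read the fragment back and decode the key.
const recovered = Buffer.from(url.hash.slice(1), "base64url");
```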

ehsan6sha commented 2 years ago

What if we encrypt the IPLD data in each node with a different symmetric key, and keep a side tree with the encryption keys linked to each node?
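A rough sketch of this first idea, assuming AES-256-GCM from Node's `crypto` module, JSON in place of a real IPLD codec, and a hex SHA-256 digest in place of a CID. The names `encryptNode`, `encryptTree`, and `decryptTree` are hypothetical. Links are left in the clear so a replicator can still walk the graph, which is an assumption of this sketch rather than part of the proposal.

```javascript
const crypto = require("node:crypto");

// Stand-in for a CID: hex SHA-256 of the block bytes.
const idOf = (bytes) => crypto.createHash("sha256").update(bytes).digest("hex");

// Encrypts one node's value under a fresh random key.
function encryptNode(value, childIds) {
  const key = crypto.randomBytes(32);
  const nonce = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, nonce);
  const plain = Buffer.from(JSON.stringify(value), "utf8");
  const body = Buffer.concat([cipher.update(plain), cipher.final()]);
  const block = Buffer.from(JSON.stringify({
    nonce: nonce.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    body: body.toString("base64"),
    links: childIds, // unencrypted links to child blocks
  }), "utf8");
  return { id: idOf(block), block, key };
}

// Encrypts bottom-up, stores blocks, and returns a key tree mirroring the data tree.
function encryptTree(node, store) {
  const children = (node.children || []).map((child) => encryptTree(child, store));
  const { id, block, key } = encryptNode(node.value, children.map((c) => c.id));
  store.set(id, block);
  return { id, key: key.toString("base64"), children };
}

// Walks the key tree and decrypts the matching blocks.
function decryptTree(keyNode, store) {
  const { nonce, tag, body } = JSON.parse(store.get(keyNode.id).toString("utf8"));
  const decipher = crypto.createDecipheriv(
    "aes-256-gcm", Buffer.from(keyNode.key, "base64"), Buffer.from(nonce, "base64"));
  decipher.setAuthTag(Buffer.from(tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(body, "base64")), decipher.final()]);
  return {
    value: JSON.parse(plain.toString("utf8")),
    children: keyNode.children.map((c) => decryptTree(c, store)),
  };
}

const store = new Map();
const keyTree = encryptTree(
  { value: "root", children: [{ value: "a" }, { value: "b" }] }, store);
const full = decryptTree(keyTree, store);                // whole tree
const partial = decryptTree(keyTree.children[0], store); // one subtree only
```

Sharing a subtree of the key tree grants access to just the matching part of the data, while the encrypted blocks themselves can be replicated by anyone.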

Another approach might be similar to what cryptree does for IPFS file encryption (WNFS): https://whitepaper.fission.codes/file-system/partitions/private-directories/concepts/cryptree

RangerMauve commented 2 years ago

@matheus23

On the other hand, if you don't send the decryption key to the gateway, how is your data decrypted? Well, ideally in the frontend. For that, the server would need to serve some HTML with some script that automatically decrypts the block you're looking at and knows how to look for further blocks & how to piece them together.

That's pretty much what https://peergos.org/ have been doing with their capabilities. 😁 Their protocol diverges from regular IPFS/IPLD a bit however.

ipld / ipld

Encryption layer for IPLD #64