Closed sanderpick closed 5 years ago
Oh wow, I'm sorry it was not my intention to trigger a major rewrite. But since we're discussing this subject I would offer few thoughts & pointers that I think might be relevant here.
ipfs.dag
which will have following properties:
export type FeedBlock = {
author: Buffer, // Public key of the author
size: number, // sequence number, length of the feed
content: CID, // CID of the message
previous: null | CID, // CID of the previous block unless it's first one
signature: Buffer // this block signed by the private key of the author
}
content
field is IPLD link to the message which can be in whatever format. IPLD supports proto-buf format which might be relevant here. I also obviously could be encrypted with shared thread key.previous
field is also IPLD link that point to previous block.Also /cc @pvh he's far more qualified on hypermerge side of things
I should point out I don't have a great solution for feed identifier yet. My hope is (pub sub base) IPNS will be good enough. But if itsn't then my backup plan is to generate keypair and store public key mapped to recent head on DHT & use PubSub to announce updates.
No need to be sorry! :) I've been meaning to jot down the next iteration. You just gave me the kick. This wouldn't amount to a full rewrite... smaller than it sounds actually.
What you've describes is pretty close to threads. You can see the block message here:
https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L89
Thread structure looks like this:
https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L35
Thanks for the links. I will take a look for sure!
Also worth noting, when blocks are sent over the wire, they get wrapped in an Envelope
:
https://github.com/textileio/go-textile/blob/master/pb/protos/message.proto#L51
Which is signed by the peer and encrypted with the thread key.
Since the blocks themselves are protos and not IPLD objects, the links between them are also encrypted. However, off-block data, like a file or directory, is part of an IPLD object.
What you've describes is pretty close to threads. You can see the block message here:
https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L89
Nice, I did not realize it was that similar. Few remarks / questions:
target
field in the Block ?Also worth noting, when blocks are sent over the wire, they get wrapped in a an
Envelope
:https://github.com/textileio/go-textile/blob/master/pb/protos/message.proto#L51
Which is signed by the peer and encrypted with the thread key.
Oh that is interesting. In my mind content stored in the feed was encrypted already. I wonder what lead you to choose doing this instead. Is that so to avoid decryption on local use or ?
@Gozala:
Two answers (sander might have better answers here as well)
target
is the equivalent of your content
Also, not an instead situation: content stored in feed is already encrypted... but we also want to encrypt the block updates so that links between updates etc aren't visible in plain text. So there is essentially multiple layers of encryption.
Yep, file / content encryption is dictated by the thread schema, which essentially describes what the IPLD file structure looks like. By default, encryption is on.
Schema Node
:
https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L159
For example, this is a directory that contains files generated by one of the "build-in" schemas (https://github.com/textileio/go-textile/blob/master/schema/textile/media.go)
https://ipfs.io/ipfs/QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc
The block that points to this directory uses it's top-level hash as target
.
Also, not an instead situation: content stored in feed is already encrypted... but we also want to encrypt the block updates so that links between updates etc aren't visible in plain text. So there is essentially multiple layers of encryption.
Thanks for elaboration. I understand intent now, which is to conceal pointers to the feed itself that is it's size and encrypted content that it points to. That way you could have two layers of participation:
This leads me to thinking that it would be really nice to have IPLD composition (as in function composition) so you could pipeline IPLD encryption resolver with IPLD proto-buf resolver. Unless I'm overlooking something that would simplify things a bit.
Another relevant thought that IPLD should probably have a native support for encryption built-in so that GraphSync queries could cut through encryption layer.
One more piece of context here. Below is an example File
which is produced by a thread's feed for consumption by UIs. This thread is using the built-in media schema. Internally, the block (a FILES
block in this case) is expanded with the FileIndex
objects which mirror the actual data's IPLD
structure. Having these file indexes are used for local de-duplication post-encryption.
{
"block": "QmYiSkeiMPajEMfte1zbw9mY3as42SDqZXcfXYGfcg4dhN",
"target": "QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc",
"date": "2019-02-27T23:01:16.785922Z",
"user": {
"address": "P4adwmG7G7FhoELxCUATKPqY6Fp2w9AdeFoLKUSQfDs5JCpp",
"name": "sandercli",
"avatar": "QmQKDDGXcionSQoPsCg81SQAfBuSwX7Eo1qLUr5e9nHHcE"
},
"files": [
{
"links": {
"large": {
"mill": "/image/resize",
"checksum": "3tw2cHmeaStZhU21AG4w2N5iJfyc3gMJdftsfmNG6wqQ",
"source": "H7KLtLVVcbA7aXeM9JxN4caizjm3tbk8L19bgj44zc8x",
"opts": "21uBAuSeQUdw5aDu5CYPxEfeiLVeuvku1T26nWtJC84C",
"hash": "QmSaVjzWANimkcF7UtazTRx4jsCG9cgrVYWGsTzx2qhbcw",
"key": "YYDkNSZBBkzF8Ck8XhHJYRnvqJGgsXMtkE355Vsj7uBEABdiSaMd7LtEwyCk",
"media": "image/jpeg",
"name": "1.jpg",
"size": "23612",
"added": "2019-02-25T06:27:38.504477Z",
"meta": {
"height": 350,
"width": 525
},
"targets": [
"QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc"
]
},
"small": {
"mill": "/image/resize",
"checksum": "FVUHFkLXRrDyKN3uQfPab89xzJfwNUuFhYq62JSgi6rM",
"source": "H7KLtLVVcbA7aXeM9JxN4caizjm3tbk8L19bgj44zc8x",
"opts": "CassDcqf192MnceweJKGJvhZfrV9kB3GJRbvNPWm6raa",
"hash": "QmbaN21SSUdhk3PUhZKZ55sS6nETSieyBRuPqwrfKDAwgp",
"key": "2GBx7GiMsdgAikDfuGmsfUhxYAMCo2SkxD67tqrm7MGDJbd1uKx4eiM5E1FZq",
"media": "image/jpeg",
"name": "1.jpg",
"size": "12330",
"added": "2019-02-25T06:27:38.449826Z",
"meta": {
"height": 213,
"width": 320
},
"targets": [
"QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc"
]
},
"thumb": {
"mill": "/image/resize",
"checksum": "9yRHYLoFJgdFQaMf6Vk695oYB1RBbECTi9Pi7WNXba8S",
"source": "3tw2cHmeaStZhU21AG4w2N5iJfyc3gMJdftsfmNG6wqQ",
"opts": "7aGVJ7nGgmHdqv8oiEKZ2ZbrNBv1zVP3ADSuT2sW3MwT",
"hash": "QmYwS6WBuJoNNp7rdXMC3SbViTEorAs3bnraaUNJapUSXc",
"key": "k5whsL5dDJDtvpA3ZbzrMAxaDmMzhso95wzi8UDSybk9C2ueeAP3w6ixyvax",
"media": "image/jpeg",
"name": "1.jpg",
"size": "2776",
"added": "2019-02-25T06:27:38.543848Z",
"meta": {
"height": 67,
"width": 100
},
"targets": [
"QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc"
]
}
}
}
],
"comments": [
],
"likes": [
],
"threads": [
"12D3KooWSaF74rhqDwNZpBKK8RqnRMa2wj7AwyAzUhVo2YYF28rK"
]
}
I sketched out more or less what I'm aiming for & would love feedback
Where can I see what you've sketched out, @Gozala?
Where can I see what you've sketched out, @Gozala?
Oh, I forgot my to paste a link 🤦♂️ https://github.com/gozala/ipdf/ in terms of other files only worth looking right now is this one https://github.com/Gozala/ipdf/blob/master/src/format.js
It dawned on my over the weekend that I've gone down the single writer piece of the hybrid single/multi-peer trees proposed solution above in the past and ran into a wall: Of course you can't simply compose one tree out of the other because they're immutable chains (the hashed data of a block includes it's parent(s)). You could compose an additional chain with blocks that contain the other blocks, but then you'd have two add them as new blocks to IPFS.
We could try a normal IPLD chain for the composed chain, which would not require new blocks. The links would be exposed, but the chain would "leak" much less information since nodes simply point to another encrypted proto message blob.
IPLD node for a single peer's record of all other immutable thread member trees:
block: link<hash of encrypted proto message>
parent: link<hash of parent node, which contains a link to another encrypted proto message>
Yes to using IPLD if/where possible. This would likely also increase interoperability with other tools significantly. I think the exposed links are probably ok (best someone could determine is the length and frequency of chain updates), but could be perhaps still encrypt the top level IPLD object as we did previously? Or even encrypt the links before adding them, creating something like an Encrypted-IPLD object?
Obviously this would require a special IPLD resolver... but maybe this would be something others could find useful without too much additional work?
@Gozala very nice! I took a long weekend and have a bunch of catch up today. Will jump in this evening or tomorrow.
@sanderpick Chances are I'm misunderstanding what you're saying above, so please bear with me. @mikeal suggested something relevant here which I incorporated into my draft:
https://github.com/Gozala/ipdf/blob/master/src/format.js#L25-L34
export type Head<a> = {
author: AuthorPublicKey,
signature: Signature,
block: Encoded<Block<a>, ReplicatorKey>
}
export type Block<a> = {
links: CID<Head<a>>[],
message: Encoded<Message<a>, FollowerKey>
}
Idea here is that ReplicatorKey
key can be used to unwrap one layer that does not reveal anything other that what links are there, which could be arbitrary blocks encrypted or not. That way you don't necessarily reveal a chain either.
Obviously this would require a special IPLD resolver... but maybe this would be something others could find useful without too much additional work?
As per following threads: https://github.com/ipld/ipld/issues/64 https://github.com/ipld/ipld/issues/63
It seems that there is interest in making it possible to evolve IPLD to allow recursive message unpacking of some sorts. My plan for implementation now is to implement kind of polyfill for that or maybe an utility that can take "decoder" and IPLD resolver and return new IPLD resolver that will use decoder to decode block before continue with a resolution.
I also find idea of encoding decoder config into block itself really compelling.
After talking w/ @gozala about DAT and having a few other concerns pile up in my head, I want to start an epic around the next version of threads.
The current thread implementation has a couple major downsides. One related to the append-only log, and the other to over-the-wire orchestration:
MERGE
blocks. Conflicts arise with almost every write since members all write to the same tree. As a rule of thumb, the lessMERGE
blocks the better, or in other words, the tree should be as narrow as possible.Proposed solutions:
Thread members already keep track of who's in a thread via the
thread_peers
table. Similar to DAT, we can track HEAD for each peer by simply adding a column to this table. This would drastically reduce the number ofMERGE
blocks in each chain because only one peer would be writing to each.Now, the problem remains, how does each member efficiently snapshot the thread for backup? Each node can write their own additional chain that essentially joins all the others (including their own single). This joined tree doens't need to be shared/synced with other peers (they write their own). So, merges are just not needed. The HEAD of this tree is what is written to backup snapshots. During a recovery from the snapshot, a peer can walk it backwards, building a single tree for each peer.
So far, we've had good luck with gossipsub in search. Non-server based nodes are able to participate in search queries. We can replace direct P2P comms between normal peers with gossipsub + an ACK mechanism to hopefully increase peer discoverability. So, similar to how search currently works where the Cafe Service subs each peer to its own peerID, the Thread Service can sub each peer to its own account address (related to https://github.com/textileio/go-textile/issues/565).
Thoughts?
cc @gozala @jimpick