textileio / go-textile

[DEPRECATED] Textile is a set of tools and infrastructure for building composable apps and services on the IPFS network
https://textile.io
MIT License
357 stars 43 forks source link

Threads v2 (discussion) #566

Closed sanderpick closed 5 years ago

sanderpick commented 5 years ago

After talking w/ @gozala about DAT and having a few other concerns pile up in my head, I want to start an epic around the next version of threads.

The current thread implementation has a couple major downsides. One related to the append-only log, and the other to over-the-wire orchestration:

  1. The log (or tree) ends up with many MERGE blocks. Conflicts arise with almost every write since members all write to the same tree. As a rule of thumb, the less MERGE blocks the better, or in other words, the tree should be as narrow as possible.
  2. Block messages (envelopes containing an update) are rarely sent P2P. Usually, the messages end up being sent to the recipients' cafe inbox(s). This is a result of peers currently only being able to communicate via a direct connection.

Proposed solutions:

  1. Hybrid single/multi-peer trees

Thread members already keep track of who's in a thread via the thread_peers table. Similar to DAT, we can track HEAD for each peer by simply adding a column to this table. This would drastically reduce the number of MERGE blocks in each chain because only one peer would be writing to each.

Now, the problem remains, how does each member efficiently snapshot the thread for backup? Each node can write their own additional chain that essentially joins all the others (including their own single). This joined tree doens't need to be shared/synced with other peers (they write their own). So, merges are just not needed. The HEAD of this tree is what is written to backup snapshots. During a recovery from the snapshot, a peer can walk it backwards, building a single tree for each peer.

  1. Use gossipsub for p2p comms

So far, we've had good luck with gossipsub in search. Non-server based nodes are able to participate in search queries. We can replace direct P2P comms between normal peers with gossipsub + an ACK mechanism to hopefully increase peer discoverability. So, similar to how search currently works where the Cafe Service subs each peer to its own peerID, the Thread Service can sub each peer to its own account address (related to https://github.com/textileio/go-textile/issues/565).

Thoughts?

cc @gozala @jimpick

Gozala commented 5 years ago

Oh wow, I'm sorry it was not my intention to trigger a major rewrite. But since we're discussing this subject I would offer few thoughts & pointers that I think might be relevant here.

Gozala commented 5 years ago

Also /cc @pvh he's far more qualified on hypermerge side of things

Gozala commented 5 years ago

I should point out I don't have a great solution for feed identifier yet. My hope is (pub sub base) IPNS will be good enough. But if itsn't then my backup plan is to generate keypair and store public key mapped to recent head on DHT & use PubSub to announce updates.

sanderpick commented 5 years ago

No need to be sorry! :) I've been meaning to jot down the next iteration. You just gave me the kick. This wouldn't amount to a full rewrite... smaller than it sounds actually.

What you've describes is pretty close to threads. You can see the block message here:

https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L89

Thread structure looks like this:

https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L35

Thanks for the links. I will take a look for sure!

sanderpick commented 5 years ago

Also worth noting, when blocks are sent over the wire, they get wrapped in an Envelope:

https://github.com/textileio/go-textile/blob/master/pb/protos/message.proto#L51

Which is signed by the peer and encrypted with the thread key.

sanderpick commented 5 years ago

Since the blocks themselves are protos and not IPLD objects, the links between them are also encrypted. However, off-block data, like a file or directory, is part of an IPLD object.

Gozala commented 5 years ago

What you've describes is pretty close to threads. You can see the block message here:

https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L89

Nice, I did not realize it was that similar. Few remarks / questions:

Gozala commented 5 years ago

Also worth noting, when blocks are sent over the wire, they get wrapped in a an Envelope:

https://github.com/textileio/go-textile/blob/master/pb/protos/message.proto#L51

Which is signed by the peer and encrypted with the thread key.

Oh that is interesting. In my mind content stored in the feed was encrypted already. I wonder what lead you to choose doing this instead. Is that so to avoid decryption on local use or ?

carsonfarmer commented 5 years ago

@Gozala:

Two answers (sander might have better answers here as well)

Also, not an instead situation: content stored in feed is already encrypted... but we also want to encrypt the block updates so that links between updates etc aren't visible in plain text. So there is essentially multiple layers of encryption.

sanderpick commented 5 years ago

Yep, file / content encryption is dictated by the thread schema, which essentially describes what the IPLD file structure looks like. By default, encryption is on.

Schema Node:

https://github.com/textileio/go-textile/blob/master/pb/protos/model.proto#L159

For example, this is a directory that contains files generated by one of the "build-in" schemas (https://github.com/textileio/go-textile/blob/master/schema/textile/media.go)

https://ipfs.io/ipfs/QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc

The block that points to this directory uses it's top-level hash as target.

Gozala commented 5 years ago

Also, not an instead situation: content stored in feed is already encrypted... but we also want to encrypt the block updates so that links between updates etc aren't visible in plain text. So there is essentially multiple layers of encryption.

Thanks for elaboration. I understand intent now, which is to conceal pointers to the feed itself that is it's size and encrypted content that it points to. That way you could have two layers of participation:

  1. Invited replicators (cafes in textile) that can make content available without access to it and without user having to revealing to others anything about the feed.
  2. Invited participants (friends) that can both follow the feed and access contents of it.

This leads me to thinking that it would be really nice to have IPLD composition (as in function composition) so you could pipeline IPLD encryption resolver with IPLD proto-buf resolver. Unless I'm overlooking something that would simplify things a bit.

Gozala commented 5 years ago

Another relevant thought that IPLD should probably have a native support for encryption built-in so that GraphSync queries could cut through encryption layer.

sanderpick commented 5 years ago

One more piece of context here. Below is an example File which is produced by a thread's feed for consumption by UIs. This thread is using the built-in media schema. Internally, the block (a FILES block in this case) is expanded with the FileIndex objects which mirror the actual data's IPLD structure. Having these file indexes are used for local de-duplication post-encryption.

{
    "block": "QmYiSkeiMPajEMfte1zbw9mY3as42SDqZXcfXYGfcg4dhN",
    "target": "QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc",
    "date": "2019-02-27T23:01:16.785922Z",
    "user": {
        "address": "P4adwmG7G7FhoELxCUATKPqY6Fp2w9AdeFoLKUSQfDs5JCpp",
        "name": "sandercli",
        "avatar": "QmQKDDGXcionSQoPsCg81SQAfBuSwX7Eo1qLUr5e9nHHcE"
    },
    "files": [
        {
            "links": {
                "large": {
                    "mill": "/image/resize",
                    "checksum": "3tw2cHmeaStZhU21AG4w2N5iJfyc3gMJdftsfmNG6wqQ",
                    "source": "H7KLtLVVcbA7aXeM9JxN4caizjm3tbk8L19bgj44zc8x",
                    "opts": "21uBAuSeQUdw5aDu5CYPxEfeiLVeuvku1T26nWtJC84C",
                    "hash": "QmSaVjzWANimkcF7UtazTRx4jsCG9cgrVYWGsTzx2qhbcw",
                    "key": "YYDkNSZBBkzF8Ck8XhHJYRnvqJGgsXMtkE355Vsj7uBEABdiSaMd7LtEwyCk",
                    "media": "image/jpeg",
                    "name": "1.jpg",
                    "size": "23612",
                    "added": "2019-02-25T06:27:38.504477Z",
                    "meta": {
                            "height": 350,
                            "width": 525
                        },
                    "targets": [
                        "QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc"
                    ]
                },
                "small": {
                    "mill": "/image/resize",
                    "checksum": "FVUHFkLXRrDyKN3uQfPab89xzJfwNUuFhYq62JSgi6rM",
                    "source": "H7KLtLVVcbA7aXeM9JxN4caizjm3tbk8L19bgj44zc8x",
                    "opts": "CassDcqf192MnceweJKGJvhZfrV9kB3GJRbvNPWm6raa",
                    "hash": "QmbaN21SSUdhk3PUhZKZ55sS6nETSieyBRuPqwrfKDAwgp",
                    "key": "2GBx7GiMsdgAikDfuGmsfUhxYAMCo2SkxD67tqrm7MGDJbd1uKx4eiM5E1FZq",
                    "media": "image/jpeg",
                    "name": "1.jpg",
                    "size": "12330",
                    "added": "2019-02-25T06:27:38.449826Z",
                    "meta": {
                            "height": 213,
                            "width": 320
                        },
                    "targets": [
                        "QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc"
                    ]
                },
                "thumb": {
                    "mill": "/image/resize",
                    "checksum": "9yRHYLoFJgdFQaMf6Vk695oYB1RBbECTi9Pi7WNXba8S",
                    "source": "3tw2cHmeaStZhU21AG4w2N5iJfyc3gMJdftsfmNG6wqQ",
                    "opts": "7aGVJ7nGgmHdqv8oiEKZ2ZbrNBv1zVP3ADSuT2sW3MwT",
                    "hash": "QmYwS6WBuJoNNp7rdXMC3SbViTEorAs3bnraaUNJapUSXc",
                    "key": "k5whsL5dDJDtvpA3ZbzrMAxaDmMzhso95wzi8UDSybk9C2ueeAP3w6ixyvax",
                    "media": "image/jpeg",
                    "name": "1.jpg",
                    "size": "2776",
                    "added": "2019-02-25T06:27:38.543848Z",
                    "meta": {
                            "height": 67,
                            "width": 100
                        },
                    "targets": [
                        "QmcbVc6mu9eDnukB2HC2WJ7Ug4zLuADydo2yYtxfFRKzLc"
                    ]
                }
            }
        }
    ],
    "comments": [
    ],
    "likes": [
    ],
    "threads": [
        "12D3KooWSaF74rhqDwNZpBKK8RqnRMa2wj7AwyAzUhVo2YYF28rK"
    ]
}
Gozala commented 5 years ago

I sketched out more or less what I'm aiming for & would love feedback

sanderpick commented 5 years ago

Where can I see what you've sketched out, @Gozala?

Gozala commented 5 years ago

Where can I see what you've sketched out, @Gozala?

Oh, I forgot my to paste a link 🤦‍♂️ https://github.com/gozala/ipdf/ in terms of other files only worth looking right now is this one https://github.com/Gozala/ipdf/blob/master/src/format.js

sanderpick commented 5 years ago

It dawned on my over the weekend that I've gone down the single writer piece of the hybrid single/multi-peer trees proposed solution above in the past and ran into a wall: Of course you can't simply compose one tree out of the other because they're immutable chains (the hashed data of a block includes it's parent(s)). You could compose an additional chain with blocks that contain the other blocks, but then you'd have two add them as new blocks to IPFS.

We could try a normal IPLD chain for the composed chain, which would not require new blocks. The links would be exposed, but the chain would "leak" much less information since nodes simply point to another encrypted proto message blob.

IPLD node for a single peer's record of all other immutable thread member trees:

block: link<hash of encrypted proto message>
parent: link<hash of parent node, which contains a link to another encrypted proto message>
carsonfarmer commented 5 years ago

Yes to using IPLD if/where possible. This would likely also increase interoperability with other tools significantly. I think the exposed links are probably ok (best someone could determine is the length and frequency of chain updates), but could be perhaps still encrypt the top level IPLD object as we did previously? Or even encrypt the links before adding them, creating something like an Encrypted-IPLD object?

carsonfarmer commented 5 years ago

Obviously this would require a special IPLD resolver... but maybe this would be something others could find useful without too much additional work?

sanderpick commented 5 years ago

@Gozala very nice! I took a long weekend and have a bunch of catch up today. Will jump in this evening or tomorrow.

Gozala commented 5 years ago

@sanderpick Chances are I'm misunderstanding what you're saying above, so please bear with me. @mikeal suggested something relevant here which I incorporated into my draft:

https://github.com/Gozala/ipdf/blob/master/src/format.js#L25-L34

export type Head<a> = {
  author: AuthorPublicKey,
  signature: Signature,
  block: Encoded<Block<a>, ReplicatorKey>
}

export type Block<a> = {
  links: CID<Head<a>>[],
  message: Encoded<Message<a>, FollowerKey>
}

Idea here is that ReplicatorKey key can be used to unwrap one layer that does not reveal anything other that what links are there, which could be arbitrary blocks encrypted or not. That way you don't necessarily reveal a chain either.

Gozala commented 5 years ago

Obviously this would require a special IPLD resolver... but maybe this would be something others could find useful without too much additional work?

As per following threads: https://github.com/ipld/ipld/issues/64 https://github.com/ipld/ipld/issues/63

It seems that there is interest in making it possible to evolve IPLD to allow recursive message unpacking of some sorts. My plan for implementation now is to implement kind of polyfill for that or maybe an utility that can take "decoder" and IPLD resolver and return new IPLD resolver that will use decoder to decode block before continue with a resolution.

I also find idea of encoding decoder config into block itself really compelling.