Closed aschmahmann closed 4 years ago
Hey @aschmahmann, thank you very much for the detailed feedback! I'm in a holding pattern while we finish a POC implementation. There are already a number of changes that I'll try to summarize while addressing your feedback next week.
Hey again @aschmahmann, thanks again for the review! We've been making some solid progress over at https://github.com/textileio/go-textile-threads/. There are some examples in the `exe` folder. Much has changed from the napkin sketching in the paper draft.
Adding some inline comments below...
IPNS-over-PubSub and Persistent PubSub summary
The main change that enables this to work is that when a node first connects to another node in a persistent pubsub channel it asks/pulls from them the latest state of the channel
This is great. We will definitely play with it. By way of finding some concrete points of integration, you can consider Threads to be operating under the assumption that:
- We could just know which peers to connect to because we have some system like a peer log, or record of previous users
Currently, "joining" a thread is done by simply navigating to a host's address, e.g., `/ip4/127.0.0.1/tcp/4006/p2p/12D3KooWGiFcF3Ey1knau6UszUgwdQ3K1BUJQmiUUcs4z4shMWj3/thread/bafkufevg3bqezf3mx7vmloyezupzkdx5ozsfudwky3o3c6kiqvqsldy`, which will serve all the logs. Now you know the writers and can simply poll each log address for updates. If you start writing, you will receive direct pushes. However, it would be very nice to layer on an additional DHT-based discovery mechanism. I think this would be achievable by using an `ipns`-based multiaddress for logs in addition to `p2p`-based addresses. Then each peer is able to "pull" (and receive pushes) from an IPNS address.
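To make the address layout above concrete, here is a minimal sketch that splits such a thread address into the host part and the thread ID. It uses plain string handling only; the function name `splitThreadAddr` is hypothetical, and a real implementation would presumably parse the address with a multiaddr library rather than by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// splitThreadAddr is a hypothetical helper (not part of go-textile-threads).
// It splits "<host multiaddr>/thread/<thread ID>" into its two halves.
func splitThreadAddr(addr string) (host, threadID string, err error) {
	const sep = "/thread/"
	i := strings.LastIndex(addr, sep)
	if i <= 0 {
		// Either the component is missing or the host part is empty.
		return "", "", fmt.Errorf("no %q component after a host in %q", sep, addr)
	}
	host, threadID = addr[:i], addr[i+len(sep):]
	if threadID == "" || strings.Contains(threadID, "/") {
		return "", "", fmt.Errorf("malformed thread ID in %q", addr)
	}
	return host, threadID, nil
}

func main() {
	addr := "/ip4/127.0.0.1/tcp/4006/p2p/12D3KooWGiFcF3Ey1knau6UszUgwdQ3K1BUJQmiUUcs4z4shMWj3" +
		"/thread/bafkufevg3bqezf3mx7vmloyezupzkdx5ozsfudwky3o3c6kiqvqsldy"
	host, id, err := splitThreadAddr(addr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("host:  ", host)
	fmt.Println("thread:", id)
}
```

The host half is what a peer would dial; the thread ID is what it would ask that host to serve logs for.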
Edit: ticket added to track -> https://github.com/textileio/go-textile-threads/issues/105
Multi-writer IPNS
Unfortunately, other more pressing needs have taken my time away from this. However, a large chunk of the work is already handled by persistent pubsub and could be utilized by Threads.
The TLDR on how I'm thinking this would work is that the data being persisted via PubSub is a DAG where the heads of the DAG are all the places where users' state diverges (e.g. two users made changes unaware of each other). If a user ever received a head where it didn't know about the previous link, it would directly query the peer that sent them the head for the rest of the graph behind that link (this mechanism would be swappable for GraphSync). Once you have the graph you can do with it whatever you want (e.g. if it has some automatic merge function like a CRDT or OT then merge it, otherwise show the user the conflicts/options).
I know the summary above is fairly terse, but I'm happy to go into more detail here if there's interest.
Makes sense! The mechanism sounds similar to something that was landed on during implementation, namely, "pull records" from another peer starting from some offset and going forward (like a pull-down menu, it gives you the latest).
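A rough sketch of the head-tracking idea discussed above, in Go. All names here (`Node`, `Graph`, `Add`, `Heads`) are made up for illustration and are not taken from persistent pubsub or go-textile-threads. `Add` records a node and returns the previous links that are not yet known locally, which is the set a peer would then request from whoever sent the head.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is one entry in the DAG; Prev holds the IDs it links back to.
type Node struct {
	ID   string
	Prev []string
}

// Graph tracks known nodes and the current heads (nodes nothing links to).
type Graph struct {
	nodes      map[string]Node
	referenced map[string]bool
	heads      map[string]bool
}

func NewGraph() *Graph {
	return &Graph{
		nodes:      map[string]Node{},
		referenced: map[string]bool{},
		heads:      map[string]bool{},
	}
}

// Add stores n and returns the Prev links that are unknown locally.
func (g *Graph) Add(n Node) []string {
	if _, seen := g.nodes[n.ID]; seen {
		return nil
	}
	g.nodes[n.ID] = n
	if !g.referenced[n.ID] {
		g.heads[n.ID] = true // nothing known links to n yet
	}
	var missing []string
	for _, p := range n.Prev {
		g.referenced[p] = true
		delete(g.heads, p) // p now has a successor
		if _, known := g.nodes[p]; !known {
			missing = append(missing, p)
		}
	}
	return missing
}

// Heads returns the current head IDs in sorted order.
func (g *Graph) Heads() []string {
	out := make([]string, 0, len(g.heads))
	for id := range g.heads {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	g := NewGraph()
	g.Add(Node{ID: "a"})
	g.Add(Node{ID: "b", Prev: []string{"a"}}) // two writers diverge from a
	g.Add(Node{ID: "c", Prev: []string{"a"}})
	fmt.Println("diverged heads:", g.Heads())
	missing := g.Add(Node{ID: "x", Prev: []string{"y"}}) // y was never seen
	fmt.Println("links to fetch from the sender:", missing)
}
```

More than one head means state has diverged and some merge step (CRDT, OT, or manual) is still pending.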
DAT Similarities
As you mentioned in #35 I'd look at DAT and write up some of the comparisons, since there are definitely some similarities. In particular, hypermerge and the multiwriter DEP should be of use
* My understanding is that Hypercore is like IPNS, but more restrictive
  * It uses a single writer Merkle Clock instead of a version number
  * Can be emulated by IPNS if, by convention, every IPNS entry points at an IPFS root where the `root = (latest entry, pointer to previous root)`
* My understanding of Hypermerge is that it utilizes many hypercores where results from all hypercores are merged together with some reducing function (generally op based CRDT, but could be OT or potentially even a manual merge)
  * The similarities here stem from both Hypermerge and Threads utilizing the approach of creating a multi-writer system by utilizing multiple single-writer systems and some "email" or out of band messaging (see [ipfs/notes#379](https://github.com/ipfs/notes/issues/379) for some thoughts on the different categories of multiwriter systems - sorry the ideas aren't as cleaned up as I'd like)
Great, this is super helpful!
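For illustration, the `root = (latest entry, pointer to previous root)` convention can be sketched as a hash-linked, single-writer log. This is only a toy model: the in-memory `Store`, the SHA-256 hex IDs, and the `head` variable standing in for the mutable IPNS pointer are all assumptions, not how IPFS or Hypercore actually encode things.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Root pairs the latest entry with a pointer to the previous root.
type Root struct {
	Entry string
	Prev  string // ID of the previous root; "" for the first entry
}

// Store is a stand-in for content-addressed storage, keyed by root ID.
type Store map[string]Root

// Append writes a new root on top of head and returns its ID.
// The caller would then republish that ID as the single mutable pointer.
func (s Store) Append(head, entry string) string {
	sum := sha256.Sum256([]byte(head + "\x00" + entry))
	id := hex.EncodeToString(sum[:])
	s[id] = Root{Entry: entry, Prev: head}
	return id
}

// History walks back from head and returns entries oldest first.
func (s Store) History(head string) []string {
	var entries []string
	for id := head; id != ""; {
		r, ok := s[id]
		if !ok {
			break // root not available locally
		}
		entries = append(entries, r.Entry)
		id = r.Prev
	}
	for i, j := 0, len(entries)-1; i < j; i, j = i+1, j-1 {
		entries[i], entries[j] = entries[j], entries[i]
	}
	return entries
}

func main() {
	s := Store{}
	head := "" // plays the role of the value published under the IPNS key
	for _, e := range []string{"create", "update-1", "update-2"} {
		head = s.Append(head, e)
	}
	fmt.Println(s.History(head))
}
```

A reader that only knows the latest pointer can recover the whole log by following `Prev` links, which is the property the comparison with Hypercore relies on.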
ACLs
It'd be great to have a more fleshed out consistency model for the ACL + data. Some examples:
1. Strong Consistency/Consensus:
   * The participants in the network have an agreement as to the total order of all events
2. Causal Consistency
   * If A makes a change a1, and later makes a change a2, B cannot process a2 without first processing a1
3. Eventual Consistency
   * If A, B and C make changes, if they are allowed to communicate then they will eventually agree on the same state (could be event ordering, or just the state created by processing the events)
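As a small illustration of the causal example (B cannot process a2 before a1), the sketch below buffers events per writer and only applies them in sequence order. The `Event` and `Processor` types are hypothetical, and sequence numbers are assumed to start at 0. Note that this covers only same-writer ordering; full causal consistency would also need to track dependencies across writers (for example with vector clocks).

```go
package main

import "fmt"

// Event is one change from a writer; Seq counts that writer's changes from 0.
type Event struct {
	Writer string
	Seq    int
	Body   string
}

// Processor applies each writer's events strictly in Seq order.
type Processor struct {
	next    map[string]int           // next expected Seq per writer
	pending map[string]map[int]Event // events that arrived too early
	Applied []string                 // bodies in the order they were applied
}

func NewProcessor() *Processor {
	return &Processor{
		next:    map[string]int{},
		pending: map[string]map[int]Event{},
	}
}

// Receive buffers e, then applies every event that is now in order.
func (p *Processor) Receive(e Event) {
	if e.Seq < p.next[e.Writer] {
		return // already applied; ignore the duplicate
	}
	if p.pending[e.Writer] == nil {
		p.pending[e.Writer] = map[int]Event{}
	}
	p.pending[e.Writer][e.Seq] = e
	for {
		ev, ok := p.pending[e.Writer][p.next[e.Writer]]
		if !ok {
			break // gap: an earlier event has not arrived yet
		}
		p.Applied = append(p.Applied, ev.Body)
		delete(p.pending[e.Writer], ev.Seq)
		p.next[e.Writer]++
	}
}

func main() {
	b := NewProcessor()
	b.Receive(Event{Writer: "A", Seq: 1, Body: "a2"}) // arrives first, held back
	fmt.Println("after a2:", b.Applied)
	b.Receive(Event{Writer: "A", Seq: 0, Body: "a1"}) // unblocks a2
	fmt.Println("after a1:", b.Applied)
}
```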
+1. We've moved to a more pluggable approach that would put these requirements on what we've been calling the `EventCodec`: https://github.com/textileio/go-textile-core/blob/master/store/store.go#L69. Currently, there's only one of these implemented, a "json-patcher": https://github.com/textileio/go-textile-threads/blob/master/jsonpatcher/jsonpatcher.go#L45, which behaves something like Causal Consistency but is really more of a POC. You can imagine a CRDT type of codec that provides the peace of mind of Eventual Consistency.
Here's an issue to track: https://github.com/textileio/papers/issues/48
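To illustrate why a CRDT-style codec would give Eventual Consistency, here is a minimal grow-only set in Go. It is a generic textbook CRDT, not the `EventCodec` interface and not anything from the json-patcher; the type and method names are invented for the example. Because merging is a set union, replicas end up in the same state regardless of the order in which they exchange changes.

```go
package main

import (
	"fmt"
	"sort"
)

// GSet is a grow-only set: elements can be added but never removed.
type GSet map[string]struct{}

// Add inserts a value locally.
func (s GSet) Add(v string) { s[v] = struct{}{} }

// Merge folds another replica's state into s (set union).
// Union is commutative, associative, and idempotent, so merge order
// and repeated merges do not affect the final state.
func (s GSet) Merge(other GSet) {
	for v := range other {
		s[v] = struct{}{}
	}
}

// Values returns the elements in sorted order for easy comparison.
func (s GSet) Values() []string {
	out := make([]string, 0, len(s))
	for v := range s {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	a, b, c := GSet{}, GSet{}, GSet{}
	a.Add("from-A")
	b.Add("from-B")
	c.Add("from-C")

	// Replicas exchange state in different orders.
	a.Merge(b)
	a.Merge(c)
	c.Merge(a)
	b.Merge(c)

	fmt.Println(a.Values())
	fmt.Println(b.Values())
	fmt.Println(c.Values())
}
```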
If I understood correctly it seems like one of the big ideas of this Threads proposal is that we can create eventually consistent ACLs for use in non-malicious environments. Given the possible attacks on this scheme I think defining the threat model (i.e. what a non-malicious environment is) is important.
Yep, but I think we've so far hand-waved over the details that are not clear in favor of focusing on the model and event store mechanics. cc/ @carsonfarmer who has been doing some research into the types of ACLs we will actually need to develop.
Delayed follow up here @aschmahmann, but where might I learn more about the persistent pubsub work? How usable is the pubsub router stuff for things other than IPNS over pubsub? I'm reading through this and related PRs etc.: https://github.com/ipfs/specs/pull/218/files. If easier, this can also wait until our call next week.
Closing this in favour of more specific sub-issues and some previous work on resolving this, now in master.
@sanderpick the paper is looking good! Thanks for asking me to review 😃
There's a lot of information below, which I hope is helpful to you. I'm also happy to go into more detail either on Github or on a call.
IPNS-over-PubSub and Persistent PubSub summary
IPNS over PubSub now exists as an IPNS transport that can be used totally independently of a DHT https://github.com/ipfs/specs/pull/218, ipfs/go-ipfs#6447
The main change that enables this to work is that when a node first connects to another node in a persistent pubsub channel it asks/pulls from them the latest state of the channel
`hash("pubsub:IPNS Key")`, this will be much more performant than a standard IPNS DHT lookup. In particular, it will have the performance of an IPFS DHT lookup instead of the performance of an IPNS DHT lookup