ssbc / ssb2-discussion-forum

not quite tiny, also not quite large

PPPPP: Tangles identified by the hash of the root #26

Closed staltz closed 11 months ago

staltz commented 11 months ago

One PPPPP design choice that has confused @mixmix in particular (but I bet others would raise the same concern sooner or later) is how the tangles a msg belongs to are referred to by the tangle root ID. This is distinct from SSB tangles, where they are human-readable names, e.g. SIP009 or group exclusion tangles for a more colorful example.

Msg format recap

Msgs in PPPPP are so far like this:

{
  "data": {
    "text": "Ola mundo!"
  },
  "metadata": {
    "dataHash": "XuZEzH1Dhy1yuRMcviBBcN",
    "dataSize": 21,
    "account": "J2SUr6XtJuFuTusNbagEW5",
    "accountTips": [
      "J2SUr6XtJuFuTusNbagEW5"
    ],
    "tangles": {
      "VsBFptgidvAspk4xTKZx6c": {   // 🔥 Note how this ID is the rootMsgId, not human friendly
        "depth": 2,
        "prev": [
          "R5G9WtDAQrco4FABRdvrUH"
        ]
      }
    },
    "domain": "post",
    "v": 3
  },
  "pubkey": "4mjQ5aJu378cEu6TksRG3uXAiKFiwGjYQtWAjfVjDAJW",
  "sig": "31StEDDnoDoDtRi49L94XPTGXxNtDJa9QXSJTd4o3wBtFAJvfQA1RsHvunU4CxdY9iC69WnxnkaW6QryrztJZkiA"
}

Question

From this design choice, an important question comes up:

Think of a multi-author discussion thread as an evolving "document". If two different "documents" start from the same root msg, do these two documents "belong" to the same tangle? How can we split them into two "named" tangles? Do we necessarily always want to replicate those two documents? What if they are "unrelated" and I only want to replicate one of them?

Mix has also once asked this same question on SSB, with different words:

Note you could make a recipe which has: • a contentless root • multiple dags which don't share backlinks (other than root) You could then consider these different documents if the operations start after the root. What's the id of each document, how do you know they weren't meant to be merged. Dunno... doesn't seem useful.

And yet another time today:

if you have a postRootMsg, how do you know which tangle is the "thread" if there are multiple tangles. Like say the root also has a moderation tangle rooted on that post? (contrived, but I'm looking for cases where there are 3 tangles present - identity in its privileged position and two unknown ones in tangles data)

To visually illustrate the above, see the diagram below (red are lipmaa links):

graph RL
K-->A
E-->A
I-->E
M-->L-->K-->J-->B-->A
I-->H-->G-->F-->E-->D-->C-->A
%% linkStyle 6,7,8,9,10 stroke:#0002,stroke-width:2;
linkStyle 0,1,2 stroke:#f00,stroke-width:1;

classDef default fill:#6df,stroke:#fff0,color:#000
classDef defaultw fill:#6df2,stroke:#fff0,color:#0003
classDef rib fill:#000,stroke:#fff0,color:#fff
classDef ribw fill:#0002,stroke:#fff0,color:#fff

The document ABJKLM is disjoint from ACDEFGHI (except for A).

Answer: it depends on the nature of these two disjoint documents. If this whole tangle is a discussion thread, and all these msgs are domain="post", then yes, it would make sense to replicate everything here, topologically sort both documents, and render them blended together in one place. This would be a prime example of a Reddit-like discussion.

:fire: However, if the two documents are vastly different in nature, then it may make sense to replicate only the one you want. A real world example would be: ABJKLM are msgs with domain="post" while ACDEFGHI are msgs with domain="factcheck", where fact-checking is a meta-discussion that may be suitable to replicate separately from the actual posts.

In that case, the question is very relevant and the current design of PPPPP does not solve for this. (Keep on reading because I have a proposal)
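For the all-posts case, where both documents are rendered blended together, the topological sort could look roughly like the sketch below. It is only an illustration: the `thread` array, the flattened `id`/`prev` shape, and the tie-break by ID are assumptions made for this example, not PPPPP's actual data model. The backlinks mirror the diagram above, lipmaa links included.

```javascript
// Hypothetical in-memory copy of the diagram: each msg lists the backlinks
// (`prev`) it declares inside this one tangle, lipmaa links included.
const thread = [
  { id: 'A', prev: [] },
  { id: 'B', prev: ['A'] },
  { id: 'J', prev: ['B'] },
  { id: 'K', prev: ['J', 'A'] },
  { id: 'L', prev: ['K'] },
  { id: 'M', prev: ['L'] },
  { id: 'C', prev: ['A'] },
  { id: 'D', prev: ['C'] },
  { id: 'E', prev: ['D', 'A'] },
  { id: 'F', prev: ['E'] },
  { id: 'G', prev: ['F'] },
  { id: 'H', prev: ['G'] },
  { id: 'I', prev: ['H', 'E'] },
]

// Kahn's algorithm: emit a msg only once all of its backlinks were emitted.
function topoSort(msgs) {
  const known = new Set(msgs.map((m) => m.id))
  const pending = new Map() // msg id -> number of backlinks not yet emitted
  const children = new Map() // msg id -> ids of msgs that link back to it
  for (const m of msgs) {
    // Ignore backlinks to msgs we do not have (e.g. after partial sync).
    const prev = m.prev.filter((id) => known.has(id))
    pending.set(m.id, prev.length)
    for (const p of prev) {
      if (!children.has(p)) children.set(p, [])
      children.get(p).push(m.id)
    }
  }
  const ready = msgs.filter((m) => pending.get(m.id) === 0).map((m) => m.id)
  const sorted = []
  while (ready.length > 0) {
    ready.sort() // assumption: break ties by ID so every peer gets the same order
    const id = ready.shift()
    sorted.push(id)
    for (const child of children.get(id) ?? []) {
      pending.set(child, pending.get(child) - 1)
      if (pending.get(child) === 0) ready.push(child)
    }
  }
  return sorted
}

console.log(topoSort(thread).join(' '))
```

Any order that respects the backlinks is a valid rendering order; a real app would probably break ties by something more meaningful than the msg ID.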

Tangle sync recap

To give a recap on how replication works as a black box (that is, just considering its contractual inputs and outputs, but not going into the details of how it is accomplished):

A "range" is an array [minDepth, maxDepth].

The algorithm itself does a lot more magic with bloom filters to efficiently exchange msgs between two peers, but at the heart of it there is a method that looks like this:

  *yieldMsgsIn(rootMsgId, range) {
    const [minDepth, maxDepth] = range
    const rootMsg = this.#peer.db.get(rootMsgId)
    if (!rootMsg) return
    if (minDepth === 0) yield rootMsg
    for (const msg of this.#peer.db.msgs()) {
      const tangles = msg.metadata.tangles
      if (
        tangles[rootMsgId] &&
        tangles[rootMsgId].depth >= minDepth &&
        tangles[rootMsgId].depth <= maxDepth
      ) {
        yield msg
      }
    }
  }

By comparison, SSB replication has only one input dimension: the feedId. In PPPPP tangle sync, we added another dimension, range, to allow sliced (partial) replication. Thanks to lipmaa links, we can still validate whatever msgs we get, all the way to the root.

Proposal

I think we could solve the case the question raised by simply including one more dimension: the domain. The tangle sync contract would then take three inputs: rootMsgId, range, and domain.

This would be a simple matter of updating that JS code to be:

- *yieldMsgsIn(rootMsgId, range) {
+ *yieldMsgsIn(rootMsgId, range, domain) {
    const [minDepth, maxDepth] = range
    const rootMsg = this.#peer.db.get(rootMsgId)
    if (!rootMsg) return
    if (minDepth === 0) yield rootMsg
    for (const msg of this.#peer.db.msgs()) {
      const tangles = msg.metadata.tangles
      if (
+       msg.metadata.domain === domain &&
        tangles[rootMsgId] &&
        tangles[rootMsgId].depth >= minDepth &&
        tangles[rootMsgId].depth <= maxDepth
      ) {
        yield msg
      }
    }
  }

Note how the post+factcheck example would be replicated: now I can independently ask for rootMsgId×range=[0,Infinity]×domain="post" to get all the posts in that thread without getting the factcheck messages.

Optionally, you could pass domain=null to signal that you want messages with any domain.
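As a rough standalone sketch of that optional `domain=null` behavior: the version below takes a plain `msgs` array instead of `this.#peer.db`, and assumes each msg carries an `id` field for the root lookup. Both are simplifications for illustration, not the real PPPPP API. As in the diff above, the root msg is yielded whenever the range starts at depth 0, regardless of its domain.

```javascript
// Standalone variant of yieldMsgsIn. `msgs` stands in for peer.db.msgs() and
// the `id` field on each msg is a hypothetical convenience for this sketch.
function* yieldMsgsIn(msgs, rootMsgId, range, domain = null) {
  const [minDepth, maxDepth] = range
  const rootMsg = msgs.find((m) => m.id === rootMsgId)
  if (!rootMsg) return
  if (minDepth === 0) yield rootMsg
  for (const msg of msgs) {
    const tangle = msg.metadata.tangles[rootMsgId]
    if (!tangle) continue
    // domain === null means "any domain", so the domain check is skipped.
    if (domain !== null && msg.metadata.domain !== domain) continue
    if (tangle.depth >= minDepth && tangle.depth <= maxDepth) yield msg
  }
}

// Made-up thread: root A plus two posts and two factchecks.
const mk = (id, domain, depth) => ({
  id,
  metadata: { domain, tangles: depth === 0 ? {} : { A: { depth, prev: [] } } },
})
const msgs = [
  mk('A', 'post', 0),
  mk('B', 'post', 1),
  mk('C', 'factcheck', 1),
  mk('J', 'post', 2),
  mk('D', 'factcheck', 2),
]

const ids = (iter) => [...iter].map((m) => m.id)
console.log(ids(yieldMsgsIn(msgs, 'A', [0, Infinity], 'post'))) // A, B, J
console.log(ids(yieldMsgsIn(msgs, 'A', [0, Infinity], null))) // A, B, C, J, D
```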

mixmix commented 11 months ago

Yes this could work. Notice that "domain" is kinda becoming the named tangle in this proposed solution. In ssb "type" was always a quick indexable point in a message by which you could do a sort of bloom filter first pass to get those messages (and then do tighter validations, ideally by schema)

Remember p2panda has feeds defined by schema? Like the domain is the hash of the msg schema (or something like that), then you add that as a validation step in replication. Could be nice to only validate the messages once (on write) instead of on every read. Is that relevant here? Not sure. I think I'm feeling a tweak like "uhhh, what is the domain for again... and how do you have epochs with same domain?"

Sympathetic replication comes in here too. I'm thinking about the private group challenge ... does it intersect? I need a checklist of behaviours I think. Values and then expected behaviours which support those, then spec which support those


The case I was trying to raise the other day was not "how do you know if it's one document or two" (though that's a great question).

It was - in a scenario where we have a message which has two chunks of unnamed tangle data in it... what do I do with that? It would indicate the message is a (non root) message in two (non-account) tangles right?

Example from tribes2: a "group/add-member" message in the second epoch of a group would in ppppp need:

  1. special account tangle to identify the "peer"
  2. epoch account tangle which places the message within the second epoch membership tangle
  3. group tangle which places it within the total history of the group tangle

If I am looking at a message, 1's special position makes it obvious, but how do I distinguish what 2/3 are? I would have to look the tangle root messages up and look at their domains? Maybe the previous message domains too? Do I ever need to solve this problem in real apps, or do I always start queries coming from higher context? This is where I want a worked example to calm my paranoia

Uhhh I'm getting weird flashbacks of willow's capability spec draft. everything is multi author documents... with capabilities. Your triple range for replication is strangely similar to the 3d product in that spec

We only have the special "account" tangle because we expect to have a density of desire to get many things by the same person right?


staltz commented 11 months ago

Yes this could work. Notice that "domain" is kinda becoming the named tangle in this proposed solution.

Thanks! This is the main answer I needed to get from you. :)

(and then do tighter validations, ideally by schema)

Remember p2panda has feeds defined by schema?

I know, and it has been in my mind, but at this point I don't see a need for very tight validations (and as a pre-requisite, a canonical way of representing schemas and hashing them). The "first pass filter" based on a domain string is well enough.

It was - in a scenario where we have a message which has two chunks of unnamed tangle data in it... what do I do with that? It would indicate the message is a (non root) message in two (non-account) tangles right?

I think it will be helpful to set aside private groups (and exclusions and epochs) for now, because it's clouding your/my ability to see PPPPP tangles for what they are, which is different from SSB tangles. Our group epochs design is using SSB primitives (feeds/metafeeds/tangles) to achieve goals, and those same primitives don't translate 1:1 to PPPPP. Tangles are not the same concept, and maybe we should just (in this conversation) call them tongles to emphasize the difference. Let me explain.

In SSB you have feed as the primary primitive, and tangles as a secondary concept built on top of feeds.

In PPPPP you have tongles as the primary primitive.

So let's talk about what's a "primary primitive". I see it as being "the thing you replicate among friends", or more mathematically, "an evolving set of messages that have a stable and collision-free ID for replication". In SSB we replicate feeds to each other, and tangles are just extra metadata added to weave together a partial ordering. In PPPPP we replicate tongles to each other. The "collision-free" part is important because while it would be nice for my SSB feed to be called @staltz.ed25519, if we would use that for replication, any rando could call themselves @staltz.ed25519 and mess up the replication of my feed. That's why SSB feeds are identified with a hash. Similarly, that's why PPPPP tongles are identified by a hash, not by a human-friendly name.

So in this light, the main difference between a PPPPP tongle and an SSB feed is that tongles are DAGs thus allowed to fork (and merge), while a feed is just a linear sequence of msgs. While an SSB msg uses sequence numbers and a previous field to determine "what comes after what", a PPPPP msg uses backlinks instead. Notice that the only purpose of a backlink in a PPPPP tongle is to provide partial order (we have no sequence numbers), especially notice how lipmaa links point back to stuff just based on graph distance, NOT based on "related content" (unlike SSB msg.content.fork and msg.content.mentions). I.e. the backlinks in a PPPPP tongle are not very semantic, they are just fields that help with replication.

PPPPP "External feeds" (which are tongles!) are the closest thing to SSB "feeds". But now let's talk about an outlier: PPPPP "cross-tongle tongles". Again, a tongle is just an evolving set of messages that have a stable ID for replication, so in case you want to replicate a set of msgs from various authors, we want to support that too. That's the use case for replicating a discussion thread without having to follow/replicate all authors involved in that discussion. So we allow msgs to declare that they belong to a tongle (an evolving set of messages with a stable ID for replication), via msg.metadata.tangles.

Let's resist the temptation to 1:1 map this to the "epoch tangles" and "membership tangles", and instead try to look at PPPPP tongles for what they are, and then think about how we would design a new groups system that allows for excluding members. It might end up like what we had, or it might look different.

mixmix commented 11 months ago

I chose that example because it is a rich one we had shared context on

I think I haven't quite grokked the difference between these tangles (ssb, ppppp) because I'm seeing them as similar. I likely need to do some worked examples/ diagrams

staltz commented 11 months ago

Easy suggestion: in the context of PPPPP, forget about "tangles", just think "forkable/mergeable feeds" instead. Much easier to understand what it's about.

staltz commented 11 months ago

A big rename!

I'm considering renaming some concepts such that "tangles" disappears from our vocabulary (it has been confusing enough for SSB veterans, and it's more important that the SSB community doesn't get confused by this, compared to other audiences).

I'm considering calling the "DAG" a "feed" instead, since it's the closest in purpose to the SSB feed, and it has a cryptographic ID which is used during replication protocols, etc. Here's the glossary I got so far:

So the biggest thing is this hierarchy:

Each kind will have their own validation rules (with most rules in common, however):

Examples of meals:

Examples of weaves:


Commentary

I like the naming moot a lot, because it rhymes with root and the dictionary says (bold is my emphasis)

moot debatable; undecided: a moot point; disputable, unsettled

Which is quite accurate, because anyone can recreate a moot msg, so it's "unsettled" and there's no guarantee that the moot's msg.metadata.account author intended to publish this. It only becomes a real thing once that account actually publishes a msg referring to the moot ID.
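To make the "anyone can recreate a moot" point concrete, here is a small sketch. Later in this thread the moot is described as deterministically defined by two pieces of data, account and domain, so two peers who build it independently end up with the same ID. The msg shape, the absence of a signature, and the use of SHA-256 over JSON are stand-in assumptions for illustration; PPPPP's real hashing and encoding are not specified here.

```javascript
const crypto = require('node:crypto')

// Hypothetical moot shape: no data and no signature, only account + domain.
function createMoot(account, domain) {
  return { data: null, metadata: { account, domain, tangles: {} } }
}

// Stand-in ID function: SHA-256 over the JSON-serialized metadata, truncated.
// This is an assumption for the sketch, not PPPPP's actual msg ID scheme.
function mootId(moot) {
  const json = JSON.stringify(moot.metadata)
  return crypto.createHash('sha256').update(json).digest('hex').slice(0, 32)
}

// Two different peers can compute the same moot ID without any coordination,
// and without the account owner ever having published anything.
const mine = mootId(createMoot('J2SUr6XtJuFuTusNbagEW5', 'post'))
const yours = mootId(createMoot('J2SUr6XtJuFuTusNbagEW5', 'post'))
console.log(mine === yours) // true
```

Because the ID depends only on public inputs, holding a moot proves nothing about the account's intent, which is what makes it "unsettled" until the account publishes a msg that refers to it.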

I am not sure if I like the naming meal, it would be quite new in this realm, and we'll be using this word a lot since meal feeds are "actual content" feeds. I'm open to suggestions, but I'm looking for a short and simple word, not an acronym, not a composite word like "moot feed". We're going to be talking about this kind of feed a lot, so the best would be 1 syllable or 2.

I like weave, it has a connotation of blending different strings together, which is apt.

Curious about your thoughts @ahdinosaur @mixmix

staltz commented 11 months ago

Considering "meed" as a portmanteau of "feed" and "moot", and it actually has a dictionary meaning:

A meed is a well-deserved compensation or reward. At a birthday party, every guest hopes to gather his or her meed of candy from the piñata they've worked so hard to smash open.

The noun meed is a very old fashioned way to talk about a payment or share of something. You're most likely to come across it in older books, but you might want to use it to describe the way your grandmother manages to give each of her twelve grandchildren a meed of her attention and love. Meed comes from the Old English root mēd, which has a Proto-Indo-European root in common with the Greek misthos, or "reward."

It's a bit exotic, but has the advantage of being a portmanteau and recalling both feed and moot concepts.

ahdinosaur commented 11 months ago

i for one really like the name "tangle", to me it's a fun way to describe a DAG.

in my view, there's a very very limited number of people who actually understand SSB tangles. i think if you are clear about what a PPPPP tangle is, i personally don't see an issue with the SSB overlap.

an alternative glossary if i understand this right:

hierarchy:

staltz commented 11 months ago

@ahdinosaur yeah you might be right, I shouldn't go too exotic with "meed". I can already foresee people joking about "weed" and "mood" too.

I like Tangle+Account+Feed+Root+Moot.

The main problem is helping Mix understand how PPPPP Tangles are not SSB Tangles. 😅

mixmix commented 11 months ago

how about "doot" instead of "root" - deterministic root

droot

mixmix commented 11 months ago

@staltz is there any constraint about who can add messages to different tangles?

e.g. an account tangle can only be validly appended to by author keys that have been given permission to, right?

staltz commented 11 months ago

@mixmix

An account tangle can only be appended to (i.e. msgs published on it are only considered valid) if the msg's pubkey is authorized to do that action on the account, e.g. add power: is the pubkey authorized to add other pubkeys?

On feed tangles, only the feed's account is authorized to publish on it. Note how the moot is deterministically defined by two pieces of data, account and domain, so the account there determines who can publish. Also note that for "commons feeds" you can set account='any' in the moot, and that's a special case that means anyone can publish on this feed.

And on weave tangles, anyone is allowed to publish on it.
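The three rules above could be summarized in a sketch like the following. The `kind`, `powers`, `mootAccount`, and `action` fields are hypothetical names invented for this example, and a real validator would check more than this (for instance, that the msg's pubkey actually belongs to the claimed account).

```javascript
// Decide whether `msg` may be appended to `tangle`, per the three rules above.
// All field names here are assumptions made for illustration.
function canAppend(tangle, msg) {
  switch (tangle.kind) {
    case 'account': {
      // The pubkey must hold the power needed for this action, e.g. "add".
      const powers = tangle.powers.get(msg.pubkey)
      return powers !== undefined && powers.has(msg.action)
    }
    case 'feed':
      // Only the moot's account may publish, unless the moot says 'any'.
      return (
        tangle.mootAccount === 'any' ||
        msg.metadata.account === tangle.mootAccount
      )
    case 'weave':
      // Anyone may publish on a weave.
      return true
    default:
      return false
  }
}

const accountTangle = {
  kind: 'account',
  powers: new Map([['pubkeyA', new Set(['add'])]]),
}
const feedTangle = { kind: 'feed', mootAccount: 'alice' }
const commonsFeed = { kind: 'feed', mootAccount: 'any' }
const weaveTangle = { kind: 'weave' }

const msgFromBob = { pubkey: 'pubkeyB', action: 'add', metadata: { account: 'bob' } }
console.log(canAppend(accountTangle, msgFromBob)) // false
console.log(canAppend(feedTangle, msgFromBob)) // false
console.log(canAppend(commonsFeed, msgFromBob)) // true
console.log(canAppend(weaveTangle, msgFromBob)) // true
```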

staltz commented 11 months ago

Closing this issue because both things were resolved: the renaming and the proposal to do rootMsgId × range × domain during replication.