Closed staltz closed 2 years ago
I'll put all my input from your meeting here :)
Contrast that with bottom-up partial replication approaches like tangle-sync that don't require content to be organized into sections and subsections
Tangle-sync has the same kind of "has to be designed carefully" problem. For example, right now we are in trouble because self-assigned about messages have tangle: none. And you could easily have tangles that are huge, so you still need to think about sub-tangles. In both cases these things need to be designed carefully so that we try to be optimal, and this can be hard because we don't exactly know what we are optimizing for (yet). This was one of the reasons for claims (that evolved into indexes), so that we can attach this information after the fact. This has other problems: claims raise questions about trust, indexes about overhead. It's all trade-offs. Just wanted to put some more context here.
EBT clock
I'm not super worried about this. We already handle at least on the order of thousands of feeds. Meta feeds will add a significant overhead to this, at least 5-10x, and if we have many active groups then even more. But on the other hand EBT is designed to handle this case really well by keeping the state of each side to make sure that we only sync clock diffs. Secondly, we have the ability to tombstone feeds. This is really nice, because that allows us to prune the tree and can counter some of the "bad design" decisions.
In any case, maybe let's test this to make sure our assumptions are correct :)
Proposal
I really like this section
I would even recommend that we start with the domains
I would say just define the 4 we need now: main, indexes, groups, social
And instead put in a process that makes it easy and clear how to add things.
If you want an analogy with actual trees 🌳, then the root is super hard to remove, the trunk is solid and stable, the branches are somewhat hard but malleable, and the leafs are soft and very malleable.
👍
sympathetic replication:
I'm leaning towards we give other people a metafeed read key (that is different from group read key). This means that things are explicit and for the storage part of SOS we can say things like: sympathetic storage: X, hops 1: Y, hops 2: Z, groups not within hops: T.
EBT clock ... Secondly we have the ability to tombstone feeds.
Oh yes, tombstoning changes the game!
Love this description.
One question that came to me is "what is a domain". We have to define this to be able to have a process by which we reasonably permit more of them.
In particular, "groups" as a domain feels weird somehow... In that you might be storing a social app (like ahau) in there... So why is it not in the social apps domain? Should "groups" be a pattern that different domains might choose to implement?
That might depend on how much privacy you want. Like people seeing that you are announcing a group subfeed under "social" versus "games" gives you info.
So... I like the idea of a minimal root feed, but I think we need to work on clarifying what domains means... And what you do when you have an app which spans public and group spaces. I am gonna ask Holochain people about this
After a meeting with arj: use first-byte subfeeds in order to avoid the domain feed getting too large. With 1st-byte-sharding (similar to how ssb-blobs stores files in the filesystem), we make sure that each domain feed has at most 256 messages. It's also reasonable to assume that a single user doesn't have a huge number of apps, so we don't need an additional 2nd-byte-sharding layer. That would only be necessary if a single user needed 256*256 = 65,536 different app-specific feeds, which sounds unlikely.
Our library, ssb-meta-feeds, can automatically include 1st-byte-sharding under each domain.
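To make the 256-message bound concrete, here is a minimal sketch of 1st-byte-sharding over raw ID bytes. The helper name `firstByteShard` and the use of random 32-byte IDs are assumptions for illustration only; this is not the ssb-meta-feeds API, just the idea of labelling each shard by one byte rendered as two hex characters (like `f3` or `18`).

```javascript
const crypto = require('crypto');

// Hypothetical helper (not part of ssb-meta-feeds): the shard label is the
// first byte of the raw ID, rendered as two lowercase hex characters.
function firstByteShard(idBytes) {
  if (idBytes.length === 0) throw new Error('cannot shard an empty ID');
  return idBytes.subarray(0, 1).toString('hex');
}

// Group 1000 random 32-byte IDs (stand-ins for group or feed IDs) by shard.
const shards = new Map();
for (let i = 0; i < 1000; i++) {
  const id = crypto.randomBytes(32);
  const label = firstByteShard(id);
  if (!shards.has(label)) shards.set(label, []);
  shards.get(label).push(id);
}

// One byte has only 256 possible values, so a domain feed never needs to
// announce more than 256 shard subfeeds, however many IDs it holds.
console.log('distinct shards:', shards.size);
```

Since a shard is announced once and then reused, the number of messages on the domain feed is bounded by the number of distinct labels rather than by the number of apps or groups.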
I'm digging this. Thoughts, @mixmix ?
graph LR;
root:::depth0
main:::depth1
root-->main:::depth1
root-->indexes:::depth1
indexes:::depth1--> indexAbout:::depth2
indexes:::depth1--> indexContact:::depth2
root-->groups:::depth1
groups-->f3:::depth2-->groupMoms:::depth2
groups-->18:::depth2-->groupBatts:::depth2
groupMoms-->groupMomsEpoch1:::depth3
groupMoms-->groupMomsEpoch2:::depth3
groupBatts-->groupBattsEpoch1:::depth3
root-->social:::depth1
social-->4d:::depth2-->contact:::depth2
social-->08:::depth2-->gatherings:::depth2
gatherings-->gatheringsv1:::depth3
gatherings-->gatheringsv2:::depth3
root-->games:::depth1
games-->7a:::depth2-->chess:::depth2
chess-->chessv1:::depth3
chess-->chessv2:::depth3
games-->33:::depth2-->tictactoe:::depth2
root-->collab:::depth1
collab-->12:::depth2-->spreadsheets:::depth2
collab-->cf:::depth2-->whiteboard:::depth2
classDef default fill:#eee, stroke:#999;
classDef depth0 fill:#6b4b35,stroke:#3d291c,color:#fff;
classDef depth1 fill:#fcb56f,stroke:#d68a3e;
classDef depth2 fill:#fcfc6f,stroke:#cccc2c;
classDef depth3 fill:#b5fc6f,stroke:#75c623;
One problem with first-byte sharding is that it only works if you know what you're looking for. E.g. if you have a groupId, then yes you can take the 1st byte of the groupId and you know what you're looking for when replicating someone's tree.
But in other cases like "games" where you want to replicate someone's chess feed or tictactoe feed, you don't know what the feed ID for those are.
@mixmix proposed that you could take a hash of the feedpurpose and then use the 1st byte of that hash for sharding.
Hmm, I'm wondering if there is room between hard-coded top-level domains and open-ended/infinite spam. Could one of the top-level domains be something like a feed-plugin domain, where developers could add feed domains without spec changes? This might help with the naive usage case, since a plugin would need to be added intentionally.
I do like the four top-level domains that @arj03 called out as being a good start!
@austinfrey good timing, we had a meeting this week where we considered a fully flat tree (only one root) and the pros and cons of that.
graph TB
v1-->A-->1 & 5 & 9
v1-->B-->2 & 7
v1-->C-->3 & 6
v1-->D-->4 & 8 & 10
classDef need fill:#ee0, stroke:#cc0;
class v1,1,5,6,7 need
class A,B,C need
some more analysis: https://hackmd.io/44jmCcbKRVu8ZSyKxe9O1Q
here 1,2,3,...10 are app subfeeds, and A,B,C... are the shards. What we discussed last night in a call is that each app should decide a "uniqueId" (which could be the app name, or a groupId), then it decides which shard its subfeed lives under by doing:
// pseudocode
hash(concat(v1FeedId, uniqueId)).slice(0,1)
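A runnable version of the pseudocode above might look like the sketch below. The thread does not say which hash function or encoding to use, so SHA-256 over the UTF-8 string concatenation is an assumption here, as are the helper name `shardFor` and the placeholder feed ID.

```javascript
const crypto = require('crypto');

// ASSUMPTION: SHA-256 and UTF-8 string concatenation. The thread only gives
// hash(concat(v1FeedId, uniqueId)).slice(0,1) and names no hash function.
function shardFor(v1FeedId, uniqueId) {
  const digest = crypto
    .createHash('sha256')
    .update(v1FeedId + uniqueId, 'utf8')
    .digest();
  // First byte of the digest, as two hex characters.
  return digest.subarray(0, 1).toString('hex');
}

// Hypothetical values: a placeholder v1 feed ID and an app name as uniqueId.
const v1FeedId = 'placeholder-v1-feed-id';
console.log('chess lives under shard', shardFor(v1FeedId, 'chess'));
console.log('tictactoe lives under shard', shardFor(v1FeedId, 'tictactoe'));
```

Because the shard depends only on values a remote peer already knows (the v1 feed ID and the app's uniqueId), the peer can compute which shard to replicate without knowing the app subfeed's ID in advance, which is the gap plain first-byte sharding leaves for domains like "games".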
Context
Metafeeds are meant for partial replication, so if (e.g.) the root metafeed ends up with thousands of messages, that's going to be very unfortunate for partial replication. The challenge is that metafeeds are a top-down approach to partial replication: it's about organizing content such that it can be easily split into areas of relevance. Contrast that with bottom-up partial replication approaches like tangle-sync that don't require content to be organized into sections and subsections.
To maximize the utility for partial replication, the ROOT metafeed should have the smallest number of messages published on it.
Challenges
Wide tree
We can't design against the malicious use case, i.e. Alice posts thousands of messages on her root metafeed on purpose. We can't prevent that from happening. But what we should do is mitigate against naive usages of metafeeds. E.g. suppose an app developer is naive and is just experimenting with making subfeeds for their new app fart, but due to a coding mistake or something, they end up creating one subfeed for every session the user has with the app.
:bomb: This is bad, we shouldn't allow too many messages published on the root metafeed.
That's probably one of the worst cases, and it's definitely possible with the current API we provide in ssb-meta-feeds.
Deep tree
The other extreme is if we make the tree as deep as possible. We would split the feeds into deeply nested categories:
The problem with this is that in order to replicate the entire session1, you need to inform EBT of O(depth) feeds. This multiplied by the number of "persons" in the social network you follow, followedPersons < totalPersons, means followedPersons * O(depth) feed IDs to manage in EBT just for one app.
Multiplied by the number of apps you "subscribe to" (or are interested in), where subscribedApps < totalApps, it means EBT would have to handle subscribedApps * followedPersons * O(depth) feed IDs in the EBT clock. There is a possibility that this adds too much work to EBT and creates more work in CPU, heavier payloads to pass back and forth over network streams, etc.
:bomb: This is also bad, we shouldn't stress EBT with too many feed IDs to keep track of.
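The product above is easy to tabulate. The sketch below treats O(depth) as exactly `depth` (a constant factor of 1, which is an assumption) and uses illustrative inputs of 10 subscribed apps and 150 followed persons. The function name `ebtFeedCount` is hypothetical.

```javascript
// Rough estimate of feed IDs tracked in the EBT clock:
// subscribedApps * followedPersons * depth.
// ASSUMPTION: O(depth) is taken as exactly `depth`.
function ebtFeedCount({ subscribedApps, followedPersons, depth }) {
  return subscribedApps * followedPersons * depth;
}

// Illustrative inputs only.
const subscribedApps = 10;
const followedPersons = 150;

for (const depth of [2, 4, 8]) {
  const count = ebtFeedCount({ subscribedApps, followedPersons, depth });
  console.log(`depth=${depth}: ${count} feed IDs`);
}
// prints:
// depth=2: 3000 feed IDs
// depth=4: 6000 feed IDs
// depth=8: 12000 feed IDs
```

The count grows linearly with depth, so every extra level of nesting adds another subscribedApps * followedPersons entries to the clock.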
It's reasonable to assume that totalApps is ~100, so maybe subscribedApps is ~10, and it's reasonable to assume that followedPersons is >=150. So that's a factor of 1500, which makes the choice of depth quite sensitive. There is a big difference between 3k feeds (depth=2) managed by EBT, and 12k feeds managed by EBT (depth=8).
Proposal
It's clear we need a middle ground. We shouldn't spam the root metafeed, but we shouldn't force the tree to be as deep as possible either.
I'm thinking that we could add a strict schema for depth=1 subfeeds, then add guidelines for depth=2 subfeeds, and allow anything for depth=3 subfeeds. This way it gets progressively looser the more depth you have, and apps can choose to design their depth>=3 subfeeds in a way that they are responsible for EBT performance.
Example:
This way the depth of an "actual" feed with "actual" content would be 3 or 4, in the common case. It's possible that it's more, but that's up to the app developer.
Orange "domains" would be defined in a spec document somewhere, and libraries such as ssb-meta-feeds could hard-code the domains in the library, such that library users don't even have a choice of creating their own custom domain. To add more domains, it would require changing the library, which in turn requires changing the spec. Of course, anyone can fork and do whatever, but reminder: we are trying to protect the tree from naive usages, not from malicious/intentful usages.
I would even recommend that we start with the domains:
This means that everybody's root metafeeds would have only ~6 messages.
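As a rough illustration of what "hard-coding the domains" could mean in a library, here is a minimal sketch. It is not the ssb-meta-feeds API: the helper `assertKnownDomain` is hypothetical, and the six domain names are an assumption borrowed from the example diagram elsewhere in this thread.

```javascript
// ASSUMPTION: domain names taken from the example diagram in this thread;
// the real list would come from the spec document.
const DOMAINS = Object.freeze([
  'main',
  'indexes',
  'groups',
  'social',
  'games',
  'collab',
]);

// Hypothetical guard: refuse to create a depth=1 subfeed for any domain
// that the (assumed) spec does not list.
function assertKnownDomain(domain) {
  if (!DOMAINS.includes(domain)) {
    throw new Error(
      `Unknown domain "${domain}"; adding one requires a spec change`
    );
  }
  return domain;
}

console.log(assertKnownDomain('games')); // prints "games"

try {
  assertKnownDomain('fart'); // a naive app trying to add its own domain
} catch (err) {
  console.log(err.message);
}
```

A guard like this only protects against naive usage; a fork can always remove it, which matches the stated goal of not trying to defend against intentional misuse.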
If you want an analogy with actual trees :deciduous_tree:, then the root is super hard to remove, the trunk is solid and stable, the branches are somewhat hard but malleable, and the leafs are soft and very malleable.
Thoughts @mixmix @arj03 ?