Managing feed length: subfeeds, compaction

pfrazee commented 10 years ago

Large feeds will slow down edge-creation in the network, as newly-introduced nodes will need to receive each other's full histories. Because this threatens the SSB network's efficiency, we should identify strategies to solve this within the core message types.

I suggest we isolate the feeds which represent identities in the network. (This relates to using feeds as a replacement for certs, #5). Identity-feeds would be restricted to a defined set of types; all other message-types would have no meaning. This would allow us to:

Put application-data into subfeeds, reducing the growth-rate of the identity-feeds.
Create a compaction scheme which can operate on the message types defined for the identity feed. (If we allow application-defined types, compaction would not be feasible.)

dominictarr commented 10 years ago

can you make this more concrete? ballpark estimates of bandwidth etc for extra points.

hmm, actually I think I like the idea of application feeds. do we want to have it be the same public key? or to have a different way to id that? maybe that feed can explicitly be a subfeed?

one problem: this means for each node you need to exchange several timestamps for each user, one for each application feed.

idea: use game theory to force users to obey speed limits. if a node attempts to publish too many messages / day then the other nodes will refuse to pass on the extra messages and won't replicate the noise for a penalty time.
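
One way the speed-limit idea could look on a relaying node, as a rough sketch only. The class name `SpeedLimit`, the `max_per_day` and `penalty_days` parameters, and the use of integer day numbers are all assumptions made for illustration; nothing here is part of the SSB protocol.

```python
class SpeedLimit:
    """Hypothetical per-feed relay limiter with a penalty period."""

    def __init__(self, max_per_day=100, penalty_days=1):
        self.max_per_day = max_per_day
        self.penalty_days = penalty_days
        self.counts = {}         # (feed_id, day) -> messages relayed that day
        self.blocked_until = {}  # feed_id -> first day on which relaying resumes

    def should_relay(self, feed_id, day):
        # Still inside a penalty period: refuse everything from this feed.
        if day < self.blocked_until.get(feed_id, 0):
            return False
        relayed = self.counts.get((feed_id, day), 0)
        if relayed >= self.max_per_day:
            # Over the limit: drop the extra message and start the penalty,
            # which covers the rest of today plus `penalty_days` more days.
            self.blocked_until[feed_id] = day + 1 + self.penalty_days
            return False
        self.counts[(feed_id, day)] = relayed + 1
        return True


limit = SpeedLimit(max_per_day=2, penalty_days=1)
day0 = [limit.should_relay("noisy-feed", 0) for _ in range(3)]
day1 = limit.should_relay("noisy-feed", 1)
day2 = limit.should_relay("noisy-feed", 2)
other = limit.should_relay("quiet-feed", 0)
```

Each node would enforce this locally, so a noisy feed only loses reach if enough of its peers apply a similar limit.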

pfrazee commented 10 years ago

ballpark estimates

Let's say a stream averages one message every thirty minutes and that the messages hover around 400 bytes. That's 48 messages × 0.4 KB, or ~19.2 KB, per day, ~0.58 MB per month, ~7 MB per year. We might expect an average of 100 incoming streams per user, if going by Twitter's numbers, which would make 700 MB in network data per year.

That's not terrible, but I'd like to see it reduced since verification relies on having the full log. If you could reduce the identity feed to only 1 message a day, for instance, it would become 146 KB/year, which is a total non-factor.
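
The arithmetic behind these estimates can be reproduced with a few lines. This is only a sketch of the ballpark figures above; it assumes decimal units (1 KB = 1000 bytes) and a 365-day year, and the constant names are made up for illustration.

```python
BYTES_PER_MESSAGE = 400   # "messages hover around 400 bytes"
MESSAGES_PER_DAY = 48     # one message every thirty minutes
STREAMS_FOLLOWED = 100    # assumed average number of incoming streams


def yearly_bytes(messages_per_day, bytes_per_message=BYTES_PER_MESSAGE, days=365):
    """Bytes received per year from one stream at the given message rate."""
    return messages_per_day * bytes_per_message * days


per_day = MESSAGES_PER_DAY * BYTES_PER_MESSAGE   # bytes per day, one stream
per_year = yearly_bytes(MESSAGES_PER_DAY)        # bytes per year, one stream
all_streams = per_year * STREAMS_FOLLOWED        # bytes per year, 100 streams
identity_only = yearly_bytes(1)                  # identity feed at 1 message/day

print(per_day, per_year, all_streams, identity_only)
# 19200 7008000 700800000 146000
```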

do we want to have it be the same public key? or to have a different way to id that? maybe that feed can explicitly be a subfeed?

My first thought was that the identity-feed would announce the public keys of its subfeeds, and that the subfeeds would announce the public keys of their parents.

That said, an alternative might be to use the same public key, but introduce a "substream id" on the messages ("multiplex" it). This would add to the message size, but reduce the number of nodes in the network. It might also remove the need for substream-announcements in the verified log (maybe).
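
A minimal sketch of what the "multiplex" option might look like, assuming a single log per public key with one sequence counter and an extra field on each message. The `Message` type, the `substream` field name, and the helper functions are hypothetical; whether this actually reduces what peers must exchange depends on the replication protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    author: str     # id of the single public key that signs everything
    sequence: int   # position in the author's one log
    substream: str  # hypothetical "substream id" carried on every message
    content: str


def latest_sequence(log):
    """One number per author, regardless of how many substreams exist."""
    return max((m.sequence for m in log), default=0)


def select_substream(log, name):
    """Readers pick out one substream locally by filtering the shared log."""
    return [m for m in log if m.substream == name]


log = [
    Message("bob", 1, "identity", "follows alice"),
    Message("bob", 2, "fooapp", "app data"),
    Message("bob", 3, "fooapp", "more app data"),
]
fooapp = select_substream(log, "fooapp")
head = latest_sequence(log)
```

Note the trade-off in this sketch: a peer still receives every message in the log, so the identity messages are not any cheaper to fetch than they were before.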

one problem: this means for each node you need to exchange several timestamps for each user, one for each application feed.

Yeah, more nodes in the network. The "multiplex" might be a favorable choice, then.

idea: use game theory to force users to obey speed limits. if a node attempts to publish too many messages / day then the other nodes will refuse to pass on the extra messages and won't replicate the noise for a penalty time.

That's an idea - stream backpressure.

dominictarr commented 10 years ago

maybe the subfeeds could have names, and their id could be hash(name + '/' + hash(pubkey))

they could contain the same public key in the first message but also have the feed name. you could limit an app to only use a subfeed... or it could use a fresh key. we'll need that for devices anyway - you don't want to lose one device and also lose access to the others!

pfrazee commented 10 years ago

Now that I'm thinking about it, I'm not sure multiplexing solves the need to exchange several timestamps. Would it? I'm not familiar enough with the protocol yet.

maybe the subfeeds could have names, and their id could be hash(name + '/' + hash(pubkey))

Why do the hashing?

you could limit an app to only use a subfeed... or it could use a fresh key. we'll need that for devices anyway - you don't want to lose one device and also lose access to the others!

Yeah, in that case, we'd need to encode device and application in the name, e.g. to distinguish "work-pc/fooapp" from "ipad/fooapp".

dominictarr commented 10 years ago

well currently, the id of a feed is hash(pubkey) so making a subfeed in a similar way would mean less has to change. One benefit here is that the hashes are always the same length and guaranteed unique.
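
A small sketch of the naming scheme hash(name + '/' + hash(pubkey)). The thread does not say which hash function or encoding is used, so SHA-256 with hex output and UTF-8 names are assumptions here, and the function names are illustrative.

```python
import hashlib


def hash_hex(data: bytes) -> str:
    # SHA-256 is an assumption; the thread only says "hash".
    return hashlib.sha256(data).hexdigest()


def feed_id(pubkey: bytes) -> str:
    """Id of the main feed: hash(pubkey)."""
    return hash_hex(pubkey)


def subfeed_id(name: str, pubkey: bytes) -> str:
    """Id of a named subfeed: hash(name + '/' + hash(pubkey))."""
    return hash_hex((name + "/" + feed_id(pubkey)).encode("utf-8"))


pubkey = b"example public key bytes"   # placeholder, not a real key
work = subfeed_id("work-pc/fooapp", pubkey)
ipad = subfeed_id("ipad/fooapp", pubkey)
```

Every id has the same length whatever the name, and anyone who knows the public key and the name can recompute the subfeed id without an extra announcement.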

If there are LOTS of apps maybe you'd want to group them into feeds... more like ID/work or ID/music

pfrazee commented 10 years ago

RE: hashes, I was conflating "using the same public key" with "using the same ID." Makes sense.

One thought: two different devices may need to act as if they're the same device. For instance, you wouldn't want a different twitter feed for each device; you'd want the two to act like they're both @bob, right? Something else to consider...

dominictarr commented 10 years ago

yes. I think the best way to do that would be to "delegate": post a message that says "this ID is another device I control" which makes followers automatically follow that ID also, and in the interface mostly treat it like the parent key.
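
a rough sketch of how a client might act on such delegation messages. it assumes delegations have already been read from each parent's feed into a plain dict, and handles only one level of delegation; the function names and data shapes are hypothetical.

```python
def expand_follows(followed, delegations):
    """Return the followed ids plus any device ids their owners delegated to.

    followed:    set of feed ids the user follows directly
    delegations: dict mapping a parent id to the set of device ids that the
                 parent has declared as "another device I control"
    """
    result = set(followed)
    for parent in followed:
        result |= delegations.get(parent, set())
    return result


def owner_of(feed_id, delegations):
    """Id to attribute a feed to in the interface (the parent, if delegated)."""
    for parent, devices in delegations.items():
        if feed_id in devices:
            return parent
    return feed_id


delegations = {"bob": {"bob-phone", "bob-laptop"}}
replicate = expand_follows({"bob", "alice"}, delegations)
shown_as = owner_of("bob-phone", delegations)
```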

dominictarr commented 10 years ago

we'd probably still want to show "@bob's phone" in the ui, because knowing that will be important when we are figuring out what happened after bob's phone is stolen.

bob commented 10 years ago

Who called me?

dominictarr commented 10 years ago

haha, wrong number

bob commented 10 years ago

You, @dominictarr?

bob commented 10 years ago

Haha, ok, I c, leaving you...

pfrazee commented 10 years ago

Hah, hypothetical namespace collision.

Yes, I agree about showing the device name in the UI.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ssbc / ssb-db

Managing feed length: subfeeds, compaction #7