indexed-v1 feed format - Githubissues

staltz commented 2 years ago

Context

I'm about to work on autosharding in ssb-meta-feeds and I'm looking at all the other modules that use ssb-meta-feeds and what kind of changes they will need, in ssb-replication-scheduler, ssb-ebt, ssb-index-feed-writer.

Problem

We could implement index feeds better in such a way that requires less APIs from ssb-meta-feeds and less "ebt format" methods.

Solution

This PR is a huge draft. It depends on changes in a couple other modules (ssb-bfe-spec, ssb-bfe, ssb-uri2, ssb-keys, ssb-index-feed-writer, ssb-meta-feeds) as well as updating the specs (the "index feeds" spec). The state of this PR now is just "I want feedback whether this is a good direction" and then if it looks good I'll work on all those other methods.

The tests don't pass here in CI because they would require all those other modules to be changed, BUT all ssb-ebt tests are passing for me locally because I just hacked my node_modules with changes.

Overview

:gem: "EBT formats" are now an extension of "feed formats". Pass a feed format to ebtFormatFrom, and it'll massage it a bit and spit out an "EBT format". This vastly simplifies the implementation of formats/classic, formats/bendy-butt, and formats/buttwoo.

:gem: Removed methods prepareForIsFeed, convertMsg, isReady.

:gem: "indexed" EBT format will be based on a new feed format which is a bit of the classic format in disguise. The "nativeMsg" in this feed format is [msgVal, payload]. The main differences between this new design and the old index feeds is just the msgVal.author and msgId, which are both SSB URIs, e.g. ssb:feed/indexed-v1/... and ssb:message/indexed-v1/....

Changes needed in...

[x] spec for indexed-v1: https://github.com/ssbc/ssb-secure-partial-replication-spec/pull/14
[x] ssb-bfe-spec: add indexed-v1: https://github.com/ssbc/ssb-bfe-spec/pull/25
[x] ssb-uri-spec for indexed-v1: https://github.com/ssbc/ssb-uri-spec/pull/14
[x] ssb-uri2: https://github.com/ssbc/ssb-uri2/pull/12
[x] ssb-bfe: support the above https://github.com/ssbc/ssb-bfe/pull/38
[x] ssb-keys: generate() needs to support indexed-v1: https://github.com/ssbc/ssb-keys/pull/100
[x] ssb-meta-feeds: support indexed-v1 feed format: https://github.com/ssbc/ssb-meta-feeds/pull/71
[x] ssb-db2: fix addTransaction https://github.com/ssbc/ssb-db2/pull/389
[x] ssb-index-feed-writer: follow spec v1 https://github.com/ssbc/ssb-index-feed-writer/pull/21
[x] Updated this PR

arj03 commented 2 years ago

This looks great. I really like how this simplifies a lot of things. Both in the formats folder but also in the decoupling of this module from both metafeeds and ssb-subset-ql (you should be able to remove that one from deps) concerns. These simplifications (as they most often are) should also make index messages lighter overall. I can help test some of that including classic and buttwoo once we can there.

From the description, it could be a bit easier to understand if you in this:

"indexed" EBT format will be based on a new feed format which is a bit of the classic format in disguise. The "nativeMsg" in this feed format is [msgVal, payload].

Added: msgVal is the index message and payload is the indexed message, or something like that.

arj03 commented 2 years ago

One more note is that I'm assuming the validate in index feeds format to be a lot simpler (than what we had in ebt here before) because the URI for index feeds to be index / main. So you have the main feed in there. Doesn't really validate exactly that you are matching the ql on the metafeed, but maybe that is okay?

staltz commented 2 years ago

Thanks for taking a look! Yeah, I also like it that there are less APIs and less code.

What are your thoughts on putting this new feed format in ssb-index-feed-writer? If someone wants to replicate index feeds without writing their own index feeds, they can just install the plugin and not start() it.

Added: msgVal is the index message and payload is the indexed message, or something like that.

Yeah, I would appreciate a better naming. Maybe [indexMsg, indexedMsg]. I just went with [msgVal, payload] because that's what we had before in the code.

URI for index feeds to be index / main

Actually the URI is ssb:feed/indexed-v1/____ without the main feed ID. I figured that it was not necessary, because to look up the indexed msg we don't need the main feed ID, we can just use indexMsg.value.content.indexed.key, i.e. we can just use ssb.db.get we don't need getAtSequence([author, sequence], cb).

arj03 commented 2 years ago

What are your thoughts on putting this new feed format in ssb-index-feed-writer? If someone wants to replicate index feeds without writing their own index feeds, they can just install the plugin and not start() it.

Yeah that should be fine I think. Probably better as it's one less module.

URI for index feeds to be index / main

Actually the URI is ssb:feed/indexed-v1/____ without the main feed ID. I figured that it was not necessary, because to look up the indexed msg we don't need the main feed ID, we can just use indexMsg.value.content.indexed.key, i.e. we can just use ssb.db.get we don't need getAtSequence([author, sequence], cb).

Hmm, but what if someone cheats and indexes something else, like some other feed? Before we used the ql to determine if you indexed the correct thing. This doesn't have to be malicious, could also just be a coding mistake ;-) The only place were you need to do this checking is in validate. I just wanted to avoid moving the same metafeed stuff we had here into validation. So this was why I said maybe it's okay to relax some of these requirements?

because to look up the indexed msg ... we can just use indexMsg.value.content.indexed.key

yes this is a nice improvement

staltz commented 2 years ago

Interesting, this is an aspect of index feeds that I haven't realized before.

Before, msgs in the index feed wouldn't actually guarantee that the QL "type" matched correctly, they only guaranteed that the QL "author" matched correctly, because we did author+sequence. This already meant that you're not sure what you're going to get, but you're sure it's at least from the correct author. I.e. it could be an about index but you end up getting votes.

Now, msgs in the index feed don't guarantee that even the author is correct.

This has two interesting conclusions:

We could drop the indexMsg.value.content.indexed.sequence since it's not used at all. Then the indexMsg might get even smaller, lower overhead.
We already had low guarantees that we would get the correct indexedMsg, and now we would drop that to zero guarantee. This sounds to me like we just need to trust the index feed, which means we trust the root metafeed, which means we need to trust the person using the main feed. Which to me sounds okay.

On the topic of "guaranteeing" that the indexed message is correct, it's kind of hard/impossible to know beforehand that you're going to get the correct thing. You would basically have to have the whole indexed message, check its contents, generate the msgId and compare with indexMsg.value.content.indexed.key.

Idea: what if we validate this when the message is replicated but before it's put in the database with addTransaction?

staltz commented 2 years ago

Idea: since we don't have guarantees, what if we use index feeds to "bookmark" other people's content? Is that what "claims" were?

We could use it in a number of interesting ways for partial replication:

Bookmark other people's content and inform your friends that these bookmarks are worth replicating. Essentially "seeding" them, or "pinning" them for posterity
If seeding/pinning is employed, we could delete everything that is older than some YYYY-MM-DD except those pinned messages. This would allow storage to shrink over time but still preserving some golden content
This could be useful for onboarding, where a new person replicating with you would get your content plus some select highlights from your friends

arj03 commented 2 years ago

We could drop the indexMsg.value.content.indexed.sequence since it's not used at all. Then the indexMsg might get even smaller, lower overhead.

👍

which means we need to trust the person using the main feed

Yeah, that was my thinking as well

what if we validate this when the message is replicated but before it's put in the database with addTransaction

Yeah, we need to validate a lot of these things anyway, like the key so it's not a it's a big overhead. So then the index message would only be the key.

Idea: since we don't have guarantees, what if we use index feeds to "bookmark" other people's content? Is that what "claims" were?

Exactly. The idea was that we know that certain people probably will never get index feeds, so we help the network by writing those indexes. And then you could use them if lets say they were written by someone you follow directly.

in a number of interesting ways

Yep, currated content :-)

arj03 commented 2 years ago

The claim idea also had to extra (optional) steps that someone else validated indexes and could post "yep this is legit". But in hinsight, maybe these things can quickly before very complicated so lets instead lean more towards our already existing hops trust metric.

staltz commented 2 years ago

Okay, so I'll move this EBT formats project forward and make the other changes in the other modules, it'll take some time though because I'm trying to work on fixing Force reindex in Manyverse.

Claims

Yeah we're on the same page. As you can guess I'm not going to work on this now, but thought it was convenient to share the idea now that the topic is fresh in our minds.

staltz commented 2 years ago

This PR is now passing all tests :)

staltz commented 2 years ago

@arj03 This PR is fully ready now!

staltz commented 2 years ago

@arj03 Can has final review?

ssbc / ssb-ebt

indexed-v1 feed format #71

Context

Problem

Solution

Overview

Changes needed in...