Closed: staltz closed this 2 years ago.
This looks great. I really like how this simplifies a lot of things, both in the formats folder and in decoupling this module from metafeeds and ssb-subset-ql concerns (you should be able to remove ssb-subset-ql from deps). These simplifications (as they most often do) should also make index messages lighter overall. I can help test some of that, including classic and buttwoo, once we get there.
From the description, it could be a bit easier to understand if you expanded on this:
> "indexed" EBT format will be based on a new feed format which is a bit of the classic format in disguise. The "nativeMsg" in this feed format is `[msgVal, payload]`.

Added: msgVal is the index message and payload is the indexed message, or something like that.
One more note: I'm assuming that validate in the index feed format will be a lot simpler (than what we had in ebt here before) because the URI for index feeds would be index / main, so you have the main feed in there. That doesn't really validate that you are matching the QL on the metafeed, but maybe that is okay?
Thanks for taking a look! Yeah, I also like that there are fewer APIs and less code.
What are your thoughts on putting this new feed format in ssb-index-feed-writer? If someone wants to replicate index feeds without writing their own index feeds, they can just install the plugin and not `start()` it.
> Added: msgVal is the index message and payload is the indexed message, or something like that.
Yeah, I would appreciate better naming. Maybe `[indexMsg, indexedMsg]`. I just went with `[msgVal, payload]` because that's what we had before in the code.
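A minimal sketch of how the suggested naming could read when consuming a native message of the indexed format. It assumes the nativeMsg is a 2-element array in the order `[indexMsg, indexedMsg]` (the tuple currently called `[msgVal, payload]`); the helper name `splitNativeMsg` and the sample values are hypothetical.

```javascript
// Hypothetical helper: give names to the two halves of an "indexed" nativeMsg.
// Assumes nativeMsg is a 2-element array [indexMsg, indexedMsg]
// (the same tuple the code currently calls [msgVal, payload]).
function splitNativeMsg(nativeMsg) {
  if (!Array.isArray(nativeMsg) || nativeMsg.length !== 2) {
    throw new TypeError('expected a [indexMsg, indexedMsg] tuple')
  }
  const [indexMsg, indexedMsg] = nativeMsg
  return { indexMsg, indexedMsg }
}

// Usage with made-up placeholder values
const { indexMsg, indexedMsg } = splitNativeMsg([
  { author: 'ssb:feed/indexed-v1/EXAMPLE', sequence: 1, content: { indexed: { key: '%EXAMPLE.sha256' } } },
  { author: '@EXAMPLE.ed25519', sequence: 7, content: { type: 'about' } },
])
console.log(indexMsg.sequence, indexedMsg.content.type)
```

Destructuring at the boundary keeps the positional tuple out of the rest of the code, so only one place needs to know the order.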
> URI for index feeds to be index / main
Actually the URI is `ssb:feed/indexed-v1/____` without the main feed ID. I figured that it was not necessary, because to look up the indexed msg we don't need the main feed ID: we can just use `indexMsg.value.content.indexed.key`, i.e. we can just use `ssb.db.get`; we don't need `getAtSequence([author, sequence], cb)`.
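To contrast the two lookups, here is a sketch against a hypothetical in-memory store. The functions `addMsg`, `get`, `getAtSequence` and `getIndexed` are illustrative stand-ins, not the real ssb-db2 methods (whose signatures and return shapes may differ); the point is only that the key-based lookup needs nothing from the main feed.

```javascript
// Hypothetical in-memory stand-in for the database, only to contrast the two
// lookups; the real ssb-db2 methods have their own signatures and return shapes.
const byKey = new Map()
const byAuthorSeq = new Map()

function addMsg(key, msgVal) {
  byKey.set(key, msgVal)
  byAuthorSeq.set(`${msgVal.author}:${msgVal.sequence}`, msgVal)
}

// Old way: requires the main feed ID (author) plus a sequence number.
function getAtSequence([author, sequence], cb) {
  const msgVal = byAuthorSeq.get(`${author}:${sequence}`)
  if (msgVal) cb(null, msgVal)
  else cb(new Error(`no msg at ${author}:${sequence}`))
}

// New way: a plain lookup by msg key.
function get(key, cb) {
  const msgVal = byKey.get(key)
  if (msgVal) cb(null, msgVal)
  else cb(new Error(`no msg with key ${key}`))
}

// Resolve the indexed msg straight from the index msg, without the main feed ID.
function getIndexed(indexMsg, cb) {
  get(indexMsg.value.content.indexed.key, cb)
}

addMsg('%abc.sha256', { author: '@alice', sequence: 3, content: { type: 'about' } })
getIndexed({ value: { content: { indexed: { key: '%abc.sha256' } } } }, (err, msgVal) => {
  if (err) throw err
  console.log(msgVal.author, msgVal.content.type)
})
```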
> What are your thoughts on putting this new feed format in ssb-index-feed-writer? If someone wants to replicate index feeds without writing their own index feeds, they can just install the plugin and not `start()` it.
Yeah that should be fine I think. Probably better as it's one less module.
> > URI for index feeds to be index / main
>
> Actually the URI is `ssb:feed/indexed-v1/____` without the main feed ID. I figured that it was not necessary, because to look up the indexed msg we don't need the main feed ID: we can just use `indexMsg.value.content.indexed.key`, i.e. we can just use `ssb.db.get`; we don't need `getAtSequence([author, sequence], cb)`.
Hmm, but what if someone cheats and indexes something else, like some other feed? Before, we used the QL to determine whether you indexed the correct thing. This doesn't have to be malicious; it could also just be a coding mistake ;-) The only place where you need to do this checking is in validate. I just wanted to avoid moving the same metafeed stuff we had here into validation. So this is why I said maybe it's okay to relax some of these requirements?
> because to look up the indexed msg ... we can just use `indexMsg.value.content.indexed.key`
yes this is a nice improvement
Interesting, this is an aspect of index feeds that I hadn't realized before.
Before, msgs in the index feed wouldn't actually guarantee that the QL "type" matched correctly; they only guaranteed that the QL "author" matched correctly, because we did author+sequence. This already meant that you're not sure what you're going to get, but you're sure it's at least from the correct author. E.g. it could be an `about` index but you end up getting `vote`s.
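The old guarantee can be illustrated with a small sketch, assuming (hypothetically) that an index entry is resolved by taking the author from the QL and the sequence from the index msg. The author is correct by construction, but nothing stops the entry from pointing at a message of the wrong type. The `feeds` data and `resolveOld` name are made up for illustration.

```javascript
// Hypothetical feeds: main feed ID -> array of msg values (index 0 = sequence 1).
const feeds = {
  '@alice': [
    { author: '@alice', sequence: 1, content: { type: 'about' } },
    { author: '@alice', sequence: 2, content: { type: 'vote' } },
  ],
}

// Old-style resolution: the QL supplies the author, the index msg a sequence.
function resolveOld(ql, indexedSequence) {
  const feed = feeds[ql.author] || []
  return feed[indexedSequence - 1] || null
}

// An "about" index whose entry (by mistake or malice) points at sequence 2.
const ql = { author: '@alice', type: 'about' }
const got = resolveOld(ql, 2)
// got.author equals ql.author because we looked it up by author,
// but got.content.type is 'vote', not the 'about' the QL asked for.
console.log(got.author, got.content.type)
```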
Now, msgs in the index feed don't guarantee that even the author is correct.
This has two interesting conclusions:
1. We could drop `indexMsg.value.content.indexed.sequence` since it's not used at all. Then the indexMsg might get even smaller, with lower overhead.
2. On the topic of "guaranteeing" that the indexed message is correct, it's kind of hard/impossible to know beforehand that you're going to get the correct thing. You would basically have to have the whole indexed message, check its contents, generate the msgId and compare it with `indexMsg.value.content.indexed.key`.
Idea: what if we validate this when the message is replicated but before it's put in the database with addTransaction?
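A sketch of that idea: recompute the indexed message's ID, compare it with the key claimed by the index message, and only then hand the pair to the database. Several assumptions here: `computeMsgId` is a simplified classic-style hash (real IDs should come from the feed format, e.g. via ssb-keys, which handles encoding details omitted here), `indexMsg` is treated as a msg value (so the key is at `content.indexed.key` rather than `value.content.indexed.key`), and `persist` is a hypothetical placeholder for the real write such as `addTransaction`.

```javascript
const crypto = require('crypto')

// ASSUMPTION: a simplified classic-style msg ID (sha256 over the msg value as
// JSON with 2-space indentation). Real IDs should come from the feed format
// (e.g. ssb-keys), which takes care of encoding details omitted here.
function computeMsgId(msgVal) {
  const json = JSON.stringify(msgVal, null, 2)
  const digest = crypto.createHash('sha256').update(json, 'utf8').digest('base64')
  return `%${digest}.sha256`
}

// indexMsg is a msg value here, so the claimed key is at content.indexed.key.
function validateIndexedPair([indexMsg, indexedMsg]) {
  const claimedKey = indexMsg.content && indexMsg.content.indexed && indexMsg.content.indexed.key
  return typeof claimedKey === 'string' && computeMsgId(indexedMsg) === claimedKey
}

// Hypothetical replication hook: validate first, persist only on success.
// `persist` is a placeholder standing in for the database write (e.g. addTransaction).
function onReplicated(nativeMsg, persist, cb) {
  if (!validateIndexedPair(nativeMsg)) {
    cb(new Error('indexed msg does not match the key in the index msg'))
    return
  }
  persist(nativeMsg, cb)
}
```

Since the indexed message arrives together with the index message, this check needs no extra lookups, only one hash per pair.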
Idea: since we don't have guarantees, what if we use index feeds to "bookmark" other people's content? Is that what "claims" were?
We could use it in a number of interesting ways for partial replication:
> We could drop the `indexMsg.value.content.indexed.sequence` since it's not used at all. Then the indexMsg might get even smaller, lower overhead.
👍
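A minimal sketch of what dropping the field would look like on the content object. The field names follow this thread; the `'metafeed/index'` type string and the helper name `dropIndexedSequence` are assumptions for illustration.

```javascript
// Hypothetical index msg content. The field names follow this thread;
// the 'metafeed/index' type string is an assumption for illustration.
const before = {
  type: 'metafeed/index',
  indexed: { key: '%EXAMPLE.sha256', sequence: 42 },
}

// Return a copy of the content without indexed.sequence (the input is not mutated).
function dropIndexedSequence(content) {
  const { sequence, ...rest } = content.indexed // sequence is discarded on purpose
  return { ...content, indexed: rest }
}

const after = dropIndexedSequence(before)
const bytesSaved =
  Buffer.byteLength(JSON.stringify(before)) - Buffer.byteLength(JSON.stringify(after))
console.log(after, bytesSaved)
```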
> which means we need to trust the person using the main feed
Yeah, that was my thinking as well
> what if we validate this when the message is replicated but before it's put in the database with `addTransaction`
Yeah, we need to validate a lot of these things anyway, like the key, so it's not a big overhead. So then the index message would only contain the key.
> Idea: since we don't have guarantees, what if we use index feeds to "bookmark" other people's content? Is that what "claims" were?
Exactly. The idea was that we know that certain people probably will never get index feeds, so we help the network by writing those indexes. And then you could use them if, let's say, they were written by someone you follow directly.
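One way to express "use them if they were written by someone you follow directly" is a filter on hop distance. This is a sketch with hypothetical data: `hops` maps feed IDs to a distance (0 = ourselves, 1 = followed directly), and treating negative values as blocked is an assumption, as is the `usableIndexFeeds` helper.

```javascript
// Hypothetical hop distances: 0 = ourselves, 1 = followed directly,
// 2 = friend of a friend. ASSUMPTION: negative values mean blocked,
// and feeds we know nothing about are simply absent from the map.
function usableIndexFeeds(indexFeeds, hops, maxHops = 1) {
  return indexFeeds.filter(({ writer }) => {
    const h = hops[writer]
    return typeof h === 'number' && h >= 0 && h <= maxHops
  })
}

const hops = { '@me': 0, '@alice': 1, '@bob': 2, '@mallory': -1 }
const indexFeeds = [
  { writer: '@alice', indexes: '@carol (about)' },
  { writer: '@bob', indexes: '@carol (about)' },
  { writer: '@mallory', indexes: '@carol (about)' },
  { writer: '@stranger', indexes: '@carol (about)' },
]
console.log(usableIndexFeeds(indexFeeds, hops).map((f) => f.writer))
```

Raising `maxHops` widens the circle of index writers we accept, which reuses the existing hops trust metric instead of a separate claim-verification step.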
> in a number of interesting ways
Yep, curated content :-)
The claim idea also had two extra (optional) steps, where someone else validated indexes and could post "yep, this is legit". But in hindsight, maybe these things can quickly become very complicated, so let's instead lean more towards our already existing hops trust metric.
Okay, so I'll move this EBT formats project forward and make the other changes in the other modules. It'll take some time though, because I'm trying to work on fixing Force reindex in Manyverse.
> Claims
Yeah we're on the same page. As you can guess I'm not going to work on this now, but thought it was convenient to share the idea now that the topic is fresh in our minds.
This PR is now passing all tests :)
@arj03 This PR is fully ready now!
@arj03 Can has final review?
Context
I'm about to work on autosharding in ssb-meta-feeds and I'm looking at all the other modules that use ssb-meta-feeds and what kind of changes they will need, in ssb-replication-scheduler, ssb-ebt, ssb-index-feed-writer.
Problem
We could implement index feeds better, in a way that requires fewer APIs from ssb-meta-feeds and fewer "ebt format" methods.
Solution
This PR is a huge draft. It depends on changes in a couple of other modules (ssb-bfe-spec, ssb-bfe, ssb-uri2, ssb-keys, ssb-index-feed-writer, ssb-meta-feeds) as well as updates to the specs (the "index feeds" spec). The state of this PR now is just "I want feedback on whether this is a good direction", and if it looks good I'll work on all those other modules.
The tests don't pass here in CI because they would require all those other modules to be changed, BUT all ssb-ebt tests are passing for me locally because I just hacked my node_modules with changes.
Overview
:gem: "EBT formats" are now an extension of "feed formats". Pass a feed format to `ebtFormatFrom`, and it'll massage it a bit and spit out an "EBT format". This vastly simplifies the implementation of `formats/classic`, `formats/bendy-butt`, and `formats/buttwoo`.
:gem: Removed methods `prepareForIsFeed`, `convertMsg`, `isReady`.
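A rough sketch of the "feed format in, EBT format out" idea. It assumes the feed format exposes `name`, `isAuthor(feedId)`, `getFeedId(nativeMsg)` and `getSequence(nativeMsg)`, modeled loosely on the ssb-feed-format spec; the returned EBT-side method names are illustrative, and the actual `ebtFormatFrom` in this PR may wire up a different set.

```javascript
// Hypothetical sketch of deriving an "EBT format" from a feed format.
// ASSUMPTION: the feed format exposes name, isAuthor(feedId), getFeedId(nativeMsg)
// and getSequence(nativeMsg), modeled loosely on ssb-feed-format; the actual
// ebtFormatFrom in this PR may wire up different methods.
function ebtFormatFrom(feedFormat) {
  return {
    name: feedFormat.name,
    // Should EBT treat this feed ID as belonging to this format?
    isFeed: (feedId) => feedFormat.isAuthor(feedId),
    getMsgAuthor: (nativeMsg) => feedFormat.getFeedId(nativeMsg),
    getMsgSequence: (nativeMsg) => feedFormat.getSequence(nativeMsg),
  }
}

// Usage with a toy feed format (not a real SSB format)
const toyFormat = {
  name: 'toy',
  isAuthor: (feedId) => typeof feedId === 'string' && feedId.startsWith('@toy:'),
  getFeedId: (nativeMsg) => nativeMsg.author,
  getSequence: (nativeMsg) => nativeMsg.sequence,
}
const ebtToy = ebtFormatFrom(toyFormat)
console.log(ebtToy.isFeed('@toy:alice'), ebtToy.getMsgSequence({ author: '@toy:alice', sequence: 5 }))
```

Because every per-format detail is delegated to the feed format, each `formats/*` module shrinks to little more than a call like this.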
:gem: "indexed" EBT format will be based on a new feed format which is a bit of the `classic` format in disguise. The "nativeMsg" in this feed format is `[msgVal, payload]`. The main differences between this new design and the old index feeds are just the `msgVal.author` and msgId, which are both SSB URIs, e.g. `ssb:feed/indexed-v1/...` and `ssb:message/indexed-v1/...`.

Changes needed in...
- `indexed-v1`: https://github.com/ssbc/ssb-bfe-spec/pull/25
- `generate()` needs to support `indexed-v1`: https://github.com/ssbc/ssb-keys/pull/100
- `indexed-v1` feed format: https://github.com/ssbc/ssb-meta-feeds/pull/71
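The indexed format identifies both feeds and messages with SSB URIs of the shape `ssb:<type>/<format>/<data>`, e.g. `ssb:feed/indexed-v1/...` and `ssb:message/indexed-v1/...`. Below is a small hand-rolled sketch for recognizing them; `parseSsbUri` and `isIndexedV1` are hypothetical helpers, and real code should rely on ssb-uri2 rather than this regex.

```javascript
// Split an SSB URI of the form ssb:<type>/<format>/<data>.
// Hypothetical helper; ssb-uri2 is the real implementation to use.
function parseSsbUri(uri) {
  if (typeof uri !== 'string') return null
  const m = /^ssb:([^/]+)\/([^/]+)\/(.+)$/.exec(uri)
  if (!m) return null
  return { type: m[1], format: m[2], data: m[3] }
}

// True for ssb:feed/indexed-v1/... (type 'feed') or ssb:message/indexed-v1/... (type 'message').
function isIndexedV1(uri, type) {
  const parts = parseSsbUri(uri)
  return parts !== null && parts.format === 'indexed-v1' && parts.type === type
}

console.log(isIndexedV1('ssb:feed/indexed-v1/EXAMPLE', 'feed'))
console.log(isIndexedV1('ssb:message/indexed-v1/EXAMPLE', 'feed'))
```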