Closed arj03 closed 3 years ago
Thanks @staltz. I just pushed up what I had to get a better overview. I'm writing more tests to see if this works properly for bendy butt :)
Excited to see this. I know you're not asking for reviews yet, I'm just highlighting some typos that seem obvious and may help make your ride less bumpy
Thanks, appreciated :)
~There is something fishy going on in db2/compat/ebt.js~
All green :)
Probably this was discussed in your recent video call, but what's the rationale behind introducing replicateFormat
?
The rationale behind replicateFormat
is that we were thinking about how to handle "old" EBT clients that only support classic replication. We didn't want to change the version number as that is more tied to the wire format than the options and would break old clients. So we opted to use replicate for classic feed formats and a new replicateFormat
for other feed formats, that way replication with existing client still works fine.
@cryptix I changed the clock to take an object instead to better match the other APIs and make it more future-proof
ssb-meta-feeds could be updated to 0.20.0
@arj03 Note I added two simple commits to this branch, before you bump into local branch collisions
@arj03 Also remember to add the formats
folder to package.json files
array
There is one more thing I'd like to do. Currently if you register indexed then it takes over all classic formats. And I'd like it to only handle the ones you specially register.
Question: why do we need to pass a name when registering a format? That name could be anything (as far as I see), so why not let the internals assign any random name? Or even better, use an array of formats instead of a dict?
I think that was mostly to make it easier to debug by being able to print out the name. Adding a name attribute to the format object would serve a similar purpose and make things simpler by just having an array.
Yes! Name field on the format object
Some comments on the last changes:
I changed the structure so that ebts are an array now with names. The API seems better, less possibility to do something wrong :)
I changed the sliced example because I found out that in order for the existing to work, then both ends needs to have a feed registered for sliced replication over a new EBT. That's a bit of a tall order, so instead what I did was to change one end (carol) so that she is responsible of what needs to be sliced replicated and what needs to be replicated in full. This way alice doesn't need to know. This is also why I didn't put that sliced replication in formats.
Last I changed request
and block
so that you can provide the exact ebt where this action needs to happen. Lets say you have a pub and you want do enable indexed replication, but you don't yourself want to do indexed replication. You instead want to replicate the indexes as normal feeds. Before that was no way to do this, a feed would either be in indexed or classic. With the ability to specify the format this is now possible. I changed the default to be first ebt that matches isFeed. This means that for the indexed example you have to provide the formatName (see test/formats).
Hey @mixmix we've been working on this PR together for https://github.com/ssb-ngi-pointer/ssb-replication-scheduler/pull/5 and there's lots of changes going on, I thought it might be good to get a (your) third perspective on it.
Currently, IMO this branch is ready to merge, but we'll test it out in https://github.com/ssb-ngi-pointer/ssb-replication-scheduler/pull/5 for the next couple days (probably merging earliest next week) and see if it's stable and doesn't present new problems.
Note also that these EBT changes are also implemented in go-ssb https://github.com/cryptoscope/ssb/pull/111/files so keep in mind that changing the "protocol" affects both JS and Go implementation.
I'm confident in your abilities and care. I'll have a small look but not an expert in this area
On Fri, 17 Sep 2021, 06:14 André Staltz, @.***> wrote:
Hey @mixmix https://github.com/mixmix we've been working on this PR together for ssb-ngi-pointer/ssb-replication-scheduler#5 https://github.com/ssb-ngi-pointer/ssb-replication-scheduler/pull/5 and there's lots of changes going on, I thought it might be good to get a (your) third perspective on it.
Currently, IMO this branch is ready to merge, but we'll test it out in ssb-ngi-pointer/ssb-replication-scheduler#5 https://github.com/ssb-ngi-pointer/ssb-replication-scheduler/pull/5 for the next couple days (probably merging earliest next week) and see if it's stable and doesn't present new problems.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ssbc/ssb-ebt/pull/52#issuecomment-921133480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAUK3HXYQUCDGD25BZ365GLUCIXWRANCNFSM5DP6WUQA .
@arj03 I was working on handling blocks in ssb-replication-scheduler and here are some thoughts about a corner case:
e-b-t's state
object has state.peers
which gets updated with the peers connected via replicateFormat
, which uses rpc.id
from SHS. So e-b-t is tracking the "follow graph" but index feeds and meta feeds are never "origins" of social graph edges, they are always the destination.
This means that the current implementation of block()
here is a bit confusing. Suppose origFeedId is classic but destFeedId is bendy butt, then the found ebt
will be the classic, and then ebt.isFeed(destFeedId)
will be false:
function block(origFeedId, destFeedId, blocking, formatName) {
initialized.promise.then(() => {
const ebt = findEBTForFeed(origFeedId, formatName)
ebt.prepareForIsFeed(destFeedId, () => {
if (!ebt.isFeed(origFeedId)) return
if (!ebt.isFeed(destFeedId)) return
if (blocking) {
ebt.block(origFeedId, destFeedId, true)
} else if (
ebt.state.blocks[origFeedId] &&
ebt.state.blocks[origFeedId][destFeedId]
) {
// only update unblock if they were already blocked
ebt.block(origFeedId, destFeedId, false)
}
})
})
}
I wonder if we should just use the strong alternative, like:
ebts.forEach((ebt) => {
ebt.block(origFeedId, destFeedId, true)
})
Because the only thing that e-b-t block() does is update a lookup table so to support isBlocked and isShared. I think this setNotes will never be triggered because it checks if a connected SHS peer is the blocked target, and for meta feeds and index feeds, that'll never be the case.
Ok, read the readme. My only question is a naive one, but might be useful - is this the right repo for this complexity?
Motivation of that question : I don't know if this is adding a lot more bells and whistles to ebt. If it is should that complexity live here, of be provided in some other way? I'm thinking about maintenance and being able to learn to use and understand this if I need to
Right it doesn't work properly now. We should specify in the README that origFeedId
should be the main id of the blocker. And use isFeed from classic to check origFeedId
and let findEBTForFeed
use destFeedId
. This leaves it the responsibility of the caller to block the same formats that was also requested. What do you think about that?
Motivation of that question : I don't know if this is adding a lot more bells and whistles to ebt. If it is should that complexity live here, of be provided in some other way?
@mixmix Good questions, we had some concerns where should the formats/*.js
files live, it wasn't obvious where to put those. Our best options were this (what we're doing now in this branch) OR new modules such as ssb-ebt-classic
, ssb-ebt-bendybutt
, ssb-ebt-indexed
which would be plugins for ssb-ebt, but it felt like just tucking the files in arbitrary new modules.
On the other hand, ssb-ebt used to be just a light wrapper around epidemic-broadcast-trees, and I think the philosophy is still the same, but now ssb-ebt can handle multiple instances of epidemic-broadcast-trees. I think overall ssb-ebt is small and has a clear responsibility (provide glues between epidemic-broadcast-trees and SSB feeds and database calls), even after this PR.
And use isFeed from classic to check
origFeedId
and letfindEBTForFeed
usedestFeedId
. This leaves it the responsibility of the caller to block the same formats that was also requested. What do you think about that?
@arj03 Should work, let's try that.
@arj03 I found another small issue with blocks:
This line https://github.com/ssbc/ssb-ebt/blob/2fe87910ca37a8dc50d6ad7458617885a3f51e48/index.js#L216 is causing a bug, I think the culprit is the next-tick Promise .then() behavior (yay!).
There's a simple test in ssb-replication-scheduler (test/integration/block3.js
) that hasn't been passing and the cause is that line. If I comment it out, it works.
I think it's a race condition. Alice's database gets appended with a new block message, which then updates clock in ssb-ebt, and should ASAP also call ssb-ebt block() and prevent that message from being sent to Bob (now blocked) but I think the block() is too slow (because of the promise) so Bob ends up getting the message.
Obviously we could fix that by using raw callbacks that don't have forced-next-tick async behavior, but it seems like a gotcha and maybe a better solution is adding an explicit wait mechanism for blocks to be computed before sending new info to remote peers. This way we don't need the block part to run ASAP, it is allowed to be async.
Nice find.
That really seems quite brittle overall, even before. You are talking about changing sbot.post
to process blocks differently? What I find strange is that they both have this initialized.promise.then
, it's not the prepareForIsFeed in block? Mostly trying to understand.
What I mean with brittle is that the post comes from db2 as the message is appended, while the blocking comes from ssb-friends (an index). Adding more of these kind of synchonizations between different modules feels like we are going back into the bermuda again. Hmm.
You are talking about changing
sbot.post
to process blocks differently? What I find strange is that they both have thisinitialized.promise.then
, it's not the prepareForIsFeed in block? Mostly trying to understand.
Oh yeah, I changed the implementation of block() so that it would do ebts.forEach(ebt => ebt.block(etc))
(because I was desperate to make it work) and the only thing wrapping that was the promise.then(), when I removed the promise.then(), then the forEach block worked. I assume that the prepareForIsFeed
could also add a slow async check, but on the other hand I think that prepareForIsFeed
for the classic is synchronous because it does the cb() call immediately.
Adding more of these kind of synchonizations between different modules feels like we are going back into the bermuda again. Hmm.
Very true, I didn't think of it like that. Maybe there is another way. Perhaps it's not a showstopper that Bob gets a few messages before Alice's block kicks in. And/Or perhaps it's okay to do a debounce or delay on ssb-ebt ssb.post
so that local database plugins (like ssb-friends) take priority first.
Update: another idea is that we put these kind of race condition solutions in ssb-replication-scheduler (which is THE module for gluing all these things together). Maybe we could add an obv
(similar to the post
obv) to ssb-ebt and ssb-friends so that ssb-replication-scheduler knows the latest seq of both of them, and then explicitly pauses ssb-ebt before it has caught up with ssb-friends.
I like the last solution a lot better 👍
For that, we will need some way of feeding new db messages to ssb-ebt to replace sbot.post, or we need pause/resume in ssb-ebt
For future travelers the problem described above should have been solved in https://github.com/ssbc/ssb-ebt/pull/55
Been testing this branch in the 8k demo and it has been working fine. I wonder if we should release this as 8.0 to try and reduce the number of dominoes we have left or we should wait for further testing of the replication-scheduler PR?
Hmmm, I feel like we should test a bit more. There's anyway just two branches to babysit: this one and the replication-scheduler, not that many.
formats/indexed.js needs to handle bad data in getMsgAuthor (not an array)
@arj03 I think this PR is okay to merge and release, we haven't bumped into errors recently. :)
Yeah it is looking good with latest netsim runs. I'll let glyph get us the last numbers and then merge and release this as 9.0.0
This PR is a bigger change to the core of this module so should be reviewed carefully.
Motivation
We would like ssb-ebt to support multiple feed formats which is needed for meta feeds. Not only that, we would also like to allow other ways of replicating data, such as indexed replication where multiple messages are sent together for partial replication.
Secondly we would like add the ability to know the clock of a remote peer. This information is already available during the notes transfer of the EBT protocol, but here we would like to split that information out in order to allow a peer to use that information for sliced replication (like the latest 100 messages).
Implementation
Before there was only 1 EBT process, here we add the ability to register multiple EBTs and let them decide what feeds they are interested in replicating using the
isFeed
method. A new format must implement a few methods that is used both in this module and inside the EBT module. While EBT already was refactored to support multiple formats including binary (this PR does not depend on any changes to EBT) the format describes the methods needed in one place.New RPC methods: 'replicateFormat', 'clock'. Note we added
replicateFormat
used for new formats and left the existingreplicate
intact for classic replication for backwards compatibility.Related
Go changes: https://github.com/cryptoscope/ssb/pull/111
Another related issue: https://github.com/ssb-ngi-pointer/ssb-subset-replication-spec/issues/9