KyleMaas opened 2 years ago
Oh, and I should note that the pub's config.toml has this:
# Enable syncing by using epidemic-broadcast-trees (EBT)
enable-ebt = true
@KyleMaas Yep, that is a known issue (see https://github.com/ssbc/go-ssb/blob/master/docs/faq.md#can-go-ssb-replicate-with-manyverse for the full deets). The hope is that we can work towards fixing this in the near future; it would be a great boost for interop with Manyverse, which a lot of folks want/need/expect. I've opened https://github.com/ssbc/go-ssb/pull/230 to consolidate more FAQ/bug docs into one page, which will hopefully make them more discoverable...
Actually, better to keep this open and give it a more general name? It's helpful for other folks and is still an outstanding issue...
Ah, yep, I didn't see that. Thanks!
So, looking through the docs for EBT/Manyverse, I'm not seeing a whole lot of changes that have been made in the Planetary fork relating to EBT. So if it works there, is there any reason some of those changes couldn't be cherry-picked and brought over here?
@KyleMaas The EBT changes in the Planetary fork were experimental. I did try to merge them in via https://github.com/ssbc/go-ssb/pull/184#issuecomment-1297091288 but ran out of steam as more tests broke (we already have ~15 skipped tests and ~5 flaky tests in the test suite). I still don't understand exactly how EBT is broken in go-ssb, but I do intend to find out. More news as I have it. If you do any experiments, please share what you find :+1:
I am also planning on deepening my understanding of EBT. Hopefully we can figure this out together and get it replicating reliably.
@decentral1se
Is there an issue filed listing the skipped and flaky tests so they could be debugged? Might be something I could work on if I knew what needed to be done.
@KyleMaas thanks, the skipped tests are being tracked on https://github.com/ssbc/go-ssb/issues/169, and the flaky tests have yet to be identified. I've been putting off cataloguing them so far, but if you want to take a run at this, it'd be great. Any failure listed on https://github.com/ssbc/go-ssb/actions in the recent past is most likely a flaky test. I'll open up an issue for this now.
When I attach Manyverse to go-ssb as a pub, I get the following error in Manyverse's log:
peer @[[pub ID]].ed25519 does not support RPC ebt.replicate
@KyleMaas how do you access Manyverse's debug log?
It's been quite a while, so I don't remember for sure, but I believe that was showing up on the console when I ran the Linux Manyverse client from a terminal.
@stevenroose I got your email:
I'm trying to debug/fix the interaction between Manyverse and the go-ssb based pub server. The go-ssb maintainer has mentioned that the issue might be on their side, but it's really hard to know what's going on on the Manyverse side without debug logging.
Could you maybe take a quick look at the issues I created related to this or just here confirm that Manyverse is expected to work with pub servers? (I have tried to use go-ssb and the ssb-server JS implementation and both without success.)
I'm a Go dev but not really a JS dev anymore. But who knows. If I can pinpoint what's going wrong, I might be able to fix something. Or at least find out what's going on.
There are 2 issues going on:
There are 2 replication mechanisms in SSB: createHistoryStream RPC calls, and EBT (Epidemic Broadcast Trees). In JS, the former is implemented by https://github.com/ssbc/ssb-replicate/blob/master/legacy.js and the latter by https://github.com/ssbc/ssb-ebt . EBT is the modern method and is vastly better for network performance; see this talk by Dominic Tarr from some 5 years ago: https://www.youtube.com/watch?v=GN57bs1eAck . ssb-ebt was also implemented some 5 years ago.
Some months (years?) ago we dropped support for createHistoryStream replication in Manyverse, primarily because createHistoryStream was truly horrible for performance and user experience, and because most apps, such as Patchwork and Oasis, already had EBT enabled. So using only EBT in Manyverse worked well, since it could replicate with Patchwork and Oasis, and there aren't many other SSB implementations used in production.
go-ssb began receiving support for EBT during the ssb-ngi-pointer project (2020–2021), but there are still a few rough edges in go-ssb's EBT that need to be fixed before it's good for production. I believe those are the issues you're encountering.
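For anyone following along who hasn't looked at EBT internals: instead of one createHistoryStream call per feed, the two peers exchange a vector clock (a map of feed ID to "note") over a single ebt.replicate stream and then send only what the other side is missing. Below is a minimal Go sketch of the note encoding, assuming the convention we understand the JS epidemic-broadcast-trees module to use: -1 for "don't replicate", otherwise seq<<1 with the low bit set when the peer does not want to receive. The Note type and the encodeNote/decodeNote helpers are hypothetical, not go-ssb's actual API.

```go
package main

import "fmt"

// Note is a hypothetical, simplified view of one vector-clock entry:
// what a peer says about a single feed.
type Note struct {
	Replicate bool  // false: "don't replicate this feed at all"
	Receive   bool  // true: "send me new messages for this feed"
	Seq       int64 // latest sequence number the peer holds for the feed
}

// encodeNote packs a Note into its integer wire form, assuming the JS
// convention: -1 for "don't replicate", otherwise seq<<1 with the low
// bit set when the peer does NOT want to receive messages.
func encodeNote(n Note) int64 {
	if !n.Replicate {
		return -1
	}
	v := n.Seq << 1
	if !n.Receive {
		v |= 1
	}
	return v
}

// decodeNote reverses encodeNote.
func decodeNote(v int64) Note {
	if v < 0 {
		return Note{Replicate: false}
	}
	return Note{Replicate: true, Receive: v&1 == 0, Seq: v >> 1}
}

func main() {
	// A vector clock maps feed IDs (placeholders here) to encoded notes.
	clock := map[string]int64{
		"@alice.ed25519": encodeNote(Note{Replicate: true, Receive: true, Seq: 42}),
		"@bob.ed25519":   encodeNote(Note{Replicate: true, Receive: false, Seq: 7}),
		"@carol.ed25519": encodeNote(Note{Replicate: false}),
	}
	for feed, v := range clock {
		fmt.Printf("%s -> %d %+v\n", feed, v, decodeNote(v))
	}
}
```

The receive bit is what lets a peer say "I'm tracking this feed but get it from someone else", which is how EBT avoids receiving the same message from every connection.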
To enable logs in Manyverse to debug EBT, edit src/backend/ssb.ts to add the config field ebt: {logging: true}, above the line replicationScheduler: {
As for ssb-server, that would be https://gitlab.com/staltz/manyverse/-/issues/1824, and I suspect it's because ssb-server is running an outdated version of ssb-ebt (a previous version of ssb-ebt was notorious for sometimes getting "stuck" and not replicating feeds it should).
@staltz
Thanks for the clarification!
@staltz tysm :relieved:
@stevenroose If you do end up diving in and running into various issues, I'd point you at https://github.com/ssbc/go-ssb/issues/237 which is my current focus - trying to understand why various tests are failing and how to fix them. Some of the broken-ness here might overlap, so if you have any insights, please do share.
Notes from P2P Basel & chatting with @boreq about how to fix stuff:
A couple of commits from https://github.com/planetary-social/ssb/commits/fork made it work again: removing the caching logic, replacing the local frontier logic with "get everything from the social graph", removing the block on retrieving own feed, and maybe something else I'm forgetting. Those commits seem to be https://github.com/planetary-social/ssb/commit/471bad01eccadaa99a0156c510a093edebff13c5, https://github.com/planetary-social/ssb/commit/c7dc092193fc4cd3bb1c563ee9c1c6556582d253 and https://github.com/planetary-social/ssb/commit/05ca91cf409d98abd5f386d798ea003c909422d8 . FYI, I have tried to backport this stuff, but tests started breaking and we already have a lot of flaky tests, so I backed off. It might be worth trying to rework these commits, or else diving deeper to understand the real causes and fixing the code as originally intended.
Caching logic seems to have an issue which can cause data corruption
New peers are somehow not always included in the EBT logic and a work-around was to disconnect everyone and re-connect to make it work
"negotiation" may be broken (can't remember exact details)
Once scuttlego goes into production, we can learn from its EBT replication implementation (https://github.com/planetary-social/scuttlego/tree/main/service/domain/replication/ebt) and see how to coordinate. Work is ongoing on the Planetary side to make it all work and roll things out.
EBT docs so far on http://dev.planetary.social/replication :clap:
The updating of the EBT matrix seems to happen in https://github.com/decentral1se/ssb/blob/master/multilogs/combined.go which is connected with when indexing happens. The update only happens when indexing is triggered? That might be an issue.
mix/matt were recently working on a fix in the JS EBT implementation, which might be relevant to ask about
Caching logic seems to have an issue which can cause data corruption
Yeah, I saw situations where something was wrongly cached and then the logic which loads cached data and determines what to ask for would always decide that we don't need to ask for anything.
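To illustrate how a bad cache can silently stall replication, here is a simplified Go model (not go-ssb's code): the node compares the sequence numbers a remote advertises against what it believes it holds locally. If that belief comes from a cache that is ahead of the real log, the comparison concludes there is nothing to ask for. feedsToRequest and the feed IDs are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// feedsToRequest (hypothetical) returns the feeds for which the remote
// advertises a higher sequence number than we think we hold. This is a
// simplified model of the "what do we ask for" decision.
func feedsToRequest(local, remote map[string]int64) []string {
	var want []string
	for feed, theirs := range remote {
		// A feed missing from local reads as 0, i.e. "we hold nothing yet".
		if theirs > local[feed] {
			want = append(want, feed)
		}
	}
	sort.Strings(want)
	return want
}

func main() {
	remote := map[string]int64{"@alice.ed25519": 8, "@bob.ed25519": 2}

	// What the log really contains.
	actual := map[string]int64{"@alice.ed25519": 3, "@bob.ed25519": 2}
	// A wrongly cached frontier that claims we are ahead on alice's feed.
	staleCache := map[string]int64{"@alice.ed25519": 10, "@bob.ed25519": 2}

	fmt.Println(feedsToRequest(actual, remote))     // [@alice.ed25519]
	fmt.Println(feedsToRequest(staleCache, remote)) // []
}
```

If caching is kept at all, one defensive option would be to validate cached values against the log's actual latest sequence number before trusting them; the Planetary fork instead removed the caching logic entirely.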
New peers are somehow not always included in the EBT logic and a work-around was to disconnect everyone and re-connect to make it work
Basically, there is no code that sends new notes when the social graph changes. The notes are sent only once, when the EBT session is created. They are not updated afterwards if, e.g., we follow a new feed and want to replicate it.
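A hypothetical sketch of the missing piece: when the set of wanted feeds changes (a new follow, an unfollow), compute the notes that would need to go out on already-open EBT sessions, assuming an open session accepts incremental notes after the initial clock. notesForGraphChange is an invented name, and it uses the -1 / seq<<1 note encoding (receive bit cleared) as we understand the JS convention. Wiring it up, i.e. subscribing to social graph updates and pushing the result to every live session, is left out.

```go
package main

import "fmt"

// notesForGraphChange (hypothetical) computes the notes to push onto an
// already-open EBT session when the set of wanted feeds changes.
// Encoding assumed: -1 = stop replicating; seq<<1 = replicate and receive.
func notesForGraphChange(oldWant, newWant map[string]bool, localSeq map[string]int64) map[string]int64 {
	updates := make(map[string]int64)
	for feed := range newWant {
		if !oldWant[feed] {
			// Newly wanted: advertise what we hold (0 if nothing) and ask to receive.
			updates[feed] = localSeq[feed] << 1
		}
	}
	for feed := range oldWant {
		if !newWant[feed] {
			// No longer wanted: tell the peer to stop sending it.
			updates[feed] = -1
		}
	}
	return updates
}

func main() {
	before := map[string]bool{"@alice.ed25519": true, "@bob.ed25519": true}
	after := map[string]bool{"@alice.ed25519": true, "@dave.ed25519": true} // followed dave, unfollowed bob
	local := map[string]int64{"@alice.ed25519": 5, "@bob.ed25519": 9}

	// fmt prints maps in sorted key order.
	fmt.Println(notesForGraphChange(before, after, local))
	// map[@bob.ed25519:-1 @dave.ed25519:0]
}
```

Only the feeds whose status changed are re-sent; alice's note is left alone because the peer already has it from session setup.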
The updating of the EBT matrix seems to happen in https://github.com/decentral1se/ssb/blob/master/multilogs/combined.go which is connected with when indexing happens. The update only happens when indexing is triggered? That might be an issue.
The combined index seems to only update the ebt state (or whatever that was called) when we actually get a message from a particular feed. This means that if that combined index is used for determining which feeds to replicate in ebt logic then we only pull in feeds that we received messages from. I think the edge case was that when starting with an empty repo we would never replicate any feeds using EBTs?
This seems wrong to me, as I understand that we want to replicate based on the social graph? That is why, when trying to fix EBTs in go-ssb, I completely dropped the combined index for EBT and used the social graph instead.
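A rough Go model of the two approaches, using made-up function names: deriving the wanted feeds only from feeds we've already received messages from (roughly what the combined index appears to do) versus deriving them from the social graph up to a hop limit. With an empty repo, the first yields nothing to replicate, which matches the edge case described above; the second still includes our own feed and, assuming our own follows are known, the feeds we follow.

```go
package main

import "fmt"

// wantFromReceived (hypothetical) models the behaviour described above:
// only feeds we have already received messages from end up in the set.
func wantFromReceived(receivedSeq map[string]int64) map[string]bool {
	want := make(map[string]bool)
	for feed := range receivedSeq {
		want[feed] = true
	}
	return want
}

// wantFromGraph (hypothetical) derives the replication set from the social
// graph instead: our own feed plus everyone reachable through follows
// within `hops` steps (breadth-first).
func wantFromGraph(self string, follows map[string][]string, hops int) map[string]bool {
	want := map[string]bool{self: true}
	layer := []string{self}
	for h := 0; h < hops; h++ {
		var next []string
		for _, f := range layer {
			for _, g := range follows[f] {
				if !want[g] {
					want[g] = true
					next = append(next, g)
				}
			}
		}
		layer = next
	}
	return want
}

func main() {
	follows := map[string][]string{
		"@me.ed25519":    {"@alice.ed25519", "@bob.ed25519"},
		"@alice.ed25519": {"@carol.ed25519"},
	}
	// Fresh, empty repo: nothing received yet.
	fmt.Println(len(wantFromReceived(map[string]int64{})))     // 0
	fmt.Println(len(wantFromGraph("@me.ed25519", follows, 1))) // 3
	fmt.Println(len(wantFromGraph("@me.ed25519", follows, 2))) // 4
}
```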
Great to see some progress on this!
@boreq As I've discovered with #274, the social graph system is really, really broken. I'm trying to find the core issue. So far it doesn't seem to be due to a race condition; it seems to stem from problems with indexing within the builder. Sounds like that may also be helpful for your EBT work.
Related https://github.com/ssbc/go-ssb/pull/72
Also, %Okyc+tVgyep+1ccI8nUZbBpYiXUvUBgQPOpfnZFRXQQ=.sha256 is a new EBT spec writing effort from @gpicron and friends!