ssbc / go-ssb

Go implementation of ssb (work in progress!)
https://scuttlebutt.nz

EBT replication broken #229

Open KyleMaas opened 2 years ago

KyleMaas commented 2 years ago

When I attach Manyverse to go-ssb as a pub, I get the following error in Manyverse's log:

peer @[[pub ID]].ed25519 does not support RPC ebt.replicate

KyleMaas commented 2 years ago

Oh, and I should note that the pub's config.toml has this:

# Enable syncing by using epidemic-broadcast-trees (EBT)
enable-ebt = true

decentral1se commented 2 years ago

@KyleMaas Yep, that is a known issue (see https://github.com/ssbc/go-ssb/blob/master/docs/faq.md#can-go-ssb-replicate-with-manyverse for the full deets). The hope is that we can work towards fixing this in the near future; it would be a great boost for interop with Manyverse, which a lot of folks want/need/expect. Opened https://github.com/ssbc/go-ssb/pull/230 to converge more faq/bugs docs into one page, hopefully more discoverable...

decentral1se commented 2 years ago

Actually better to keep this open and name it more generally? Helpful for other folks & is an outstanding issue...

KyleMaas commented 2 years ago

Ah, yep, I didn't see that. Thanks!

KyleMaas commented 2 years ago

So, looking through the docs for EBT/Manyverse, I'm not seeing a whole lot of changes that have been made in the Planetary fork relating to EBT. So if it works there, is there any reason some of those changes couldn't be cherry-picked and brought over here?

decentral1se commented 2 years ago

@KyleMaas The EBT changes in the Planetary fork were experimental and I did try to merge them in via https://github.com/ssbc/go-ssb/pull/184#issuecomment-1297091288 but ran out of steam with more broken tests (we have ~15 skipped tests and ~5 flaky tests in the test suite already). I still don't understand exactly how EBT is broken in go-ssb but I do intend to find out. More news as I have it. If you do any experiments, please share what you find :+1:

mycognosist commented 2 years ago

I am also planning on deepening my understanding of EBT. Hopefully we can figure this out together and get it replicating reliably.

KyleMaas commented 2 years ago

@decentral1se

Is there an issue filed listing the skipped and flaky tests so they could be debugged? Might be something I could work on if I knew what needed to be done.

decentral1se commented 2 years ago

@KyleMaas thanks, the skipped tests are being tracked on https://github.com/ssbc/go-ssb/issues/169 and the flaky tests have still to be identified. I've been avoiding cataloguing so far but if you want to take a run at this, it'd be great. Any failure listed on https://github.com/ssbc/go-ssb/actions in the recent past is most likely a flaky test. I'll open up an issue for this now.

stevenroose commented 1 year ago

When I attach Manyverse to go-ssb as a pub, I get the following error in Manyverse's log:

peer @[[pub ID]].ed25519 does not support RPC ebt.replicate

@KyleMaas how do you access Manyverse's debug log?

KyleMaas commented 1 year ago

It's been quite a while, so I don't remember for sure, but I believe that was showing up on the console when I ran the Linux Manyverse client from a terminal.

staltz commented 1 year ago

@stevenroose I got your email:

I'm trying to debug/fix the interaction between Manyverse and the go-ssb based pub server. The go-ssb maintainer has mentioned that the issue might be on their side, but it's really hard to know what's going on on the Manyverse side without debug logging.

Could you maybe take a quick look at the issues I created related to this or just here confirm that Manyverse is expected to work with pub servers? (I have tried to use go-ssb and the ssb-server JS implementation and both without success.)

I'm a Go dev but not really a JS dev anymore. But who knows. If I can pinpoint what's going wrong, I might be able to fix something. Or at least find out what's going on.

There are 2 issues going on:

Manyverse replicating with go-ssb pub

There are 2 replication mechanisms in SSB: createHistoryStream RPC calls, and EBT (Epidemic Broadcast Trees). In JS, the former is implemented by https://github.com/ssbc/ssb-replicate/blob/master/legacy.js and the latter is implemented by https://github.com/ssbc/ssb-ebt . EBT is the modern method, and is vastly better for network performance; see this talk by Dominic Tarr from some 5 years ago: https://www.youtube.com/watch?v=GN57bs1eAck . ssb-ebt was also implemented some 5 years ago.

Some months (years?) ago we dropped support for createHistoryStream replication in Manyverse, the primary reason being that createHistoryStream was truly horrible for performance and user experience, and that most apps had EBT already enabled, such as Patchwork and Oasis. So using only EBT in Manyverse worked well since it could replicate with Patchwork and Oasis, and there aren't a lot of other implementations of SSB used in production.

go-ssb began receiving support for EBT during the ssb-ngi-pointer project (2020–2021), but there are still a few rough edges in go-ssb's EBT that need to be fixed before it's good for production. I believe those are the issues you're encountering.

To enable logs in Manyverse to debug EBT, do the following:

  1. git clone manyverse
  2. nvm use 14
  3. npm install
  4. Modify src/backend/ssb.ts to add a config field ebt: {logging: true}, above the line replicationScheduler: {
  5. npm run build-desktop
  6. npm run desktop

Manyverse replicating with ssb-server (JS pub)

That would be https://gitlab.com/staltz/manyverse/-/issues/1824 and I suspect it's because ssb-server is running an outdated version of ssb-ebt (a previous version of ssb-ebt was notorious for sometimes getting "stuck" and not replicating feeds it should).

KyleMaas commented 1 year ago

@staltz

Thanks for the clarification!

decentral1se commented 1 year ago

@staltz tysm :relieved:

@stevenroose If you do end up diving in and running into various issues, I'd point you at https://github.com/ssbc/go-ssb/issues/237 which is my current focus - trying to understand why various tests are failing and how to fix them. Some of the broken-ness here might overlap, so if you have any insights, please do share.

decentral1se commented 1 year ago

Notes from P2P Basel & chatting with @boreq about how to fix stuff:

boreq commented 1 year ago

Caching logic seems to have an issue which can cause data corruption

Yeah, I saw situations where something was wrongly cached and then the logic which loads cached data and determines what to ask for would always decide that we don't need to ask for anything.

New peers are somehow not always included in the EBT logic and a work-around was to disconnect everyone and re-connect to make it work

Basically there is no code that sends new notes when the social graph changes. The notes are sent only once, when the EBT session is being created. They are not updated afterwards if e.g. we follow a new feed and we want to replicate it.
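To make the missing piece concrete, here is a minimal sketch of what "sending new notes when the social graph changes" could look like. It is not go-ssb code: `Note`, `Session`, `NewSession` and `Update` are hypothetical names, and the note is simplified to a feed, a replicate flag and a sequence number rather than the wire encoding. The idea is only that each session remembers what it has announced and emits a delta whenever the wanted set changes.

```go
package main

import (
	"fmt"
	"sort"
)

// Note is a simplified, hypothetical stand-in for an EBT note: the feed in
// question, whether we want to replicate it, and the latest sequence we hold.
type Note struct {
	Feed      string
	Replicate bool
	Seq       int64
}

// Session (hypothetical) tracks which feeds were already announced to one peer.
type Session struct {
	announced map[string]bool
}

func NewSession() *Session {
	return &Session{announced: make(map[string]bool)}
}

// Update compares the wanted feeds (assumed to be derived from the social
// graph) with what this session has already announced and returns only the
// notes that still need sending: newly wanted feeds with Replicate=true and
// no-longer-wanted feeds with Replicate=false. Results are sorted by feed.
func (s *Session) Update(wanted map[string]int64) []Note {
	var out []Note
	for feed, seq := range wanted {
		if !s.announced[feed] {
			s.announced[feed] = true
			out = append(out, Note{Feed: feed, Replicate: true, Seq: seq})
		}
	}
	for feed := range s.announced {
		if _, ok := wanted[feed]; !ok {
			delete(s.announced, feed)
			out = append(out, Note{Feed: feed, Replicate: false})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Feed < out[j].Feed })
	return out
}

func main() {
	s := NewSession()
	// Session start: the initial notes, which is the only step described above.
	fmt.Println(s.Update(map[string]int64{"@alice": 12}))
	// Later we follow @bob: only the delta is sent on the open session.
	fmt.Println(s.Update(map[string]int64{"@alice": 12, "@bob": 0}))
	// Later still we unfollow @alice.
	fmt.Println(s.Update(map[string]int64{"@bob": 0}))
}
```

In a real implementation `Update` would presumably be triggered by a social graph change event for every open session, which would avoid the disconnect-and-reconnect work-around.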

The updating of the EBT matrix seems to happen in https://github.com/decentral1se/ssb/blob/master/multilogs/combined.go which is connected with when indexing happens. The update only happens when indexing is triggered? That might be an issue.

The combined index seems to only update the ebt state (or whatever that was called) when we actually get a message from a particular feed. This means that if that combined index is used for determining which feeds to replicate in ebt logic then we only pull in feeds that we received messages from. I think the edge case was that when starting with an empty repo we would never replicate any feeds using EBTs?

This seems wrong to me as I understand that we want to replicate based on social graph? That is why I completely dropped using that combined index for EBT and use the social graph instead when trying to fix EBTs in go-ssb.
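The difference between the two approaches can be illustrated with a small sketch. Everything here is an assumption for illustration: `wantsFromReceived` models an index that only learns about a feed once a message from it has been stored, `wantsFromGraph` models a breadth-first walk over a follow graph, and neither mirrors the actual go-ssb types.

```go
package main

import (
	"fmt"
	"sort"
)

// wantsFromReceived models the behaviour described above: a feed only enters
// the EBT state once at least one of its messages has been indexed.
func wantsFromReceived(latestSeq map[string]int64) []string {
	feeds := make([]string, 0, len(latestSeq))
	for feed := range latestSeq {
		feeds = append(feeds, feed)
	}
	sort.Strings(feeds)
	return feeds
}

// wantsFromGraph walks the follow graph breadth-first from self, up to
// maxHops, and wants every feed it reaches (including self), whether or not
// any of that feed's messages are stored locally.
func wantsFromGraph(follows map[string][]string, self string, maxHops int) []string {
	seen := map[string]bool{self: true}
	frontier := []string{self}
	for hop := 0; hop < maxHops; hop++ {
		var next []string
		for _, f := range frontier {
			for _, g := range follows[f] {
				if !seen[g] {
					seen[g] = true
					next = append(next, g)
				}
			}
		}
		frontier = next
	}
	feeds := make([]string, 0, len(seen))
	for f := range seen {
		feeds = append(feeds, f)
	}
	sort.Strings(feeds)
	return feeds
}

func main() {
	// A nearly empty repo: we hold one message of our own (a follow of
	// @alice) and nothing from anyone else.
	latestSeq := map[string]int64{"@me": 1}
	follows := map[string][]string{"@me": {"@alice"}}

	fmt.Println("from received messages:", wantsFromReceived(latestSeq))
	fmt.Println("from social graph:     ", wantsFromGraph(follows, "@me", 1))
}
```

With the message-driven list, @alice can never be requested because nothing from @alice has arrived yet, which matches the empty-repo edge case; the graph-driven list includes @alice as soon as the follow exists.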

KyleMaas commented 1 year ago

Great to see some progress on this!

@boreq As I've discovered with #274, the social graph system is really, really broken. I'm working on trying to find the core issue of that. So far it seems it's not due to a race condition but seems to be stemming from problems with indexing within the builder. Sounds like that may also be helpful for your EBT work.

decentral1se commented 1 year ago

Related https://github.com/ssbc/go-ssb/pull/72

Also %Okyc+tVgyep+1ccI8nUZbBpYiXUvUBgQPOpfnZFRXQQ=.sha256 is a new EBT spec writing effort from @gpicron and friends!