Open christianbundy opened 4 years ago
That sounds unusual. You can try putting the env var DEBUG='*' in front of your terminal command, or inside node.js scripts have process.env.DEBUG = '*'. This should enable a lot more detailed debugging logs from CONN.
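As a minimal sketch of the in-script variant: this assumes the logs go through the debug npm package, which reads the DEBUG variable when it is first loaded, so the assignment has to come before any module that logs is required. The commented-out require line is a placeholder, not a real file.

```javascript
// Set DEBUG before loading anything that creates loggers.
// Assumption: logging is done via the `debug` package, which reads
// process.env.DEBUG once, at load time.
process.env.DEBUG = '*';

// Placeholder: load your own server setup only after DEBUG is set.
// const server = require('./my-ssb-server');

console.log('DEBUG is set to', process.env.DEBUG);
```

A narrower pattern such as 'ssb-conn*' cuts down the noise if '*' is too verbose.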
It's so intermittent that I've had trouble debugging it with DEBUG='*', but it just happened again. Hmm.
It looks like SSB-CONN was connecting to a few peers, but after an hour without any new content I tried an ssb gossip reconnect and the new messages came in without any problems. Maybe it's a problem of expectations: how often does SSB-CONN do the equivalent of ssb gossip reconnect?
@christianbundy Do you have in your ssb config {conn: {autostart: false}}? That would disable the scheduler. If it's undefined, it defaults to true.
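To make the default explicit, here is a small sketch of the rule as described in this comment. The helper schedulerAutostarts is hypothetical and is not ssb-conn's own code; it only illustrates "on unless explicitly set to false".

```javascript
// Hypothetical helper (not part of ssb-conn): the scheduler is treated as
// enabled unless conn.autostart is explicitly false.
function schedulerAutostarts(config) {
  const conn = (config && config.conn) || {};
  return conn.autostart !== false;
}

console.log(schedulerAutostarts({}));                             // true
console.log(schedulerAutostarts({ conn: {} }));                   // true
console.log(schedulerAutostarts({ conn: { autostart: false } })); // false
```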
In ssb-conn, ssb gossip reconnect does a simple ssb.conn.hub().reset(). In the scheduler in ssb-conn, it also does a reset when the network interfaces change or when the computer wakes up from sleep mode. This is copy pasted from ssb-gossip:
// Upon wakeup, trigger hard reconnect
onWakeup(() => this.ssb.conn.hub().reset());
// Upon network changes, trigger hard reconnect
onNetwork(() => this.ssb.conn.hub().reset());
Is it so that you were connected with peers, and replication (ssb-replicate) was not happening with them, but a ssb gossip reconnect made you replicate with them? (I'm trying to figure out if this bug could be a problem in ssb-replicate, not ssb-conn). That said, I think hub.reset() actually closes connections with all current peers.
Conclusion: so far I'm still confused by this bug.
Nope, I've got the default ssb-config.
Is it so that you were connected with peers, and replication (ssb-replicate) was not happening with them, but a ssb gossip reconnect made you replicate with them?
I was connected to peers, but it didn't seem to be connecting to new peers. My intuition is that the replication is working fine, but that it's not connecting to new peers as often as I'd expect. I was able to use conn.hub().reset() to manually force SSB-CONN to talk to different peers, but it didn't seem to be doing that by default. How often should I expect the connection scheduler to switch to different peers?
Is this behavior noticeably different than what you had before with ssb-gossip? The ssb-conn scheduler mostly follows the legacy scheduler in ssb-gossip.
Every 30min, connections to pubs are shut down (and this frees up space for new connections).
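A rough sketch of that rotation rule, under assumptions: the peer shape ({ id, type, connectedAt }) and the function pubsDueForDisconnect are made up for illustration, and treating the 30 minutes as a per-connection age is one possible reading of the sentence above. The real scheduler may implement it differently.

```javascript
const MINUTES = 60 * 1000;

// Hypothetical: returns the pub connections that have been open for at
// least maxAge milliseconds. Non-pub peers (e.g. LAN) are never selected.
function pubsDueForDisconnect(peers, now, maxAge = 30 * MINUTES) {
  return peers.filter(
    (peer) => peer.type === 'pub' && now - peer.connectedAt >= maxAge
  );
}

const now = 60 * MINUTES;
const peers = [
  { id: 'pubA', type: 'pub', connectedAt: 0 },            // open for 60 min
  { id: 'pubB', type: 'pub', connectedAt: 45 * MINUTES }, // open for 15 min
  { id: 'lanC', type: 'lan', connectedAt: 0 },            // not a pub
];

console.log(pubsDueForDisconnect(peers, now).map((p) => p.id)); // [ 'pubA' ]
```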
In general I recommend getting acquainted with the scheduler source code file, it has good comments.
Is this behavior noticeably different than what you had before with ssb-gossip? The ssb-conn scheduler mostly follows the legacy scheduler in ssb-gossip.
I'm not sure. Lately I've been experimenting with running Oasis as a daemon instead of starting it when I want to use it, so maybe that's the difference? In the past I'd open Patchwork or Oasis and it'd immediately connect to a bunch of peers, but now I'm opening Oasis after it's been running for 48 hours so it's at a random point in the replication schedule. Maybe I've just been getting unlucky?
Finally had this happen when I had the debug mode on, so I've got detailed logs. At a first pass I think it happened like this:
Are there any important privacy implications I should be aware of before posting my DEBUG='ssb-conn*' logs? I've skimmed and don't see any personal info, but I've been meaning to ask you regardless.
That sequence of events sounds reasonable. Those two peers that were connected but didn't give you new messages: are you sure there were new messages that those peers could give you? In my experience some peers have a slice of the social graph that other peers don't have.
Are there any important privacy implications I should be aware of before posting my DEBUG='ssb-conn*' logs? I've skimmed and don't see any personal info, but I've been meaning to ask you regardless.
It might show IP addresses and SSB IDs, but usually only IP addresses for pubs. So if you think your logs look fine, I think regarding privacy it should be okay, but on the other hand I'm not 100% sure what your logs show.
Those two peers that were connected but didn't give you new messages: are you sure there were new messages that those peers could give you?
No, I'm not sure which messages they have, and I don't think there's a technical bug in the scheduler. My point is that ten minutes of peering with people who don't have any messages for you seems sub-optimal, and since I bump into this often I'd love to brainstorm why it happens and how to fix it.
conn.json entries and lots of them are inactive?
conn.json has lots of peers that don't follow who I follow?
Possible solutions off the top of my head:
Maybe you'll have some insight here? I'll attach my logs below in case you find them useful:
(@dominictarr I'm @ mentioning you in this comment concerning the gossip scheduler and EBT replication. see below)
@christianbundy I recommend tweaking the parameters in this function below until you find numbers that satisfy you, then make a PR and we can discuss whether those parameters would be a good default for all users:
The docs mention that the scheduler connects to "5 staged peers we follow", but it's very common for me to only be connected to 2 peers.
The important detail here is the word staged. It basically means that we automatically connect to the peers I follow that are currently in ssb-conn-staging (such as LAN peers or Bluetooth peers, but pubs can also be staged), up to a maximum of 5. I think in your case you had no staged peers, so it didn't connect to 5 of them, just 2, because it followed other logic. See that function's source code, which shows this.
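The "5 staged peers we follow" rule can be sketched roughly like this. The function pickStagedToConnect and the peer objects are hypothetical and only illustrate the filtering; they are not the scheduler's actual code.

```javascript
// Hypothetical: from the peers currently in staging, keep only the ones we
// follow, and take at most `limit` of them.
function pickStagedToConnect(stagedPeers, followedIds, limit = 5) {
  const followed = new Set(followedIds);
  return stagedPeers.filter((peer) => followed.has(peer.id)).slice(0, limit);
}

const staged = [{ id: '@a' }, { id: '@b' }, { id: '@c' }];
const follows = ['@a', '@c', '@z'];

console.log(pickStagedToConnect(staged, follows).map((p) => p.id)); // [ '@a', '@c' ]
console.log(pickStagedToConnect([], follows).length);               // 0
```

With nothing in staging the result is empty, which matches the case described above: the 5-peer rule contributes no connections, and whatever peers are connected come from the scheduler's other logic.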
That said, I also notice that on my side (Patchwork and often Manyverse) I also have just 2 peers connected. Sometimes 3. I think we should aim for 3 peers most of the time, and if there are 2 peers, then actively try to connect to a 3rd one. I think this can be done by just tweaking the numbers in that function. I remember talking with @dominictarr that if EBT is enabled, then peer connections should not be short-lived, so I'm trying to not have too much churn in the scheduler. I could be wrong though.
just 2 peers
Oh, I think that usually it's 1 room plus 1 pub. It should be at least 2 pubs, because a room doesn't count as a replication peer.
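One way to express that target, as a sketch only: count connected peers that are not rooms, and report how many more are needed to reach 3. The function missingReplicationPeers and the type field are assumptions for illustration; the target of 3 is the number suggested in this thread, not an existing default.

```javascript
// Hypothetical: rooms are excluded because they do not count as
// replication peers; the result is how many more peers to connect to.
function missingReplicationPeers(connected, target = 3) {
  const replicating = connected.filter((peer) => peer.type !== 'room');
  return Math.max(0, target - replicating.length);
}

// 1 room + 1 pub: only one replication peer, so two more are wanted.
console.log(missingReplicationPeers([{ type: 'room' }, { type: 'pub' }])); // 2

// 3 pubs: target reached.
console.log(missingReplicationPeers([{ type: 'pub' }, { type: 'pub' }, { type: 'pub' }])); // 0
```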
Hi! Not urgent, but this morning I started my computer and checked SSB for messages but I didn't download any new ones. I can confirm that I was connected to a handful of pubs that should have messages for me, but no new messages were being downloaded. Running ssb conn stop; ssb conn start seemed to resolve the issue. Anything I can do to help debug the next time this happens?