Same three (dead) pubs keep getting scheduled over and over again

black-puppydog commented 2 years ago

I'm running Manyverse desktop and follow some profiles that have pubs following them, so I guess that conn is scheduling those pubs to get updates. However, it seems to not use any back-off mechanism when it fails to reach those pubs. From its startup, I see the following logs showing up every 1-2 seconds, over and over:

  ssb:conn-hub connecting to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= +2s
  ssb:conn-hub failed to connect to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= because: connect ECONNREFUSED 176.58.117.63:8008 +33ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= +2s
  ssb:conn-hub failed to connect to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= because: getaddrinfo ENOTFOUND diasp.club +13ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +5ms
  ssb:conn-hub connecting to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= +1s
  ssb:conn-hub failed to connect to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= because: getaddrinfo ENOTFOUND ssb.ekata.social +470ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= +2s
  ssb:conn-hub failed to connect to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= because: connect ECONNREFUSED 188.166.252.233:8008 +264ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +5ms
  ssb:conn-hub connecting to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= +2s
  ssb:conn-hub failed to connect to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= because: connect ECONNREFUSED 176.58.117.63:8008 +22ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +8ms
  ssb:conn-hub connecting to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= +3s
  ssb:conn-hub failed to connect to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= because: getaddrinfo ENOTFOUND diasp.club +45ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +3s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= +1s
  ssb:conn-hub failed to connect to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= because: getaddrinfo ENOTFOUND ssb.ekata.social +301ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +3ms
  ssb:conn-hub connecting to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= +2s
  ssb:conn-hub failed to connect to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= because: connect ECONNREFUSED 188.166.252.233:8008 +262ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +3s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= +2s
  ssb:conn-hub failed to connect to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= because: connect ECONNREFUSED 176.58.117.63:8008 +21ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +4ms
  ssb:conn-hub connecting to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= +2s
  ssb:conn-hub failed to connect to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= because: getaddrinfo ENOTFOUND ssb.ekata.social +508ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= +2s
  ssb:conn-hub failed to connect to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= because: getaddrinfo ENOTFOUND diasp.club +16ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +4ms
  ssb:conn-hub connecting to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= +1s
  ssb:conn-hub failed to connect to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= because: connect ECONNREFUSED 188.166.252.233:8008 +250ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +1s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +8ms
  ssb:conn-hub connecting to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= +2s
  ssb:conn-hub failed to connect to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= because: connect ECONNREFUSED 176.58.117.63:8008 +24ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= +2s
  ssb:conn-hub failed to connect to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= because: getaddrinfo ENOTFOUND ssb.ekata.social +480ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +2s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= +1s
  ssb:conn-hub failed to connect to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= because: getaddrinfo ENOTFOUND diasp.club +11ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +1s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +5ms
  ssb:conn-hub connecting to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= +2s
  ssb:conn-hub failed to connect to net:188.166.252.233:8008~shs:uRECWB4KIeKoNMis2UYWyB2aQPvWmS3OePQvBj2zClg= because: connect ECONNREFUSED 188.166.252.233:8008 +271ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +3s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +6ms
  ssb:conn-hub connecting to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= +1s
  ssb:conn-hub failed to connect to net:176.58.117.63:8008~shs:J+0DGLgRn8H5tVLCcRUfN7NfUcTGEZKqML3krEOJjDY= because: connect ECONNREFUSED 176.58.117.63:8008 +22ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +1s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +3ms
  ssb:conn-hub connecting to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= +3s
  ssb:conn-hub failed to connect to net:diasp.club:8008~shs:DNatJUQOFr1NZewA3oRG9i/r0XJZLbKYTyB225J9St0= because: getaddrinfo ENOTFOUND diasp.club +7ms
  ssb:conn-db Begun serializing and writing ConnDB into conn.json +3s
  ssb:conn-db Done serializing and writing ConnDB into conn.json +4ms
  ssb:conn-hub connecting to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= +2s
  ssb:conn-hub failed to connect to net:ssb.ekata.social:8008~shs:FJDMuATvh0a3qIcIPcstUANVIj362fWkYCG5bnkZ5nA= because: getaddrinfo ENOTFOUND ssb.ekata.social +502ms

As you can see, these are the same peers being scheduled, connections failing for IP or DNS reasons, yet they keep being scheduled again.

Not sure whether this is an issue with ssb-conn or with manyverse...

staltz commented 2 years ago

Thanks! It's very likely an issue with ssb-conn. It's hard for me to reproduce, but I was intending to give some time to ssb-conn this week anyway, so I'll take a look at this one too. One way you could help me is to share your conn.json (I believe there's rarely any private info in that file), that should help me figure out why this is happening to you.

black-puppydog commented 2 years ago

Here it is. I just noticed this stopped since I used an invite for my own pub. But I can trigger this easily by just going to the connections tab (which showed two active pub connections when I tried this) and then manually disconnecting (but not "forgetting") all active connections. If I then launch my phone's manyverse and the two automatically connect, it stops again.

So I'd say this is happening only in situations where there is no connection currently. Which makes it less severe I guess, but still there should be some back-off I guess, and/or a coupling to the OS signalling (un-)availability of general network connectivity. I.e. if there's no net & no bluetooth, don't try to gossip, and if there is, even if you have nobody to speak to, don't try to reconnect over and over if they're not responding.

black-puppydog commented 2 years ago

Ah, GH doesn't let me attach json files?! Okay...

Well, here is a totaly-not-json file: conn.txt

staltz commented 2 years ago

So I'd say this is happening only in situations where there is no connection currently.

Yes, that was my suspicion as well. ssb-conn has logic for "if I can't get a hold on any peer, just keep on trying like mad"

staltz commented 2 years ago

ssb-conn@6.0.0 released just now. I tested with your conn.json file (thanks for it!) and the issue is gone. We now have a retry timer system that's based on how many connection failures have happened with that peer since the last successful connection, so that it follows the formula 4sec + 10sec * Math.min(64, failure || 0) which I discovered by testing many different values and tweaking the curve.

black-puppydog commented 2 years ago

Can confirm that the retries don't happen anymore now, at least not regularly. :)

ssbc / ssb-conn

Same three (dead) pubs keep getting scheduled over and over again #30