oxen-io / lokinet

Lokinet is an anonymous, decentralized and IP based overlay network for the internet.
https://lokinet.org/
GNU General Public License v3.0

Lokinet router failing to communicate after service node re-registration #1995

Closed jagerman closed 1 year ago

jagerman commented 2 years ago

A couple of times now I've had a service node get decommissioned due to Lokinet unreachability after an unlock-and-reregister. Connectivity seems fine after a restart of lokinet.

Timeline:

- Lokinet-router is running fine as a registered node
- Registration expires
- SN gets re-registered (without any restarts)

Up to the re-registration we have, every 30 seconds, the expected whining:

Sep 26 19:46:07 skoll lokinet-router-27[1049811]: [ERR] [](53) 2022-09-26 22:46:07.424 GMT [+621h18m16.693s]
    ../llarp/router/router.cpp:1027        We are running as a service node but we seem to be deregistered

which is fine, since we aren't a registered service node at that point. These stop after the re-registration, and then there are no log statements at all (logging is at warning level) for the next 5.5 hours. During that time other nodes were reporting that Lokinet is unreachable. After those 5.5 hours came a decomm, a recomm, and then a dereg because of the failing Lokinet connectivity.

Lokinet appears to track the decomm/recomm states fine, but isn't reachable by anyone else on the network until it gets restarted.

jagerman commented 2 years ago

On further investigation it appears that this node has an outdated bootstrap file. So the connectivity issue makes sense, but there really needs to be a serious warning when the nodedb is close to empty and we are active.
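As a rough illustration of the kind of warning meant here, the check could look something like the sketch below. This is a language-neutral sketch in Python, not Lokinet's actual C++ code: the function names, the `known_routers` and `is_active` inputs, and the `MIN_KNOWN_ROUTERS` threshold are all hypothetical and would need to be mapped onto whatever the router really exposes.

```python
import logging
from typing import Optional

logger = logging.getLogger("nodedb_check")

# Hypothetical threshold, chosen only for illustration; not a Lokinet value.
MIN_KNOWN_ROUTERS = 10


def nodedb_warning(known_routers: int, is_active: bool,
                   min_routers: int = MIN_KNOWN_ROUTERS) -> Optional[str]:
    """Return a warning message if an active node knows too few routers.

    Returns None when the node is not active or the nodedb looks healthy.
    """
    if not is_active:
        # An inactive node is not expected to be reachable, so stay quiet.
        return None
    if known_routers >= min_routers:
        return None
    return (
        f"nodedb holds only {known_routers} router(s) (expected at least "
        f"{min_routers}) while this node is active; "
        "the bootstrap file may be outdated"
    )


def check_nodedb(known_routers: int, is_active: bool) -> Optional[str]:
    """Log the warning at error level so it shows up at warning-level logging."""
    message = nodedb_warning(known_routers, is_active)
    if message is not None:
        logger.error(message)
    return message
```

Running a check like this periodically, for example alongside the existing registration-state check, would have made the 5.5 hours of silence in the log above show a repeated error instead.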

ianmacd commented 2 years ago

> Lokinet appears to track the decomm/recomm states fine, but isn't reachable by anyone else on the network until it gets restarted.

Yes, I narrowly escaped a deregistration once, due to this bug. I also know a couple of other ops who weren't so lucky.

Now, when I bring up a shared node and I don't know when exactly it will finally go live on the network, I insulate myself from this bug with the following job in /etc/crontab.

*/15 * * * * root systemctl restart lokinet-router

jagerman commented 2 years ago

TODO:

majestrate commented 2 years ago

did this get fixed?