staltz / easy-ssb-pub

An easy-to-host server that runs an SSB "Pub"
GNU Affero General Public License v3.0
209 stars 35 forks source link

Instability, restarts and errors in logs #34

Open Simon-L opened 6 years ago

Simon-L commented 6 years ago

I've tried to track this and document as best as possible.

As of now, cloning this repo and running the docker-compose commands on a fresh DigitalOcean server results in unwanted behaviour.

This is the log from the first start, the error shown here appears a few minutes after it has started.

npm info it worked if it ends with ok
npm info using npm@5.6.0
npm info using node@v8.10.0
npm info lifecycle easy-ssb-pub@4.0.0~prestart: easy-ssb-pub@4.0.0
npm info lifecycle easy-ssb-pub@4.0.0~start: easy-ssb-pub@4.0.0

easy-ssb-pub@4.0.0 start /usr/src/app
tsc node dist/index --host $HOST

server error, from net:::ffff:198.1...414~shs:
Error: expected hello
    at abort (/usr/src/app/node_modules/secret-handshake/protocol.js:80:45)
    at Object.cb (/usr/src/app/node_modules/secret-handshake/protocol.js:90:26)
    at drain (/usr/src/app/node_modules/pull-reader/index.js:42:23)
    at /usr/src/app/node_modules/pull-reader/index.js:59:18
    at /usr/src/app/node_modules/pull-reader/index.js:20:7
    at /usr/src/app/node_modules/pull-stream/throughs/through.js:19:9
    at drain (/usr/src/app/node_modules/stream-to-pull-stream/index.js:141:18)
    at Socket.canonymous\ (/usr/src/app/node_modules/stream-to-pull-stream/index.js:158:5)
    at emitNone (events.js:111:20)
    at Socket.emit (events.js:208:7)

Various other errors are reported:

Error replicating with @o2ddy...=.ed25519:
   Error: parent stream is closing
    at PacketStream.stream (/usr/src/app/node_modules/packet-stream/index.js:46:28)
    at Object.ws.remoteCall (/usr/src/app/node_modules/muxrpc/stream.js:114:17)
    at /usr/src/app/node_modules/muxrpc/index.js:45:15
    at remoteCall (/usr/src/app/node_modules/muxrpc/api.js:26:19)
    at EventEmitter.obj.(anonymous function) [as createHistoryStream] (/usr/src/app/node_modules/muxrpc/api.js:40:20)
    at replicate (/usr/src/app/node_modules/scuttlebot/plugins/replicate/legacy.js:247:13)
    at EventEmitter.fallback (/usr/src/app/node_modules/scuttlebot/plugins/replicate/legacy.js:307:7)
    at Object.onceWrapper (events.js:315:30)
    at emitOne (events.js:116:13)
    at EventEmitter.emit (events.js:211:7)

Or (with timestamps):

2018-06-25T00:56:08.922909057Z Error replicating with @uMff...=.ed25519:
2018-06-25T00:56:08.923025645Z    Error: parent stream is closing
2018-06-25T00:56:08.933272975Z     at PacketStream.stream (/usr/src/app/node_modules/packet-stream/index.js:46:28)
2018-06-25T00:56:08.933313523Z     at Object.ws.remoteCall (/usr/src/app/node_modules/muxrpc/stream.js:114:17)
2018-06-25T00:56:08.933322540Z     at /usr/src/app/node_modules/muxrpc/index.js:45:15
2018-06-25T00:56:08.933328284Z     at remoteCall (/usr/src/app/node_modules/muxrpc/api.js:26:19)
2018-06-25T00:56:08.933334166Z     at EventEmitter.obj.(anonymous function) [as createHistoryStream] (/usr/src/app/node_modules/muxrpc/api.js:40:20)
2018-06-25T00:56:08.933339838Z     at replicate (/usr/src/app/node_modules/scuttlebot/plugins/replicate/legacy.js:247:13)
2018-06-25T00:56:08.933345290Z     at EventEmitter.fallback (/usr/src/app/node_modules/scuttlebot/plugins/replicate/legacy.js:307:7)
2018-06-25T00:56:08.933351243Z     at Object.onceWrapper (events.js:315:30)
2018-06-25T00:56:08.933356772Z     at emitOne (events.js:116:13)
2018-06-25T00:56:08.933361978Z     at EventEmitter.emit (events.js:211:7)

These logs were obtained using docker logs, the first time I ran it after running the docker-compose command for the first time the command stopped on these lines:

Killed
npm info lifecycle easy-ssb-pub@4.0.0~start: Failed to exec start script
npm ERR! code ELIFECYCLE
npm ERR! errno 137
npm ERR! easy-ssb-pub@4.0.0 start: `tsc  node dist/index --host $HOST`
npm ERR! Exit status 137
npm ERR! 
npm ERR! Failed at the easy-ssb-pub@4.0.0 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2018-06-24T23_46_17_506Z-debug.log
npm info it worked if it ends with ok
npm info using npm@5.6.0
npm info using node@v8.10.0
npm info lifecycle easy-ssb-pub@4.0.0~prestart: easy-ssb-pub@4.0.0
npm info lifecycle easy-ssb-pub@4.0.0~start: easy-ssb-pub@4.0.0

Looks like it's killed either by the docker daemon or by npm inside the container, it will then just reboot.

I've made sure to run these tests on a fresh install. I have personnally run the same docker image on my server and, added to those reported here, had several other issues probably related to #27.

I have taken note of the maintenance and status warning in the README, for the time being I think this repo works for many people and figured I should share my investigation!

Simon-L commented 6 years ago

I have made some more experiments with this package.

Indeed it seems the software is crashing and the container restarts repeatedly, sometimes after running for 3 minutes, sometimes after 20 minutes.

I am also noticing it doesn't always crash after the errors reported in the previous message, it might happen seemingly randomly. The first line is always:

Killed

which makes me think it is hanging at some point and also puts me in an uncomfortable situation where I have absolutely no idea where to look in the code!

I'm using docker-compose but I can confirm these instability issues also happen on bare metal Debian.

deSagaz commented 6 years ago

I'm experiencing similar issues at ssb.learningsocieties.org, which is also running at DigitalOcean on a standalone droplet.

Every once in a while I check whether the server is still up in practice, with varying results. The statistics paint a good picture:

screenshot_2018-06-30 digitalocean - ssb

Sometimes the server restarts and resumes correct behaviour, but sometimes the bandwidth drops to 0 and CPU/memory usage spike for extended periods of time. The logged errors are similar.

I don't really have time to troubleshoot it properly, sadly... For the time being I'll attempt to keep the pub up and running as best I can.

deSagaz commented 6 years ago

I've moved over to dinosaur's image last night with a separate webserver docker image. I also upgraded to 2GB RAM, just to be sure. The statistics look a lot healthier.

screenshot_2018-07-01 digitalocean - ssb-pub

Not sure why that is exactly; perhaps the healer container makes the difference?

For now I've generated an invite that allows 1000 additional people to join, which should practically do the same as the previous auto-generation. Plus, once the server gets that size it'd be time to lock down anyhow...

staltz commented 6 years ago

I've had problems with dinosaur's image as well as easy-ssb-pub's image, but I noticed it's probably my 3-hop social graph is huge for the server. Configuring hops as 1 helped (with dinosaur's image), the server is pretty stable now. I could try hops=2 as well.

Just 2 cents if anyone is looking to improve stability.