setup test network

ralphtheninja commented 9 years ago

@dominictarr said something the other day in this comment.

What if the nodes in that test network measured themselves (plugin?) and published that information to their own feeds?

That way we could aggregate that data and see how the network is doing in terms of memory, CPU, or whatever else we want to measure, and we could more or less compare how the network is doing before and after a pull request has been merged.

This would help locate bugs and problems in all future versions of scuttlebot.
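As a rough illustration of the self-measurement idea, here is a minimal sketch of what a node might collect about itself using only Node.js built-ins. The function name `collectStats` and the shape of the returned object are assumptions made for this example; the thread does not define a message format, and publishing to a feed is left out.

```javascript
const os = require('os')

// Hypothetical helper: gather a small snapshot of process and host metrics.
// The field names are illustrative, not a format defined by scuttlebot.
function collectStats () {
  const mem = process.memoryUsage()
  return {
    timestamp: Date.now(),
    rss: mem.rss, // resident set size in bytes
    heapUsed: mem.heapUsed, // V8 heap in use, in bytes
    loadavg: os.loadavg(), // 1, 5 and 15 minute load averages
    uptime: process.uptime() // seconds since this process started
  }
}
```

A plugin or external tool could call `collectStats()` on a timer and hand the result to whatever function appends messages to the node's feed.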

Questions that come to mind:

What is the best way to run this network?
How to deploy it?

Other thoughts?

pfrazee commented 9 years ago

+1 I think this is a superb idea. I can't wait to have a dashboard showing the aggregated metrics!

ralphtheninja commented 9 years ago

What if a process crashes, e.g. from an uncaught exception? Should it try to publish that exception and then restart itself? It would be very useful to get this information, preferably with the call stack if possible. This could get scary if the exception happens while publishing ;)

pfrazee commented 9 years ago

Hah yeah, that might get ugly but it could be worth a try. Should be possible to hook into process.on('uncaughtException')

ralphtheninja commented 9 years ago

We could store caughtUncaughtException or something and if it's already set we just restart the process and don't publish.
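A minimal sketch of that flag idea, assuming two injected functions that are not part of any real API here: `publish(report, cb)`, standing in for whatever writes to the feed, and `exit(code)`, standing in for `process.exit`. If a second exception arrives while the first is being handled, or publishing itself throws, the handler skips publishing and just exits.

```javascript
// Sketch only: `publish` and `exit` are hypothetical injected functions.
function makeCrashHandler (publish, exit) {
  let caughtUncaughtException = false

  return function onUncaughtException (err) {
    if (caughtUncaughtException) {
      // We were already handling a crash (possibly one caused by
      // publishing itself), so give up on reporting and exit.
      return exit(1)
    }
    caughtUncaughtException = true

    const report = { message: err.message, stack: err.stack }
    try {
      publish(report, function () { exit(1) })
    } catch (publishErr) {
      exit(1)
    }
  }
}

// Possible wiring (publishToFeed is a placeholder name):
// process.on('uncaughtException', makeCrashHandler(publishToFeed, process.exit))
```

Restarting is deliberately left to something outside the process, since a process in this state cannot be trusted to restart itself.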

ralphtheninja commented 9 years ago

Or wait. This should probably be an external tool after all, like a guard that is running sbot and just issuing commands to it to update the feed.
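The external guard could look roughly like the sketch below: it starts the server as a child process, keeps the tail of its stderr, and on exit hands that to a callback before starting the process again. The function name `runGuarded`, the `onExit` callback, and the injectable `spawnFn` parameter are assumptions for this example and are not taken from sbot-test-guard.

```javascript
const { spawn } = require('child_process')

// Sketch only: `onExit` is a hypothetical reporting callback and
// `spawnFn` exists so the restart logic can be exercised without a real process.
function runGuarded (command, args, onExit, spawnFn = spawn) {
  let stderrTail = ''
  const child = spawnFn(command, args, { stdio: ['ignore', 'inherit', 'pipe'] })

  child.stderr.on('data', function (chunk) {
    // Keep only the last few KB so memory use stays bounded.
    stderrTail = (stderrTail + chunk).slice(-4096)
  })

  child.on('exit', function (code) {
    onExit({ code: code, stderr: stderrTail })
    // Restart immediately; a real guard would likely add a backoff.
    runGuarded(command, args, onExit, spawnFn)
  })

  return child
}
```

Keeping the reporting in the parent means a crash in the child can never interfere with publishing the crash report.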

pfrazee commented 9 years ago

probably fine

ralphtheninja commented 9 years ago

So this is an initial attempt to just run sbot and use sbot to report its statistics https://github.com/ralphtheninja/sbot-test-guard

Need to get more varied data and prune some of the stuff that is being used right now.

dominictarr commented 9 years ago

@ralphtheninja +1 we do have a readable log format, that outputs errors. if there is an unexpected error it's better that we crash, your guard could certainly parse and aggregate this. if it was to post status messages, they should be rate limitable.
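One simple way to make status messages rate limitable is a sliding-window counter, sketched below. The name `createRateLimiter` and its parameters are made up for this example, and the clock is injectable only so the behaviour can be checked deterministically.

```javascript
// Sketch: allow at most `limit` messages in any `windowMs` milliseconds.
function createRateLimiter (limit, windowMs, now = Date.now) {
  let sent = [] // timestamps of recently allowed messages

  return function tryAcquire () {
    const t = now()
    // Forget messages that have fallen out of the window.
    sent = sent.filter(function (ts) { return t - ts < windowMs })
    if (sent.length >= limit) return false
    sent.push(t)
    return true
  }
}
```

The guard would call the returned function before each status post and skip (or queue) the post when it returns `false`.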

ralphtheninja commented 9 years ago

@dominictarr For sure. The thought crossed my mind yesterday but wanted to focus on main use case. The log should def be parsed.

ralphtheninja commented 9 years ago

@dominictarr Oh btw, know of any good log rotation modules? :)

No9 commented 9 years ago

@ralphtheninja this may be of some use/interest? http://blog.trevnorris.com/2015/02/asyncwrap-tutorial-introduction.html

ralphtheninja commented 9 years ago

@No9 Very nice stuff.

ralphtheninja commented 9 years ago

@dominictarr Do we care about stdout or is it enough to parse stderr for errors?

ralphtheninja commented 9 years ago

So here you can see a little preview.

In this setup I have configured the guard to probe the process every 10 seconds. I also added a small crash script to sbot which makes it crash after 30 seconds, so the guard posts two reports to the feed before sbot crashes and gets restarted. Once it's up and running again, the previous error is sent to the feed before probing resumes.

[Screenshot: crash-message-to-feed]

function die() {
  throw new Error('wtf dude something went wrong')
}
setTimeout(function () {
  die()
}, 30000)

pfrazee commented 9 years ago

Yeah that's nice. I'm going to want to:

- pick out just the data we need and reduce that obj size
- probably reduce the report frequency. It would be nice to publish, say, once every 15 minutes by default, but sample every 30 seconds or so and publish more frequently if there are large changes in the metrics
- publish using a different message type than post -- something like sys-stat for normal metrics and error-dump for the crashes
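The sample/publish split suggested above could be sketched like this: sample often, publish on a slower schedule, and publish early when a metric jumps. The 25% change threshold, the `createReporter` name, the injected `publish` function, and the message fields other than the `sys-stat` type are assumptions for illustration only.

```javascript
const SAMPLE_INTERVAL_MS = 30 * 1000 // sample every 30 seconds
const PUBLISH_INTERVAL_MS = 15 * 60 * 1000 // publish every 15 minutes by default
const CHANGE_THRESHOLD = 0.25 // hypothetical: a 25% change counts as "large"

// Sketch only: `publish` is a hypothetical function that writes to the feed.
function createReporter (publish, opts = {}) {
  const publishInterval = opts.publishIntervalMs || PUBLISH_INTERVAL_MS
  const threshold = opts.changeThreshold || CHANGE_THRESHOLD
  let lastPublishedAt = null
  let lastValue = null // value at the time of the last publish

  return function onSample (value, now) {
    const due = lastPublishedAt === null || now - lastPublishedAt >= publishInterval
    const jumped = lastValue !== null && lastValue !== 0 &&
      Math.abs(value - lastValue) / Math.abs(lastValue) >= threshold

    if (!due && !jumped) return false

    publish({ type: 'sys-stat', value: value, time: now })
    lastPublishedAt = now
    lastValue = value
    return true
  }
}

// Possible wiring (publishToFeed is a placeholder name):
// const onSample = createReporter(publishToFeed)
// setInterval(function () {
//   onSample(process.memoryUsage().rss, Date.now())
// }, SAMPLE_INTERVAL_MS)
```

Comparing against the last published value rather than the previous sample means a slow drift still triggers an early publish once it adds up to a large change.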

ralphtheninja commented 9 years ago

Sounds good.

I'll add better message types and adjust the default values for sampling. For now the sample rate will be the same as the publish rate. Added an issue for separating the sample and publish rates: https://github.com/ralphtheninja/sbot-test-guard/issues/5

ralphtheninja commented 9 years ago

Gah I fixed separate publish rate as well. Published as 2.0.0

pfrazee commented 9 years ago

nice! I'll dig into the specific metrics now and get this set up on my pub

ralphtheninja commented 9 years ago

Cool. I already have ideas for more modules lol, it never ends when you start this node thing :)

pfrazee commented 9 years ago

truth :)

dominictarr commented 9 years ago

test network is over here: https://github.com/ssbc/ssb-testnet

ssbc / ssb-server

setup test network #111