Closed ralphtheninja closed 9 years ago
+1 I think this is a superb idea. I can't wait to have a dashboard showing the aggregated metrics!
What if a process crashes, e.g. on an uncaught exception? Should it try to publish that exception and then restart itself? It would be very useful to get this information, preferably with the call stack if possible. This could get scary if the exception happens while publishing ;)
Hah yeah, that might get ugly but it could be worth a try. Should be possible to hook into process.on('uncaughtException')
We could store caughtUncaughtException or something, and if it's already set we just restart the process and don't publish.
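A rough sketch of that flag idea, in case it helps the discussion. The `publish` and `restart` callbacks are hypothetical placeholders, not scuttlebot APIs; only `process.on('uncaughtException')` and the flag name come from the comments above.

```javascript
// Build an 'uncaughtException' handler that tries to publish at most once.
// `publish(report, done)` and `restart()` are hypothetical callbacks.
function createCrashHandler (publish, restart) {
  let caughtUncaughtException = false

  return function onUncaughtException (err) {
    if (caughtUncaughtException) {
      // A second exception arrived after the first one was seen
      // (for example while publishing), so skip publishing.
      return restart()
    }
    caughtUncaughtException = true
    try {
      publish({ message: err.message, stack: err.stack }, restart)
    } catch (publishErr) {
      // Publishing itself threw synchronously: give up and restart.
      restart()
    }
  }
}

// Usage (not executed here):
// process.on('uncaughtException', createCrashHandler(publish, restart))
```

The handler is built by a factory so the flag stays private and the behaviour can be checked without crashing a real process.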
Or wait. This should probably be an external tool after all, like a guard that is running sbot and just issuing commands to it to update the feed.
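For illustration, an external guard along those lines could look roughly like this. It uses Node's `child_process.spawn`; the `maxRestarts`, `onExit` and `onDone` options are made up for this sketch and are not taken from sbot-test-guard. Spawn failures (e.g. command not found) are not handled here.

```javascript
const { spawn } = require('child_process')

// Run `command` and start it again each time it exits, up to `maxRestarts` times.
// `onExit` and `onDone` are hypothetical hooks, not part of sbot-test-guard.
function guard (command, args, { maxRestarts, onExit, onDone }) {
  let restarts = 0

  function start () {
    // stdout is ignored here; stderr is captured so a crash can be reported.
    const child = spawn(command, args, { stdio: ['ignore', 'ignore', 'pipe'] })
    let stderr = ''
    child.stderr.on('data', chunk => { stderr += chunk })

    // 'close' fires once the process has ended and its stdio streams are closed.
    child.on('close', code => {
      onExit({ code, stderr })
      if (restarts < maxRestarts) {
        restarts++
        start()
      } else if (onDone) {
        onDone()
      }
    })
  }

  start()
}
```

A real guard would presumably restart without a limit; the cap is only there so the sketch terminates.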
probably fine
So this is an initial attempt to just run sbot and use sbot to report its statistics https://github.com/ralphtheninja/sbot-test-guard
Need to get more varied data and prune some of the stuff that is being used right now.
@ralphtheninja +1 we do have a readable log format, that outputs errors. if there is an unexpected error it's better that we crash, your guard could certainly parse and aggregate this. if it was to post status messages, they should be rate limitable.
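One way the rate limiting could be sketched is a fixed-window counter. The function name and parameters below are assumptions for illustration only; nothing here is an existing sbot or guard API.

```javascript
// Returns a function that lets at most `limit` messages through per window.
// `now` is injectable so the behaviour can be checked without waiting.
function createRateLimiter (limit, windowMs, now = Date.now) {
  let windowStart = now()
  let count = 0

  return function allow () {
    const t = now()
    if (t - windowStart >= windowMs) {
      // A new window has started: reset the counter.
      windowStart = t
      count = 0
    }
    if (count >= limit) return false
    count++
    return true
  }
}
```

The guard would call `allow()` before posting a status message and drop (or queue) the message when it returns `false`.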
@dominictarr For sure. The thought crossed my mind yesterday but wanted to focus on main use case. The log should def be parsed.
@dominictarr Oh btw, know of any good log rotation modules? :)
@ralphtheninja this may be of some use/interest? http://blog.trevnorris.com/2015/02/asyncwrap-tutorial-introduction.html
@No9 Very nice stuff.
@dominictarr Do we care about stdout or is it enough to parse stderr for errors?
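As a sketch of what parsing stderr might involve: this assumes the crash shows up in the usual Node form, an `Error: ...` line followed by indented `at ...` stack frames. The actual sbot log format may differ, and `extractError` is a hypothetical helper.

```javascript
// Pull the first error line and its stack frames out of captured stderr text.
// Assumes Node-style output: "SomeError: message" followed by "    at ..." lines.
function extractError (stderrText) {
  const lines = stderrText.split('\n')
  const start = lines.findIndex(line => /^\w*Error: /.test(line))
  if (start === -1) return null

  const stack = []
  for (let i = start + 1; i < lines.length && /^\s+at /.test(lines[i]); i++) {
    stack.push(lines[i].trim())
  }
  return { message: lines[start], stack }
}
```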
So here you can see a little preview.
In this setting I have configured the guard to probe the process every 10 seconds. I also added a small crash script in sbot which makes it crash after 30 seconds. So the guard will post two feeds before sbot crashes and gets restarted. Once it's up and running again, the previous error is sent to the feed, before starting to probe again.
function die() {
  throw new Error('wtf dude something went wrong')
}
setTimeout(function () {
  die()
}, 30000)
Yeah that's nice. I'm going to want to post -- something like sys-stat for normal metrics and error-dump for the crashes.
Sounds good.
I'll fix better message types and adjust default values for sampling. For now the sample rate is the same as the publish rate. Added an issue for separating sample and publish rate: https://github.com/ralphtheninja/sbot-test-guard/issues/5
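To make the sample/publish split concrete, here is one possible shape: sample often, publish one aggregate every N samples. This is only an assumption about how the separation could work, not how sbot-test-guard implements it, and `createSampler` is a made-up name.

```javascript
// Collect numeric samples and publish one aggregate per `samplesPerPublish` samples.
// `publish` is a hypothetical callback that would post to the feed.
function createSampler (publish, samplesPerPublish) {
  let samples = []

  return function sample (value) {
    samples.push(value)
    if (samples.length < samplesPerPublish) return

    const sum = samples.reduce((a, b) => a + b, 0)
    publish({
      count: samples.length,
      min: Math.min(...samples),
      max: Math.max(...samples),
      mean: sum / samples.length
    })
    samples = []
  }
}
```

With a 10 second probe interval and `samplesPerPublish` set to 6, for example, this would post once a minute.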
Gah I fixed separate publish rate as well. Published as 2.0.0
nice! I'll dig into the specific metrics now and get this setup on my pub
Cool. I already have ideas for more modules lol, it never ends when you start this node thing :)
truth :)
test network is over here: https://github.com/ssbc/ssb-testnet
@dominictarr said something the other day in this comment.
What if the nodes in that test network measured themselves (plugin?) and published that information to their own feeds?
That way we could aggregate that data and see how the network is doing memory-wise, CPU-wise (or by whatever measurements we'd like to have), and we could more or less see how the network is doing before and after pull requests have been made.
This would help locate bugs and problems in all future versions of scuttlebot.
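As a starting point, a node could measure itself with what Node already exposes (`process.memoryUsage()`, `os.loadavg()`, `process.uptime()`). The message shape and field names below are assumptions for illustration; the `sys-stat` type name is borrowed from a later comment in this thread.

```javascript
const os = require('os')

// Take one snapshot of this process's memory use and the machine's load.
// The object shape is hypothetical, not an agreed message format.
function collectStats () {
  const mem = process.memoryUsage()
  return {
    type: 'sys-stat',
    timestamp: Date.now(),
    rss: mem.rss,             // resident set size in bytes
    heapUsed: mem.heapUsed,   // bytes of V8 heap in use
    heapTotal: mem.heapTotal, // bytes of V8 heap allocated
    loadavg: os.loadavg(),    // 1, 5 and 15 minute load averages
    uptime: process.uptime()  // seconds since this process started
  }
}
```

Publishing the snapshot to the node's own feed is left out, since that depends on the plugin API.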
Questions that come to mind:
Other thoughts?