rodneymo / rig-monitor

monitoring solution for mining rigs
https://randomcryptostuff.blogspot.nl/2017/08/monitoring-ethereum-mining-farm-using.html
GNU General Public License v3.0
30 stars 12 forks

ALERTING - How should we deal with no-connects? #37

Open wishbone1138 opened 7 years ago

wishbone1138 commented 7 years ago

I propose we write -1 when we cannot connect to a rig, so that our monitoring tools can differentiate between a rig running at 0 and a rig that cannot be reached for some reason. Maybe test this idea out just on minersystem[module] for total_hr? Other ideas?
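To make the distinction concrete, here is a rough sketch of how an alerting check could read the value. The function name and the state labels are hypothetical, not anything in the repo; the only thing taken from the proposal is that -1 means "could not connect" and 0 means "reachable but not hashing".

```python
# Sentinel from the proposal above: -1 means the rig could not be reached.
NO_CONNECT = -1


def classify_total_hr(total_hr):
    """Map a total_hr reading to a coarse rig state (hypothetical labels)."""
    if total_hr == NO_CONNECT:
        return "unreachable"  # monitoring could not connect to the miner
    if total_hr == 0:
        return "idle"         # miner answered, but it is hashing at 0
    return "hashing"


if __name__ == "__main__":
    for reading in (-1, 0, 182.5):
        print(reading, classify_total_hr(reading))
```

An alert rule could then treat "unreachable" and "idle" differently, for example paging on the first and only warning on the second.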

rodneymo commented 7 years ago

Agreed. Let's keep this open until all miners are updated.

wishbone1138 commented 7 years ago

Implemented in miner-ewbf and miner-claymore. Right now I'm only writing out -1 to active_gpus, total_hr, and total_hr_dcoin (if applicable). In addition, I'm only writing minersystem[module] and not minergpu[module]. It wouldn't be hard to write out the GPU metrics as well; I'm just not sure it's necessary.
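For anyone updating another miner module, the fallback looks roughly like this. This is a sketch only, not the actual module code: fetch_stats and has_dual_coin are hypothetical stand-ins for however a given module queries its miner, and only the three field names come from the comment above.

```python
NO_CONNECT = -1  # sentinel written when the miner cannot be reached


def system_fields(fetch_stats, has_dual_coin=False):
    """Return the system-level fields, falling back to -1 on a failed connect.

    fetch_stats is a hypothetical callable that returns a dict of fields
    and raises OSError (refused connection, timeout, ...) when the miner
    API cannot be reached.
    """
    try:
        return fetch_stats()
    except OSError:
        fields = {"active_gpus": NO_CONNECT, "total_hr": NO_CONNECT}
        if has_dual_coin:
            # Only written for miners that report a second coin.
            fields["total_hr_dcoin"] = NO_CONNECT
        return fields


if __name__ == "__main__":
    def unreachable_rig():
        raise ConnectionRefusedError("miner API not answering")

    print(system_fields(unreachable_rig, has_dual_coin=True))
```

Catching OSError covers refused connections and socket timeouts in Python 3; a real module would likely need to handle malformed responses separately.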

rodneymo commented 7 years ago

Cool. I'll add this to sgminer as soon as I finish the profitability script.

rodneymo commented 7 years ago

Support added to sgminer.

rodneymo commented 7 years ago

I think we can close this one, right?

wishbone1138 commented 7 years ago

I think I want to write out the GPU metrics too. The reason is that our GPU temps and fan speeds will stick at their last values unless those are also written in some way. It might be nice to see those go to -1 as well, to indicate that we're no longer getting a measurement. I could base this on the "installed_gpu" parameter and write out that many GPU metrics with -1 for temp, hr, and fan speed. Thoughts?
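Something along these lines is what I have in mind for the per-GPU case. It is a minimal sketch under assumptions: the function name, the gpu_id key, and the dict layout are made up for illustration, and the count would come from whatever the conf file gives for installed GPUs.

```python
NO_CONNECT = -1  # same sentinel as the system-level metrics


def gpu_placeholders(installed_gpus):
    """Build one placeholder record per configured GPU (hypothetical layout).

    Each record carries -1 for temp, hr, and fan speed so that dashboards
    stop showing the last known values once the rig is unreachable.
    """
    return [
        {
            "gpu_id": gpu_id,
            "temp": NO_CONNECT,
            "hr": NO_CONNECT,
            "fan_speed": NO_CONNECT,
        }
        for gpu_id in range(installed_gpus)
    ]


if __name__ == "__main__":
    for record in gpu_placeholders(3):
        print(record)
```

Numbering GPUs from 0 is also an assumption; it would need to match however each miner module already labels its GPUs.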

wishbone1138 commented 7 years ago

image

This is an example of what I mean. You'll see -1 for kinko2 but the temp is obviously "stuck".

wishbone1138 commented 7 years ago

TODO: add a write-out for miner_gpu for the number of GPUs in the conf file. Will write -1 for temp and fan speed.