rodneymo / rig-monitor

monitoring solution for mining rigs
https://randomcryptostuff.blogspot.nl/2017/08/monitoring-ethereum-mining-farm-using.html
GNU General Public License v3.0

pool-monitor.sh does not grab share stats for nanopool #26

Closed: wishbone1138 closed this issue 6 years ago

wishbone1138 commented 7 years ago

Just a reminder that the code needs to be updated to grab shares as well. I'm only on nanopool at the moment so I didn't look at the code for the others.

rodneymo commented 6 years ago

Should I drop the generalinfo API and replace it with https://api.nanopool.org/v1/sia/hashratechart/:address? That one includes both the hashrate and the shares.

So you'll have reported hashrate, shares, and payments.

I can also add the pool total hashrate (https://api.nanopool.org/v1/sia/pool/hashrate).
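For reference, a minimal sketch of what pulling that chart endpoint could look like, assuming the response carries a data array of points with date, shares and hashrate fields (those names are an assumption, not confirmed against the API docs):

    WALLET_ADDR="your-wallet-address"   # placeholder

    # Fetch the hashrate chart and print date / shares / hashrate per point.
    curl -s "https://api.nanopool.org/v1/sia/hashratechart/${WALLET_ADDR}" \
      | jq -r '.data[] | [.date, .shares, .hashrate] | @tsv'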

One other thing: should we separate the pool stats and payments into two different series? We could keep them in the same series, but it would get messy because of the different timestamps.

wishbone1138 commented 6 years ago

That sounds like a good plan. I think the payments data is extremely useful; I use it in a table on the main page to track when payments are issued and confirmed.

Interesting question about pool total hashrate. I think for larger pools that's probably meaningless for most of us. For smaller pools it might be useful to keep an eye on, in case people start to jump ship. I think two more generic coin stats are probably more interesting for the pool: difficulty and block time. Those map directly to profits and help someone understand why their earnings have gone down.

rodneymo commented 6 years ago

BTW, I noticed that both the shares and payments can output very large arrays, so it's probably a good idea to use the bookkeeping mechanism to filter out already-inserted data. I have added it to pool-ethermine.sh, but not to pool-moos.sh, as that API generates fewer than 20 records (and the amount for today's date keeps updating, because the API doesn't report payments but credits). Do you want me to add it to pool-nanopool.sh or will you do it?
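For readers following along, the flat-file bookkeeping idea boils down to something like the sketch below: keep a file of timestamps that have already been written and skip any record whose timestamp is in it. The file names and input format are hypothetical; this is not the actual pool-ethermine.sh code.

    SEEN_FILE="payments_seen.csv"        # hypothetical bookkeeping file of already-inserted timestamps
    touch "$SEEN_FILE"

    # payments.csv stands in for a "timestamp,amount" dump of the pool API.
    while IFS=, read -r ts amount; do
        if ! grep -qx "$ts" "$SEEN_FILE"; then
            echo "new payment: ts=${ts} amount=${amount}"   # here this would become a line-protocol write
            echo "$ts" >> "$SEEN_FILE"
        fi
    done < payments.csv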

wishbone1138 commented 6 years ago

Yes, I was looking at this (I even added a comment about it in my code).

My code does not grab the shares table. For nanopool I just grab the "rating" which is the current number of shares that worker has provided. Since we're already pulling down general info that includes that number, we're already creating our own series over time that captures share growth. I don't know how the other pools report this, but just a note on nanopool.

There are a couple of options I see right away for the larger table returns with duplicate data.

  1. What I'm doing right now is just sending the entire table to InfluxDB. If any of the entries already exist (I'm using the time from the API) then they just get ignored. I don't know how large this table would have to be before it creates an issue for the curl command/bash shell.
  2. We can make a call to our DB asking for the LAST time entry and filter our list to include only items newer than that time. This is what I planned on getting to, but haven't written the code for yet. My concern here is how large a shell variable can get before we start having tool issues with awk/sed/grep etc. Theoretically vars have no set maximum size, but I know some tools can barf if they get too big. This solution alleviates the burden on the InfluxDB API.
  3. I believe you're suggesting the bookkeeping mechanism you already use, with flat csv/etc. files and filter commands? That would probably be the safest option if we're worried about shell variables getting too large.

The biggest question is at what size variables become an issue for our tools, and which tools are a concern. If it's a non-issue, then letting the DB take care of the duplicates seems like the easiest approach. Influx recommends writing in large batches (of 5,000 to 10,000 points), so I know its API is set up for large inputs.
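A rough sketch of what option 1 could look like in practice: turn the whole table into line protocol and hand it to the /write endpoint in one batch. The measurement, tag and field names below are placeholders, and the API's date field is assumed to be epoch seconds.

    WALLET_ADDR="your-wallet-address"   # placeholder

    # Build one line of line protocol per chart point (epoch seconds -> nanoseconds by appending nine zeros).
    LP=$(curl -s "https://api.nanopool.org/v1/sia/hashratechart/${WALLET_ADDR}" \
      | jq -r '.data[] | "pool_stats,pool_type=nanopool shares=\(.shares),hashrate=\(.hashrate) \(.date)000000000"')

    # One batched write; points whose measurement+tags+timestamp already exist are simply overwritten.
    curl -s -XPOST "http://localhost:8086/write?db=rigdata" --data-binary "$LP"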

rodneymo commented 6 years ago

My code does not grab the shares table. For nanopool I just grab the "rating" which is the current number of shares that worker has provided. Since we're already pulling down general info that includes that number, we're already creating our own series over time that captures share growth. I don't know how the other pools report this, but just a note on nano pool.

ok

What I'm doing right now is just sending the entire table to InfluxDB. If any of the entries already exist (I'm using the time from the API) then they just get ignored. I don't know how large this table would have to be before it creates an issue for the curl command/bash shell.

I don't know either, but yesterday I was playing with the nanopool chart API and it generated, I believe, over 1000 data points. I read that InfluxDB uses measurement+tags+timestamp as the key, so if you insert a new data point with the same key it will overwrite the values. I tested this and indeed that's the behavior.
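A quick way to see that behavior, against the same local Influx and db name used elsewhere in this thread:

    # Write the same point (same measurement, tags and timestamp) twice with different field values...
    curl -s -XPOST "http://localhost:8086/write?db=rigdata" --data-binary 'dup_test,worker=rig1 shares=10 1500000000000000000'
    curl -s -XPOST "http://localhost:8086/write?db=rigdata" --data-binary 'dup_test,worker=rig1 shares=99 1500000000000000000'

    # ...and only the last value survives; there is no duplicate row.
    curl -sG "http://localhost:8086/query" --data-urlencode "db=rigdata" --data-urlencode "q=SELECT shares FROM dup_test"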

We can make a call to our DB asking for the LAST time entry and filter our list to include only items newer than that time. This is what I planned on getting to, but haven't written the code for yet. My concern here is how large a shell variable can get before we start having tool issues with awk/sed/grep etc. Theoretically vars have no set maximum size, but I know some tools can barf if they get too big. This solution alleviates the burden on the InfluxDB API. I believe you're suggesting the bookkeeping mechanism you already use, with flat csv/etc. files and filter commands? That would probably be the safest option if we're worried about shell variables getting too large.

Actually I think reading the LAST time entry is a more elegant solution. The bookkeeping mechanism is a bit messy from my point of view.

One other thing: I noticed in your pool-nanopool.sh that you were creating a separate measurement for each API call. We could have a single measurement per pool type and then use a different tag per API call (and its data), so that even if the timestamps overlap, Influx will still treat the points as different records. This way all the pool data will be contained in a single time series. The main reason is to avoid future performance issues related to having too many time series.

Here's the schema I suggest:

MEASUREMENT: pool_
TAGS: the fields from the POOL_LIST array, plus a report tag (e.g. stats, payments, shares, etc.) used to distinguish data from different APIs for the same pool: pool_type, crypto, label, api_token, wallet_addr, report
FIELDS
TIMESTAMP
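To make that concrete, here are two hypothetical line-protocol points under the proposed schema (reading "pool_" literally as the measurement name; all tag and field values are made up):

    pool_,pool_type=nanopool,crypto=sia,label=myrigs,api_token=xxxx,wallet_addr=abc123,report=stats reportedHashrate=1250,shares=42 1503960000000000000
    pool_,pool_type=nanopool,crypto=sia,label=myrigs,api_token=xxxx,wallet_addr=abc123,report=payments amount=0.15 1503960300000000000

Because the report tag differs, the two points stay separate even if their timestamps ever coincide.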

Thoughts?

rodneymo commented 6 years ago

And here's how to get the timestamp:

curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=rigdata" --data-urlencode "epoch=ns" --data-urlencode "q=SELECT last(reportedHashrate) from pool_stats" | jq -r '.results[0].series[0].values[0][0]'
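And a possible way to use that value to drop rows that are already in the DB before writing. Influx is asked for epoch seconds here to match the assumed units of the API's date field, and the // 0 fallback covers an empty series:

    WALLET_ADDR="your-wallet-address"   # placeholder

    # Timestamp of the newest point already stored, in epoch seconds (0 if the series is empty).
    LAST_TS=$(curl -sG 'http://localhost:8086/query' --data-urlencode "db=rigdata" \
      --data-urlencode "epoch=s" \
      --data-urlencode "q=SELECT last(reportedHashrate) FROM pool_stats" \
      | jq -r '.results[0].series[0].values[0][0] // 0')

    # Keep only API rows newer than the last stored point.
    curl -s "https://api.nanopool.org/v1/sia/hashratechart/${WALLET_ADDR}" \
      | jq --argjson last "$LAST_TS" '[.data[] | select(.date > $last)]'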

wishbone1138 commented 6 years ago

I don't know either, but yesterday I was playing with the nanopool chart API and it generated, I believe, over 1000 data points. I read that InfluxDB uses measurement+tags+timestamp as the key, so if you insert a new data point with the same key it will overwrite the values. I tested this and indeed that's the behavior.

Gotcha. It doesn't ignore it, it overwrites it. That's still fine, I think, but again not the best solution.

Actually I think reading the LAST time entry is a more elegant solution. The bookkeeping mechanism is a bit messy from my point of view.

Agreed

[Additional responses moved to a new section for the measurement vs. tag discussion]

wishbone1138 commented 6 years ago

https://api.nanopool.org/v1/eth/user/ has everything you need for workers, so I'm going to add that to include the current shares, and I'll close this one out after I push to you and you get a chance to take a look at it.
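For reference, a small sketch of pulling the per-worker rating (current share count) from that endpoint; the data.workers / id / rating / hashrate field names are assumptions based on the discussion above:

    WALLET_ADDR="your-eth-address"   # placeholder

    # Print worker id, current share count ("rating") and reported hashrate for each worker.
    curl -s "https://api.nanopool.org/v1/eth/user/${WALLET_ADDR}" \
      | jq -r '.data.workers[] | [.id, .rating, .hashrate] | @tsv'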

wishbone1138 commented 6 years ago

Fixed, closing.