pwm / rig-stats

Nvidia GPU and miner statistics exporter for prometheus.io
MIT License

Cannot pull nVidia-smi data on HiveOS #1

Closed: rohanpandula closed this issue 3 years ago

rohanpandula commented 6 years ago

Hi!

Thank you so much for developing this. I have the stats exporter talking to Prometheus correctly; I know this because when I go to the Prometheus IP and port, I get the data shown in the screenshot.

[Screenshot: Mar 03 17-37-29]

But when I import your dashboard into Grafana, none of the graphs populate.

[Screenshot: Mar 03 17-38-33]

I am very new to Grafana/Prometheus, so if I am missing something obvious, I apologize.

Thanks again!

pwm commented 6 years ago

Hi,

Having the 1st screenshot is a good sign: it means the script is up and running and serving data. The next step is to configure Prometheus to pull this data. This can be done in the Prometheus config, /etc/prometheus/prometheus.yml, by adding the following under scrape_configs:

  - job_name: rig
    metrics_path: /
    static_configs:
      - targets: ['localhost:9001']

This tells Prometheus to pull the data from the script (your 1st screenshot). Make sure the host and port are correct; the script runs on port 9001 by default.
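Before editing prometheus.yml, it can help to confirm that the exporter page really contains metrics. The following is a minimal sketch, assuming the script serves the Prometheus text format at http://localhost:9001/ (the default port mentioned above); `metric_names` and `fetch_exporter` are hypothetical helpers, not part of rig-stats.

```python
import urllib.request
from typing import List

# Assumed default exporter address (host and port from the thread).
EXPORTER_URL = "http://localhost:9001/"


def metric_names(exposition: str) -> List[str]:
    """Return the distinct metric names found in Prometheus text-format output.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. A sample line
    looks like `name{label="x"} value` or `name value`.
    """
    names: List[str] = []
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (start of labels) or whitespace (value).
        end = len(line)
        for sep in ("{", " ", "\t"):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        name = line[:end]
        if name not in names:
            names.append(name)
    return names


def fetch_exporter(url: str = EXPORTER_URL, timeout: float = 5.0) -> str:
    """Fetch the exporter page; raises urllib.error.HTTPError on e.g. a 500."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the script running, `print(metric_names(fetch_exporter()))` should list names such as nvidia_clock_speed; an HTTPError here means the problem is in the exporter, not in Prometheus or Grafana.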

Once this is done, you can navigate to Prometheus's own web page, which runs on port 9090 by default. If you start typing nvidia in the expression box there, it should offer all the metrics from the script, e.g. nvidia_clock_speed.
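The same check can be scripted against Prometheus's HTTP API, whose /api/v1/label/__name__/values endpoint returns every metric name Prometheus currently knows about. A sketch, assuming Prometheus listens on http://localhost:9090; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed default Prometheus address.
PROMETHEUS_URL = "http://localhost:9090"


def names_with_prefix(payload: dict, prefix: str = "nvidia_") -> List[str]:
    """Filter a /api/v1/label/__name__/values response to names with a prefix."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus API error: {payload.get('error')}")
    return sorted(name for name in payload["data"] if name.startswith(prefix))


def fetch_metric_names(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> dict:
    """Return the decoded JSON listing all metric names Prometheus has scraped."""
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If `names_with_prefix(fetch_metric_names())` comes back empty after the scrape config is in place, Prometheus is probably not reaching the exporter; the Status -> Targets page in the Prometheus UI shows the scrape error.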

Finally, you need to tell Grafana to use Prometheus as a data source. This can be done in the menu under Data Sources -> Add data source. The type is Prometheus and the URL is Prometheus's URL, e.g. http://localhost:9090/. Once this is done, stats should start showing up.
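One way to sanity-check the URL before entering it in Grafana is to send it the kind of query a Prometheus data source issues: Prometheus answers /api/v1/query?query=up with a JSON object whose status is "success", whereas the exporter on port 9001 presumably returns plain-text metrics or an error. A minimal sketch; `query_url`, `looks_like_prometheus` and `check_datasource_url` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, expr: str = "up") -> str:
    """Build a Prometheus instant-query URL, e.g. .../api/v1/query?query=up."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def looks_like_prometheus(body: bytes) -> bool:
    """True if body is a successful Prometheus query API JSON response."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False  # e.g. the exporter's plain-text metrics page
    return isinstance(payload, dict) and payload.get("status") == "success" and "data" in payload


def check_datasource_url(base_url: str, timeout: float = 5.0) -> bool:
    """Probe base_url with an instant query, as a Prometheus data source would."""
    try:
        with urllib.request.urlopen(query_url(base_url), timeout=timeout) as resp:
            return looks_like_prometheus(resp.read())
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False
```

With default ports, `check_datasource_url("http://localhost:9090/")` should return True, while pointing it at the exporter (`http://localhost:9001/`) would be expected to return False.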

Hope this helps.

rohanpandula commented 6 years ago

Ah! Thank you. I see where my error was: I had pointed the Prometheus data source in Grafana at the rig-stats script rather than at Prometheus itself. For whatever reason, Grafana's test-and-add function reported success anyway. Thank you for your help and your work on this. It is now picking up the nvidia-smi info. The hashing and pool info are not working yet; I am using DSTM and Flypool. I am sure I can figure it out with more tinkering, but when I run

./rig_stats.py -o flypool -O api-zcash.flypool.org -u address

I get this when I go to localhost:9001:

Error response
Error code: 500

Message: error generating metric output.

Error code explanation: 500 - Server got itself in trouble.

[Screenshot of the console error: Mar 04 15-02-27]

pwm commented 6 years ago

Hey, I've pushed an update. Can you let me know whether the above error still occurs?

rohanpandula commented 6 years ago

I ran git pull, got the update, and am now having this issue. It says "Already up to date" at the top because I ran it a second time.

[Screenshot: Mar 05 18-36-05]

pwm commented 6 years ago

@rohanpandula Sorry, I did not have much time this week, but I just pushed an update that hopefully fixes the issue. Could you pull the latest version (0.2.1) and test it?