prg3 / cgminer_exporter

Prometheus exporter for Cgminer (specifically Antminers)
BSD 3-Clause "New" or "Revised" License

Exporter response deadly slow #1

Open T28PJ opened 6 years ago

T28PJ commented 6 years ago

Hi, we use cgminer_exporter to scrape 240 L3/D3 Antminers. At first we had Prometheus set to a 10s scrape interval. We noticed that no responses were coming in, and accessing the exporter manually had a delay of about 5 minutes. An interval of 120s is working for us right now, even though there are sporadic failures. The logs look like this:

[E 180719 11:19:02 web:2064] 500 GET /metrics?target=10.1.4.65 (::1) 1002.38ms
[I 180719 11:19:02 web:2064] 200 GET /metrics?target=10.1.4.58 (::1) 17.18ms

Any ideas what to do about this?

cgminer_exporter is running via docker-compose:

  cgminer-exporter:
    image: majestik/cgminer_exporter
    container_name: cgminer-exporter
    restart: always
    ports:
      - "9188:9188"
    network_mode: "host"

This is our prometheus.yml:

# cgminer/bmminer (Antminer) Exporter      
  - job_name: 'cgminer-exporter'
    scrape_interval: 120s
    metrics_path: /metrics
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9154
    file_sd_configs:
      - files:
          - nodes.yml

prg3 commented 6 years ago

How many miners are you monitoring with it?

T28PJ commented 6 years ago

240

prg3 commented 6 years ago

Sorry, I just reread the message. Let me look into that. I was only using it to monitor 10 miners, so there may be a thread-count setting I can adjust. In the worst case, you could run multiple copies of the container and partition the miners among them; Prometheus won't care if the metrics come from different exporters.
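
The partitioning idea could be sketched roughly as follows. This is only an illustration: `partition_targets` is a hypothetical helper, the addresses are placeholders rather than the real miner list, and how each group is then fed to its own exporter copy (for example, one `file_sd_configs` file per job) is left open.

```python
def partition_targets(targets, shards):
    """Split targets into `shards` roughly equal groups, round-robin."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    return [targets[i::shards] for i in range(shards)]


# Placeholder addresses standing in for the 240 miners (assumption).
miners = ["10.1.4.%d" % n for n in range(1, 241)]

# One group per exporter copy; 4 copies is an arbitrary choice.
groups = partition_targets(miners, 4)
for index, group in enumerate(groups):
    print("exporter %d: %d miners" % (index, len(group)))
```

Round-robin slicing keeps the groups within one miner of each other in size, so no single exporter copy ends up with most of the load.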

T28PJ commented 6 years ago

@prg3 - Did you have time to take a look at this?

prg3 commented 6 years ago

I just added a THREADS environment variable that you can tweak to see if it helps.

T28PJ commented 6 years ago

Thank you. I didn't have any success, though. There are still errors when scraping the exporter.

This is my docker-compose.yml:

  cgminer-exporter:
    image: majestik/cgminer_exporter
    container_name: cgminer-exporter
    environment:
      - THREADS=15
    restart: always
    ports:
      - "9154:9154"
    network_mode: "host"

I also reduced the Prometheus targets to only 8 Antminers. No change. This is what I get, even with a long scrape interval (300s):

$ sudo docker logs cgminer-exporter
[I 180731 09:19:16 process:128] Starting 15 processes
[E 180731 09:19:20 web:1670] Uncaught exception GET /metrics?target=10.1.1.17 (::1)
    HTTPServerRequest(protocol='http', host='localhost:9154', method='GET', uri='/metrics?target=10.1.1.17', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/site-packages/tornado/web.py", line 1590, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "./cgminer_exporter.py", line 73, in get
        metricdata = getfromIP(target)
      File "./cgminer_exporter.py", line 54, in getfromIP
        s.connect((ip,int(4028)))
      File "/usr/local/lib/python2.7/socket.py", line 228, in meth
        return getattr(self._sock,name)(*args)
    error: [Errno 111] Connection refused

........

Any additional ideas?

prg3 commented 6 years ago

Are they getting a connection refused every time, or is this just a random occurrence? I'm thinking this could be the miners themselves not liking the frequency of checks, although I was running mine at a 2s interval with no problems.

If you run http://container:port/metrics?target=10.1.4.58 with curl, can you reproduce the problem?
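
The same manual check can be scripted so that the status code and response time are recorded together. The following is a minimal Python 3 sketch, not part of the exporter; `timed_scrape` is a hypothetical helper, and the exporter address passed to it is an assumption about your setup.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def timed_scrape(exporter, target, timeout=10.0):
    """Fetch /metrics for one target; return (HTTP status, elapsed seconds)."""
    query = urllib.parse.urlencode({"target": target})
    url = "http://%s/metrics?%s" % (exporter, query)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            status = response.status
    except urllib.error.HTTPError as exc:
        # Non-2xx answers such as the 500s in the logs end up here.
        status = exc.code
    return status, time.monotonic() - start
```

Calling `timed_scrape("localhost:9154", "10.1.4.58")` would return something like a status and a duration in seconds. Failures to reach the exporter itself (for example, a refused connection) are not caught and surface as `urllib.error.URLError`.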

T28PJ commented 6 years ago

Yes, I had the same thought. A single manual scrape gives the same error:

$ curl http://localhost:9154/metrics?target=10.1.4.58

<html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>

I tried switching off the whole exporter and rebooting the Antminer. That didn't change anything.

prg3 commented 6 years ago

I am confused. Do they work at all, or is this a performance problem?

Can the container ping the Antminer? Can you connect to port 4028 via nc or telnet from the container?
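
If nc or telnet is not available in the container, a short Python 3 script can do the same TCP check. This is a sketch under assumptions: `can_connect` is a hypothetical helper, and the 3-second timeout is an arbitrary choice. Port 4028 is the one the exporter connects to in the traceback above.

```python
import socket


def can_connect(host, port=4028, timeout=3.0):
    """Return (True, None) if a TCP connection opens, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, str(exc)
```

For example, `can_connect("10.1.1.17")` would be expected to return `False` together with a "Connection refused" reason if the miner is rejecting connections the way the traceback shows.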

T28PJ commented 6 years ago

(deleted old comment) I've found an external source of the error that has nothing to do with cgminer-exporter. Right now it runs fine with a scrape interval of 15s and THREADS=15 on all 240 Antminers. Thank you. I'll keep watching and report back if anything changes.