prometheus / client_python

Prometheus instrumentation library for Python applications
Apache License 2.0

Metrics with same name but different labels. #671

Closed: macktab closed this issue 3 years ago

macktab commented 3 years ago

I am trying to send metrics with the same name but different labels to the Pushgateway, but I am getting this error: ValueError: Duplicated timeseries in CollectorRegistry

Is there any way to do that?

csmarchbanks commented 3 years ago

Hello,

It is possible to send metrics with the same name and different label values. You will have to define the metric once and then use the .labels(labelvalues) method.

Are you trying to expose two metrics with the same name but different sets of label names? A brief example of how you are defining the metrics would help.
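
For reference, here is a minimal sketch of that pattern (the metric name, label, values, and Pushgateway address below are placeholders):

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()

# Define the metric once, with its label names.
queue_size = Gauge(
    'queue_size',
    'Example gauge with one time series per queue',
    ['queue'],
    registry=registry
)

# Reuse the same metric object with different label values;
# each .labels(...) call creates (or returns) a child time series.
queue_size.labels(queue='orders').set(3)
queue_size.labels(queue='payments').set(7)

# Both series share the metric name and differ only in label values.
push_to_gateway('pushgateway.example.com:9091', job='example_job', registry=registry)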

macktab commented 3 years ago

test.sh:


#!/usr/bin/python
import json
import requests
from prometheus_client import CollectorRegistry, push_to_gateway, Enum

clusters_urls=[
  "es.example.com"
]
es_registry = CollectorRegistry()
for cluster_url in clusters_urls:
#stats = requests.get('http://{cluster_url}/_cluster/health?level=indices'.format(cluster_url=cluster_url)).json()
  with open("test.json") as jsonFile:
    stats = json.load(jsonFile)
    jsonFile.close()
  if (stats is not None):
    cluster_name=stats.get("cluster_name")
    es_cluster_status = Enum(
      'es_cluster_status',
      'Status of ES cluster',
      ['cluster_name', 'instance'],
      states=['green', 'yellow', 'red'],
      registry=es_registry
    )
    es_cluster_status.labels(cluster_name=cluster_name, instance=cluster_url).state(stats.get("status"))
    indices_data=stats['indices']
    for key,value in indices_data.items():
      index_status=value.get("status")
      es_index_status = Enum(
        'es_index_status',
        'Status of ES cluster',
        [
          'cluster_name',
          'instance',
          'index_name'
        ],
        states=['green', 'yellow', 'red'],
        registry=es_registry
      )
      es_index_status.labels(cluster_name=cluster_name, instance=cluster_url, index_name=key).state(value.get("status"))
  push_to_gateway('pushgateway.example.com', job='es_metrics', registry=es_registry)

macktab commented 3 years ago

test.json:


{
  "cluster_name": "test-es",
  "status": "green",
  "indices": {
    "test_index_0": {
      "status": "green"
    },
    "test_index_1": {
      "status": "red"
    }
  }
}

csmarchbanks commented 3 years ago

Hello, what you will want to do is define your metrics outside of the loop and then use .labels() inside of it. For example:

#!/usr/bin/python
import json
import requests
from prometheus_client import CollectorRegistry, push_to_gateway, Enum

clusters_urls=[
  "es.example.com"
]

es_registry = CollectorRegistry()
es_cluster_status = Enum(
  'es_cluster_status',
  'Status of ES cluster',
  ['cluster_name', 'instance'],
  states=['green', 'yellow', 'red'],
  registry=es_registry
)
es_index_status = Enum(
  'es_index_status',
  'Status of ES cluster',
  [
    'cluster_name',
    'instance',
    'index_name'
  ],
  states=['green', 'yellow', 'red'],
  registry=es_registry
)

for cluster_url in clusters_urls:
#stats = requests.get('http://{cluster_url}/_cluster/health?level=indices'.format(cluster_url=cluster_url)).json()
  with open("test.json") as jsonFile:
    stats = json.load(jsonFile)
    jsonFile.close()
  if (stats is not None):
    cluster_name=stats.get("cluster_name")
    es_cluster_status.labels(cluster_name=cluster_name, instance=cluster_url).state(stats.get("status"))
    indices_data=stats['indices']
    for key,value in indices_data.items():
      index_status=value.get("status")
      es_index_status.labels(cluster_name=cluster_name, instance=cluster_url, index_name=key).state(value.get("status"))
  push_to_gateway('pushgateway.example.com', job='es_metrics', registry=es_registry)

Let me know if that does not work; if so, there is a bug somewhere.

macktab commented 3 years ago

Oh... :) Yes, it is working. Thank you very much, and sorry for the trouble.

BarryThrill commented 3 years ago

I'm currently facing an issue where I get an error saying:

ValueError: Duplicated timeseries in CollectorRegistry: {'scraper_request_count_created', 'scraper_request_count_total', 'scraper_request_count'}

I have two scripts, which we can call file1.py and file2.py.

file1.py:

import time
import requests
from lib.prometheus import REQUEST_COUNT

def from_page(url):
    while True:
        with requests.get(url) as rep:
            REQUEST_COUNT().labels(store="stackoverflow", http_status=rep.status_code).inc()
            print("Response: ", rep.status_code)
            time.sleep(60)

if __name__ == '__main__':
    from_page("https://stackoverflow.com")

file2.py

import time
import requests
from lib.prometheus import REQUEST_COUNT

def from_page(url):
    while True:
        with requests.get(url) as rep:
            REQUEST_COUNT().labels(store="google", http_status=rep.status_code).inc()
            print("Response: ", rep.status_code)
            time.sleep(60)

if __name__ == '__main__':
    from_page("https://google.com")

As you can see, they both import REQUEST_COUNT from lib.prometheus, which is:

    from prometheus_client import Counter, CollectorRegistry

    registery = CollectorRegistry()

    def REQUEST_COUNT():
        return Counter(
            namespace="scraper",
            name="request_count",
            documentation="Count the total requests",
            labelnames=['store', 'http_status'],
            registry=registery
        )

The problem is that if I run these scripts simultaneously, I get the error ValueError: Duplicated timeseries in CollectorRegistry: {'scraper_request_count_created', 'scraper_request_count_total', 'scraper_request_count'}. What can I do to be able to push the data even if it results in duplicated timeseries?

csmarchbanks commented 3 years ago

You should not create a new Counter each time REQUEST_COUNT is called. Instead, define it once:

REQUEST_COUNT = Counter(
    namespace="scraper",
    name="request_count",
    documentation="Count the total requests",
    labelnames=['store', 'http_status'],
    registry=registery
)

Then, in each of your files, use REQUEST_COUNT.labels(...) instead of REQUEST_COUNT().labels(...).
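
Putting that together, lib/prometheus.py would look roughly like this (a sketch reusing the names from the snippets above):

from prometheus_client import Counter, CollectorRegistry

registery = CollectorRegistry()

# Created once when the module is imported; every importer shares this object.
REQUEST_COUNT = Counter(
    namespace="scraper",
    name="request_count",
    documentation="Count the total requests",
    labelnames=['store', 'http_status'],
    registry=registery
)

In file1.py the call then becomes REQUEST_COUNT.labels(store="stackoverflow", http_status=rep.status_code).inc(), with no parentheses after REQUEST_COUNT.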

BarryThrill commented 3 years ago

Much appreciated for the fast reply @csmarchbanks <3 <3 <3