Concerns about metric labels cardinality

biozz commented 1 year ago

Hi all.

I am running ~100 rq workers and they all generate UUID names, which pollutes /metrics: each worker-specific metric times 100 -> ~400 metrics. I also deploy my application quite frequently, let's say once a day on weekdays. This results in ~2000 new series per week, ~8000 per month and so on. This is a huge cardinality, which translates into high resource usage in the Prometheus infrastructure.
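The growth figures above can be checked with a quick calculation. This is only a sketch: the four metrics per worker are an assumption inferred from "~400 metrics" for 100 workers, and a month is rounded to four weeks.

```python
# Back-of-the-envelope growth in series, using the figures from the comment above.
WORKERS = 100            # concurrently running rq workers
METRICS_PER_WORKER = 4   # assumption: implied by "~400 metrics" for 100 workers
DEPLOYS_PER_WEEK = 5     # one deploy per weekday
WEEKS_PER_MONTH = 4      # rough approximation


def new_series(deploys: int) -> int:
    """Series created by `deploys` deployments, each replacing every worker name."""
    return WORKERS * METRICS_PER_WORKER * deploys


per_deploy = new_series(1)
per_week = new_series(DEPLOYS_PER_WEEK)
per_month = new_series(DEPLOYS_PER_WEEK * WEEKS_PER_MONTH)
print(per_deploy, per_week, per_month)  # 400 2000 8000
```

Every deployment replaces all worker names, so the old series stop receiving samples but still occupy storage until retention expires.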

I can't use specific names because they have to be unique. I can't use some kind of sequential worker ID because I have a graceful deployment and those sequences are going to overlap. I am left with some kind of scraper configuration that uses regular expressions to strip the unique part of the name and make it more generic (worker-type-N -> worker-type), which looks like a hack. :(
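The worker-type-N -> worker-type rewrite can be tried out in Python before it goes into a scrape configuration. This is a minimal sketch: the pattern and the sample names are hypothetical, and `fullmatch` is used because Prometheus anchors relabel regexes at both ends.

```python
import re

# Hypothetical pattern: a worker-type prefix followed by a numeric suffix.
WORKER_NAME = re.compile(r"(.+)-\d+")


def normalize(name: str) -> str:
    """Strip a trailing '-N' from a worker name; leave other names unchanged."""
    match = WORKER_NAME.fullmatch(name)
    return match.group(1) if match else name


for raw in ("worker-type-7", "emails-12", "worker-type"):
    print(raw, "->", normalize(raw))
```

Names without a numeric suffix pass through untouched, so the rule would only affect workers that follow the naming scheme.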

So this is my concern and I would like to hear your opinion. This is still a great library, thanks.

Have you thought about the cardinality?
Was it intentional to have names in the metrics?
Have you dealt with these kinds of issues in your usage scenarios?

mdawar commented 1 year ago

Hey,

I haven't dealt with this kind of issue, but I understand the problem you're talking about. This exporter was for a hobby project so I didn't take all the use cases into consideration.

I think you need this name label to be able to identify the workers, at least so they are unique and can be counted. If you don't care about the worker count, these labels can be dropped in the Prometheus configuration with metric_relabel_configs. For example, I tried this configuration and was able to drop the name label:

scrape_configs:
  - job_name: 'rq_exporter'
    scrape_interval: 5s
    static_configs:
      - targets:
          - rq_exporter:9726
    metric_relabel_configs:
      - regex: 'name' # Label to drop
        action: labeldrop

The result:

rq_workers{instance="rq_exporter:9726",job="rq_exporter",queues="high,default,low",state="busy"}
rq_workers{instance="rq_exporter:9726",job="rq_exporter",queues="high,default,low",state="idle"}

But as you can see, they're only identified by their state, which happens to be different in this case.
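The caveat is easier to see with more than one worker in the same state. The sketch below uses made-up samples shaped like the rq_workers series and groups them by the labels that remain once name is dropped; it only illustrates the label sets and does not model how Prometheus itself handles the duplicates.

```python
from collections import defaultdict

# Hypothetical samples shaped like the rq_workers series, before relabeling.
samples = [
    {"name": "worker-a", "queues": "high,default,low", "state": "busy"},
    {"name": "worker-b", "queues": "high,default,low", "state": "idle"},
    {"name": "worker-c", "queues": "high,default,low", "state": "idle"},
]


def drop_label(labels: dict, label: str) -> tuple:
    """Return the remaining label set as a hashable, order-independent key."""
    return tuple(sorted((k, v) for k, v in labels.items() if k != label))


# Group worker names by the label set left after dropping "name".
groups = defaultdict(list)
for sample in samples:
    groups[drop_label(sample, "name")].append(sample["name"])

# Label sets shared by more than one worker can no longer be told apart.
collisions = {key: names for key, names in groups.items() if len(names) > 1}
print(collisions)
```

Here the two idle workers end up with the same label set, so they can no longer be distinguished or counted individually.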

You can read more in the docs: relabel_config and metric_relabel_configs. See also: Dropping metrics at scrape time with Prometheus.

I don't think we can add a flag to remove the name label, because as I said they uniquely identify the workers.

If you still need to drop these labels at the source, it's pretty easy to fork the project, make any modifications you want and build the Docker image. There's also a docker-compose.yml file with a fully working dev environment.

I hope that answers your questions.

biozz commented 1 year ago

Dropping labels does look like a solution, but it also drops labels :)

I think I am going to settle on a more readable name, something that contains the version of the app and a sequential number. At least that way I am not stressed out by having UUIDs in label values.
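A naming scheme along these lines could look like the sketch below. The `worker_name` helper, its format and the version string are hypothetical; the resulting string would then be handed to rq as the worker name, for example through the `--name` option of `rq worker`.

```python
def worker_name(app_version: str, index: int, prefix: str = "worker") -> str:
    """Build a readable, deployment-scoped worker name such as 'worker-1.4.2-07'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return f"{prefix}-{app_version}-{index:02d}"


# Hypothetical version string and a three-worker deployment.
names = [worker_name("1.4.2", i) for i in range(1, 4)]
print(names)  # ['worker-1.4.2-01', 'worker-1.4.2-02', 'worker-1.4.2-03']
```

Because the version is part of the name, workers from the old and new release do not clash during a graceful deployment as long as the version changes between deploys.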

I don't have any further questions, thank you!

mdawar / rq-exporter

Concerns about metric labels cardinality #28