ncabatoff / process-exporter

Prometheus exporter that mines /proc to report on selected processes
MIT License

High cardinality of Process Exporter metrics #289

Open lamjob1993 opened 9 months ago

lamjob1993 commented 9 months ago

Hi!

Our stack: Grafana + Mimir + Prometheus + Process Exporter in K8S.

Help me answer the question about the high cardinality of metrics:

namedprocess_namegroup_memory_bytes, namedprocess_namegroup_num_procs, namedprocess_namegroup_memory_bytes

Our Prometheus can't handle the load because of these metrics. What advice can you give on reducing the high cardinality of process-exporter? I also noticed that these metrics keep their values on the process-exporter /metrics page: until you restart process-exporter, the page keeps serving outdated metric values.

I am interested in solving the issue at the process-exporter level!

It is clear that the problem of high cardinality can be solved at the Prometheus level, but this will be the next stage of optimization.

I can't send metric logs because I work in a bank.

Addition: Process Exporter also collects the IDs of Greenplum DB users. About 20 metrics appear for each user account. After a user has finished working, their 20 metrics are not deleted from the Process Exporter page, so Process Exporter remembers every user it has ever seen. At the moment, the only remedy is restarting Process Exporter.

Thanks!

ncabatoff commented 9 months ago

Hi,

I don't have much to suggest, I'm afraid. process-exporter fills a void in the Prometheus exporter ecosystem, but it's not going to be suitable for all use cases, because of the high cardinality it entails.

You could define fewer named groups.

You could filter out some metrics you can live without.

You could split the load between multiple Prometheus instances.

I can't think of any obvious other solutions.
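
To decide which named groups or metrics to cut, it can help to count how many series each metric name contributes on the exporter's /metrics page. The following is a minimal Python sketch under stated assumptions, not part of process-exporter: the function name `series_per_metric` is hypothetical, and the sample lines (including the group names and values) are made up for illustration.

```python
from collections import Counter


def series_per_metric(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus text exposition output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


# Hypothetical sample, invented for illustration only.
SAMPLE = """\
# HELP namedprocess_namegroup_num_procs number of processes in this group
# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="postgres"} 12
namedprocess_namegroup_num_procs{groupname="bash"} 3
namedprocess_namegroup_memory_bytes{groupname="postgres",memtype="resident"} 1.048576e+06
namedprocess_namegroup_memory_bytes{groupname="postgres",memtype="virtual"} 4.194304e+06
namedprocess_namegroup_memory_bytes{groupname="bash",memtype="resident"} 2048
"""

if __name__ == "__main__":
    # Print metric names ordered by how many series each one contributes.
    for name, n in series_per_metric(SAMPLE).most_common():
        print(f"{n:6d}  {name}")
```

Feeding it the saved output of a real /metrics page instead of `SAMPLE` would show which metric names dominate, which is one way to pick candidates for fewer groups or for filtering.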

lamjob1993 commented 9 months ago

We have already deployed 1000 instances of Process Exporter to monitor 1000 instances of Greenplum DB, so we will first try to solve the problem at the Process Exporter level. Prometheus will be the next stage. Thanks for the answer!

StefanSander3 commented 2 months ago

@lamjob1993 have you found a solution to this issue?

lamjob1993 commented 2 months ago

> @lamjob1993 have you found a solution to this issue?

We have found a solution within our process. We told our customer, who uses Process Exporter, that he is using the exporter logic incorrectly.

He used the exporter to monitor per-user activity by processor time (CPU) on nodes, instead of using Process Exporter for its intended purpose. This led to high cardinality, and our Prometheus was drowning in data. So the solution turned out to be simpler and closer to hand than we thought.

Customers do not always make sound decisions.