High cardinality from `pid_hash` label

absolutejam commented 3 years ago

Hi 👋

Thanks for sharing this exporter, I really appreciate the insight it's given us.

I just wanted to create an issue because I've noticed that the default setup will add the pid_hash label. In our setup where we have 5 replicas of an application with a fixed 128 process count, that leads to 640 series. And when we redeploy every few hours... that's a cardinality explosion waiting to happen!

I just wondered if this is something that should possibly be documented or turned off by default, and added with a flag. I could look at adding this if you think there's merit?

estahn commented 3 years ago

@absolutejam Glad the exporter is working for you. In regards to pid_hash you can find a discussion here #76.

If you have a solution in mind I'm open to contributions.

absolutejam commented 3 years ago

Personally, I'm not sure how much we value the visibility into seeing each process individually (at least, via. some kind of name). Maybe process lifetime (probably too hard at an aggregated level) or process restart rate would be a better indicator, but I can't attest to what others would use this for.

Really, I'd just advocate for either having this enabled via. a flag or make it abundantly clear in the docs about the potential risks in certain setups (Or better yet, a flag and some docs to cover it).

estahn commented 3 years ago

@absolutejam I believe I have used that to identify issues due to process churn, but I don't 100% recall anymore.

re docs; I think it makes sense to have this documented and warn for cardinality explosion in environments with processes that have high churn.
Are you advocating to remove the pid_hash entirely or to replace it with a different indicator (e.g. index)? If removed entirely I assume something like "# of processes by state" is not possible anymore:

matejzero commented 3 years ago

Another similar issue was opened here: https://github.com/hipages/php-fpm_exporter/issues/85.

In my case I have 100k unique label values for pid_hash label. Multiply that with 5 metrics that have pid_hash label and a few state values for phpfpm_process_state adds up to a very high value.

matejzero commented 3 years ago

I'm trying to come out with a solution for current pid_hash, but I don't think there is a good solution that would not explode in labels names.

One solution is to use array id. I know that has it's down sides as all threads change IDs if ie. first thread gets killed / restarted.

Another option could be to build PID:ID mapping, but that is more work as we need to figure out which PIDs still exists and which don't and map new PIDs to IDs. But if you need to map php-fpm status page output (if you want to manually debug) to an ID, that is hard to do.

itcsoft54 commented 3 years ago

* Are you advocating to remove the `pid_hash` entirely or to replace it with a different indicator (e.g. index)? If removed entirely I assume something like "_# of processes by state_" is not possible anymore:

Replace pid_hash by slot. slot means process number of total process :

Exemple slot 5 process, is the 5th process of a total of 10.

I look to php_fpm status, it seem to display process in pool always in same order, replacing old process by new, filling the holes :

pool:                 www
process manager:      dynamic
start time:           03/Nov/2020:03:52:34 +0100
start since:          1407704
accepted conn:        734089
listen queue:         0
max listen queue:     0
listen queue len:     0
idle processes:       5
active processes:     1
total processes:      78
max active processes: 84
max children reached: 0
slow requests:        0

************************
pid:                  19138
state:                Idle
[..]
************************
pid:                  20017
state:                Idle
[..]
************************
pid:                  24874
state:                Idle
[..]
************************
pid:                  19550
state:                Running
[..]
************************
pid:                  3622
state:                Idle
[..]
************************
pid:                  29377
state:                Idle
[..]
************************

process 19550 was termined and replace by new one :

pool:                 www
process manager:      dynamic
start time:           03/Nov/2020:03:52:34 +0100
start since:          1407704
accepted conn:        734089
listen queue:         0
max listen queue:     0
listen queue len:     0
idle processes:       5
active processes:     1
total processes:      78
max active processes: 84
max children reached: 0
slow requests:        0

************************
pid:                  19138
state:                Idle
[..]
************************
pid:                  20017
state:                Idle
[..]
************************
pid:                  24874
state:                Idle
[..]
************************
pid:                  17583
state:                Idle
[..]
************************
pid:                  3622
state:                Running
[..]
************************
pid:                  29377
state:                Idle
[..]
************************

Order was preserved.

So if php_fpm page was scraped "as is", you don't need to do anything but prefix pool index by "slot" keyword.

my php_fpm version (from debian php sury ) :
# php-fpm7.2 --version
PHP 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b (fpm-fcgi) (built: May 14 2020 08:33:26)
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b, Copyright (c) 1999-2018, by Zend Technologies
    with Xdebug v2.7.2, Copyright (c) 2002-2019, by Derick Rethans

edit : formating

matejzero commented 3 years ago

Slot is probably better label than id.

For slot number, we could use array id from https://github.com/hipages/php-fpm_exporter/blob/master/phpfpm/exporter.go#L214, right? Order should be the same as on status pade.

Maybe add a +1, so it starts with 1 and not 0 and better match with pm.max_processes value.

olivierHa commented 3 years ago

Nice idea @itcsoft54 Slot number should solve the cardinality explosion with "pid_hash"

estahn commented 3 years ago

@itcsoft54 Thanks for your contribution.

I just revisited the various discussions and arguments. I also reviewed fpm_children.c to verify your findings.

I suggest renaming slot to child as that is the wording used by php-fpm (pool -> child -> process).
How would you determine the number of killed/restarted processes over time? Is there a way to count the counter reset? https://github.com/hipages/php-fpm_exporter/issues/76#issuecomment-597447476

Another option could be to provide process aggregates as suggested here or potentially both with an option to disable per-process metrics.

Thoughts?

itcsoft54 commented 3 years ago

@estahn

I add a new commit to rename slot to child
I don't know, but using "label" to this is a bad idea too. Label was associed to a metric, so if processes was spawn between to "scrap" we miss it. Exemple : if process was respawn 5 time between two scraps, you registred 1 label value, and miss 4 "restart/kill" process.

So metrics based on label has low reliability. If php-fpm doesn't expose such metric (or metrics to compute it), we can't have it reliability.

In add, using "sum of phpfpm_process_requests" to compute "total requests" have same issue. It's better to use the php-fpm exposed "accepted conn" metric even if it not exactly what we want.

Using avg of phpfpm_process_requests can still use because "avg" of lot of value are less impacted if missing some values.

matejzero commented 3 years ago

Do we need child prefix in child label? Wouldn't it be enough to only have a number ie. child="0" or maybe child_num?

Valid point about sum of phpfpm_process_requests to calculate number of requests. It works in a lot of cases, but it might not be 100% correct in cases you pointed out.

Regarding process aggregates, that would be useful. Some users might not need extra details such as last process memory usage, cpu usage,... But for process states, it would be good to export all label values, like I pointed out in https://github.com/hipages/php-fpm_exporter/issues/123.

estahn commented 3 years ago

@itcsoft54

I add a new commit to rename slot to child

Thanks, I agree with @matejzero, use the index, or is there a need for a prefix?

I don't know, but using "label" to this is a bad idea too. Label was associed to a metric, so if processes was spawn between to "scrap" we miss it. Exemple : if process was respawn 5 time between two scraps, you registred 1 label value, and miss 4 "restart/kill" process.

So metrics based on label has low reliability. If php-fpm doesn't expose such metric (or metrics to compute it), we can't have it reliability.

I understand where you are coming from but I don't fully agree with your argument. I assume if you say "reliability" you mean "accuracy". We don't require 100% accuracy as long as there is statistical relevance.

https://prometheus.io/docs/introduction/overview/#when-does-it-not-fit

In add, using "sum of phpfpm_process_requests" to compute "total requests" have same issue. It's better to use the php-fpm exposed "accepted conn" metric even if it not exactly what we want.

Using avg of phpfpm_process_requests can still use because "avg" of lot of value are less impacted if missing some values.

phpfpm_process_requests is fairly accurate for us, we don't use "dynamic mode" and therefore have relatively low churn. In that sense, it will also depend on your configuration how accurate your metrics are.

We can implement process aggregates in a separate PR and I'm happy to merge your PR once 1. has been worked out.

Thanks again for tackling this.

itcsoft54 commented 3 years ago

I understand where you are coming from but I don't fully agree with your argument. I assume if you say "reliability" you mean "accuracy". We don't require 100% accuracy as long as there is statistical relevance.

Ok, "reliability" may be a pretty loaded word for this case.

I re-read, some exemples of label in prometheus documentation's :

For example, rather than http_responses_500_total and http_responses_403_total, create a single metric called http_responses_total with a code label for the HTTP response code. You can then process the entire metric as one in rules and graphs.

So we don't need to prefix label.

I add commit to use index as label "child" value.

github-actions[bot] commented 3 years ago

:tada: This issue has been resolved in version 2.0.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

hipages / php-fpm_exporter

High cardinality from `pid_hash` label #110