Closed ykyuen closed 5 years ago
Hi,
Give me a few days to analyze this. I am currently travelling and need to find some time. Thanks for the fix, too :)
No rush. Enjoy your trip~~ =D
Hi Yuen,
It appears to me that this is due to the deadLabels definition. The dead labels mechanism works like this: if a series receives no new value within the retention period, its line gets "stripped down" to the dead labels. In your example:
http_requests_total{instance="httpd-exporter.obs-prod-www-3:80",job="atg_www_3",method="GET",status="5xx"}
will be changed to:
http_requests_total{method="GET",status="5xx",deadCounter="true"}
I am a bit confused why these labels do not appear. However, I've learned recently that the best practice for Prometheus counters is not to reset them. Prometheus can figure this out itself. So you can simply disable dead labels (config enableDeadLabels=0). Then these labels will appear forever. Or you can increase the retentionPeriod to a much larger value.
Ok, furthermore: without dead labels the labels will be removed after about 40 days of inactivity.
Thanks for your update. I will try it when I return to the office next week~
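The stripping described above can be illustrated with a small sketch. This is not the exporter's code (which is Perl); it is a Python model of the behaviour as explained in this thread, and it assumes a hypothetical setting of deadLabels=method,status. The function names are made up for illustration.

```python
def strip_to_dead_labels(labels, dead_labels):
    """Keep only the labels listed in dead_labels and mark the series as dead."""
    kept = {name: value for name, value in labels.items() if name in dead_labels}
    kept["deadCounter"] = "true"
    return kept


def format_series(metric, labels):
    """Render a metric name and a label dict in Prometheus text style."""
    body = ",".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{metric}{{{body}}}"


# Labels taken from the example series in this thread.
labels = {
    "instance": "httpd-exporter.obs-prod-www-3:80",
    "job": "atg_www_3",
    "method": "GET",
    "status": "5xx",
}

# Assumed configuration: deadLabels=method,status
stripped = strip_to_dead_labels(labels, dead_labels=["method", "status"])
print(format_series("http_requests_total", stripped))
```

With these assumptions the sketch prints the same stripped-down line shown above, with only method, status and deadCounter left.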
Hi @technicalguru,
A few findings for your reference.
1. I checked the metrics collected in Prometheus but couldn't find any label named "deadCounter". It seems the dead counters don't work on my setup.
2. Last Friday, I set the retention period to retentionSeconds=3600000, so the metrics should be kept for 1000 hours before being removed. But when I checked the graph this morning, the metrics had somehow been reset after a period of time.
3. I have disabled deadLabels for my setup. Let's see how it goes.
4. I suspect that the broken line is because the server seldom responds with 5xx and the timestamp in the metrics file is not updated. For example, the 5xx count below always has the timestamp 1549253617000 until a new 5xx response is captured.
# TYPE http_requests_total counter
# HELP http_requests_total Counts the requests that were logged by HTTP daemon
http_requests_total{status="5xx"} 1 1549253617000
http_requests_total{status="3xx"} 816447 1549255169000
http_requests_total{status="2xx"} 907983 1549255169000
http_requests_total{status="4xx"} 24437 1549255154000
Regarding point 4, is it possible to align all the timestamps on each update? i.e.:
# TYPE http_requests_total counter
# HELP http_requests_total Counts the requests that were logged by HTTP daemon
http_requests_total{status="5xx"} 1 1549255169000
http_requests_total{status="3xx"} 816447 1549255169000
http_requests_total{status="2xx"} 907983 1549255169000
http_requests_total{status="4xx"} 24437 1549255169000
Would there be any impact with this change?
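To make the question concrete, here is a minimal Python sketch of the alignment being asked about. It is only an illustration of the idea, not the exporter's code, and it assumes every sample line ends with a value and a millisecond timestamp, as in the files above. The function name `align_timestamps` is hypothetical.

```python
def align_timestamps(lines):
    """Rewrite every sample line so it carries the newest timestamp in the file."""
    parsed = []
    newest = 0
    for line in lines:
        if not line or line.startswith("#"):
            # Comment and blank lines pass through untouched.
            parsed.append((line, None))
            continue
        # Split from the right so label values containing spaces stay intact.
        series, value, timestamp = line.rsplit(" ", 2)
        newest = max(newest, int(timestamp))
        parsed.append((series, value))
    return [
        series if value is None else f"{series} {value} {newest}"
        for series, value in parsed
    ]


metrics = [
    "# TYPE http_requests_total counter",
    "# HELP http_requests_total Counts the requests that were logged by HTTP daemon",
    'http_requests_total{status="5xx"} 1 1549253617000',
    'http_requests_total{status="3xx"} 816447 1549255169000',
    'http_requests_total{status="2xx"} 907983 1549255169000',
    'http_requests_total{status="4xx"} 24437 1549255154000',
]

for line in align_timestamps(metrics):
    print(line)
```

The counter values are left unchanged; only the timestamp column is rewritten, so the rarely updated 5xx series is reported with the same timestamp as the busy ones.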
I also confirmed that even with dead labels disabled, the metrics still reset themselves.
[General]
metricsFile=/var/www/html/httpd-exporter/metrics
addLabels=
addStatusGroupLabel=status
collectBytesTransferred=bytes_sent
retentionSeconds=3600000
enableDeadLabels=0
deadLabels=
[LogFormats]
# LC ATG Apache log format
%{NOTSPACE:forward} %{IP:clientip} %{NOTSPACE:rlogname} %{NOTSPACE:user} \[%{HTTPDATE:timestamp}\] %{HOSTNAME:virtualHost} %{NOTSPACE:sslprotocol} %{NOTSPACE:sslcipher} "%{REQUEST_LINE}" %{INT:status} %{INT:bytes_sent} (%{QS:referrer}|-) (%{QS:agent}|-) (%{QS:jsessonid}|-) %{INT:timespent}
[/var/log/httpd/row_prod_estore_access_log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}
[/var/log/httpd/hk_prod_estore_access_log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}
[/var/log/httpd/cn_prod_estore_access_log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}
Graph shown on prometheus
Hmmm, perhaps we need to update the timestamp or skip it completely. However, this would make deadLabels impossible (which was a test feature anyway). I will open a branch and change the code accordingly. Then we can see if this works better.
Maybe we could consider this:
Case (1): if deadLabels is enabled, each metric has its own timestamp. Nothing changes.
Case (2): if deadLabels is disabled, all metrics follow the same latest timestamp.
The downside is that the two cases behave quite differently, so users might be confused.
I made the following changes so all metrics have the same updated timestamp. But I am not familiar with Perl, so I am not sure whether this is the proper way.
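The two cases can be modelled with a short Python sketch. It is a simplified stand-in for a labelled counter, not the exporter's Perl module; the class name `LabeledCounter` and the `dead_labels_enabled` flag are hypothetical names chosen for illustration.

```python
class LabeledCounter:
    """Toy counter that keeps one value and one timestamp per label set."""

    def __init__(self, dead_labels_enabled):
        self.dead_labels_enabled = dead_labels_enabled
        self.values = {}
        self.timestamps = {}

    def inc(self, labels, timestamp):
        self.values[labels] = self.values.get(labels, 0) + 1
        self.timestamps[labels] = timestamp
        if not self.dead_labels_enabled:
            # Case (2): every known series follows the latest timestamp.
            for key in self.timestamps:
                self.timestamps[key] = timestamp
        # Case (1): nothing more to do, only the touched series was updated.


per_series = LabeledCounter(dead_labels_enabled=True)
aligned = LabeledCounter(dead_labels_enabled=False)
for counter in (per_series, aligned):
    counter.inc('status="5xx"', 1549253617000)
    counter.inc('status="2xx"', 1549255169000)

print(per_series.timestamps)
print(aligned.timestamps)
```

In the first counter the 5xx series keeps its older timestamp, which is what a retention check needs; in the second, both series carry the newest timestamp.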
diff --git a/modules/ApExportMetrics.pm b/modules/ApExportMetrics.pm
index b7ad30b..b4310bd 100644
--- a/modules/ApExportMetrics.pm
+++ b/modules/ApExportMetrics.pm
@@ -128,6 +128,9 @@ sub set {
if ($labels) {
$self->{values}->{$labels} = $value;
$self->{timestamps}->{$labels} = $timestamp;
+ foreach my $key (keys %{$self->{timestamps}}) {
+ $self->{timestamps}->{$key} = $timestamp;
+ }
} else {
$self->{value} = $value;
$self->{timestamp} = $timestamp;
@@ -151,6 +154,9 @@ sub inc {
$self->{values}->{$labels} = 1;
}
$self->{timestamps}->{$labels} = $timestamp;
+ foreach my $key (keys %{$self->{timestamps}}) {
+ $self->{timestamps}->{$key} = $timestamp;
+ }
} else {
if (exists($self->{value})) {
$self->{value}++;
@@ -178,6 +184,9 @@ sub dec {
$self->{values}->{$labels} = -1;
}
$self->{timestamps}->{$labels} = $timestamp;
+ foreach my $key (keys %{$self->{timestamps}}) {
+ $self->{timestamps}->{$key} = $timestamp;
+ }
} else {
if (exists($self->{value})) {
$self->{value}--;
@@ -207,6 +216,9 @@ sub add {
$self->{values}->{$labels} = $value;
}
$self->{timestamps}->{$labels} = $timestamp;
+ foreach my $key (keys %{$self->{timestamps}}) {
+ $self->{timestamps}->{$key} = $timestamp;
+ }
} else {
if (exists($self->{value})) {
$self->{value} += $value;
@@ -236,6 +248,9 @@ sub sub {
$self->{values}->{$labels} = 0-$value;
}
$self->{timestamps}->{$labels} = $timestamp;
+ foreach my $key (keys %{$self->{timestamps}}) {
+ $self->{timestamps}->{$key} = $timestamp;
+ }
} else {
if (exists($self->{value})) {
$self->{value} -= $value;
If this sounds OK to you, I will try adding a deadLabels check so that it follows the cases I mentioned above, and make a pull request later.
Please feel free to let me know if you think there is a better way to do that. I would love to take this chance to play with Perl.
Thanks. =D
For your reference, after making all metrics follow the latest updated timestamp, the lines for infrequent responses are no longer broken.
Hi,
Sorry for the late response. I was thinking of removing the timestamps from the metrics (Prometheus adds them automatically) and wiping out the deadLabels feature completely. As Prometheus suggests, it can detect a reset itself. So regularly deleting the metrics file completely makes more sense, or a reset can be implemented.
For the moment, I would simply disable the retention check and not export the timestamp value in the metrics file.
That sounds like an even simpler and better approach. Thanks for the changes.
I will try the master branch with your latest commit and let you know the result later~ Thanks :pray:
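As a rough illustration of what a metrics file without timestamps looks like, here is a small Python sketch. It is not the exporter's implementation; the function name `render_metrics` and the input layout (a dict keyed by the label string) are assumptions made for the example. The timestamp field is optional in the Prometheus text format, and Prometheus uses the scrape time when it is omitted.

```python
def render_metrics(name, help_text, metric_type, values):
    """Render one metric family in Prometheus text format, without timestamps."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in values.items():
        # Only the series and its value are written; no trailing timestamp.
        lines.append(f"{name}{{{labels}}} {value}")
    return "\n".join(lines) + "\n"


output = render_metrics(
    "http_requests_total",
    "Counts the requests that were logged by HTTP daemon",
    "counter",
    {
        'status="5xx"': 1,
        'status="3xx"': 816447,
        'status="2xx"': 907983,
        'status="4xx"': 24437,
    },
)
print(output, end="")
```

Because no sample carries its own timestamp, a rarely incremented series such as 5xx is recorded at every scrape just like the others.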
Hi @technicalguru, everything works well. thanks~ =D
I am using the httpd-exporter on a Linux Apache server; here is my config file.
It works, but after I ran the exporter for a night I found some data points are missing in Prometheus.
I thought all the lines would be straight lines, but somehow the metrics in the generated metrics file are gone. Say, the http_requests_total{instance="httpd-exporter.obs-prod-www-3:80",job="atg_www_3",method="GET",status="5xx"} series exists in the above graph but I couldn't find it in the metrics file.
Is there any way to keep the metrics once they are created? Or did I miss anything?
Thanks. =)