Missing data point

ykyuen commented 5 years ago

I am using httpd-exporter on a Linux Apache server. Here is my config file:

[General]
metricsFile=/var/www/html/httpd-exporter/metrics
addLabels=method
addStatusGroupLabel=status
collectBytesTransferred=bytes_sent
retentionSeconds=3600
enableDeadLabels=1
deadLabels=method,status,kubernetes_namespace_name

[LogFormats]
%{NOTSPACE:forward} %{IP:clientip} %{NOTSPACE:rlogname} %{NOTSPACE:user} \[%{HTTPDATE:timestamp}\] %{HOSTNAME:virtualHost} %{NOTSPACE:sslprotocol} %{NOTSPACE:sslcipher} "%{REQUEST_LINE}" %{INT:status} %{INT:bytes_sent} (%{QS:referrer}|-) (%{QS:agent}|-) (%{QS:jsessonid}|-) %{INT:timespent}

[/var/log/httpd/www1.log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}

[/var/log/httpd/www2.log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}

[/var/log/httpd/www3.log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}

It works, but after running the exporter overnight I found that some data points are missing in Prometheus.

[Screenshot: Prometheus graph, 2019-01-29 10:59:33]

I expected all the lines to be unbroken, but some metrics have disappeared from the generated metrics file. For example, http_requests_total{instance="httpd-exporter.obs-prod-www-3:80",job="atg_www_3",method="GET",status="5xx"} appears in the graph above, but I couldn't find it in the metrics file:

# TYPE http_requests_total counter
# HELP http_requests_total Counts the requests that were logged by HTTP daemon
http_requests_total{method="GET",status="2xx"} 19393 1548728641000
http_requests_total{method="OPTIONS",status="3xx"} 72987 1548728641000
http_requests_total{method="GET",status="3xx"} 480 1548728611000
http_requests_total{method="GET",status="4xx"} 176 1548727646000
http_requests_total{method="POST",status="2xx"} 2773 1548728611000

# TYPE http_sent_bytes counter
# HELP http_sent_bytes Number of bytes transferred as logged by HTTP daemon
http_sent_bytes{method="POST",status="2xx"} 2957019 1548728611000
http_sent_bytes{method="OPTIONS",status="3xx"} 18003440 1548728641000
http_sent_bytes{method="GET",status="4xx"} 8734949 1548727646000
http_sent_bytes{method="GET",status="3xx"} 122756 1548728611000
http_sent_bytes{method="GET",status="2xx"} 723296628 1548728641000

Is there any way to keep a metric once it has been created? Or did I miss something?

Thanks. =)

technicalguru commented 5 years ago

Hi,

Give me a few days to analyze this. I am currently traveling and need to find some time. Thanks for the fix, too :)

ykyuen commented 5 years ago

No rush. Enjoy your trip~~ =D

technicalguru commented 5 years ago

Hi Yuen,

It appears to me that this is caused by the deadLabels definition. The dead-labels mechanism works like this: if a series gets no new value within the retention period, all its lines are "stripped down" to the dead labels. In your example:

http_requests_total{instance="httpd-exporter.obs-prod-www-3:80",job="atg_www_3",method="GET",status="5xx"}

will be changed to:

http_requests_total{method="GET",status="5xx",deadCounter="true"}
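To make the stripping rule concrete, here is a minimal Perl sketch of it. The sub names strip_to_dead_labels and format_labels are hypothetical, not httpd-exporter's actual code; the sketch only applies the rule described above: keep the configured dead labels and add deadCounter="true".

```perl
use strict;
use warnings;

# Hypothetical helper (not httpd-exporter's code): keep only the configured
# dead labels of a series and mark the result with deadCounter="true".
sub strip_to_dead_labels {
    my ($labels, $dead_labels) = @_;
    my %stripped;
    for my $name (@$dead_labels) {
        # Skip dead labels the series does not carry (e.g. kubernetes_namespace_name).
        $stripped{$name} = $labels->{$name} if exists $labels->{$name};
    }
    $stripped{deadCounter} = 'true';
    return \%stripped;
}

# Render a label hash as {name="value",...}, sorted by name for stable output.
sub format_labels {
    my ($labels) = @_;
    return '{' . join(',', map { sprintf('%s="%s"', $_, $labels->{$_}) } sort keys %$labels) . '}';
}

my %series = (
    instance => 'httpd-exporter.obs-prod-www-3:80',
    job      => 'atg_www_3',
    method   => 'GET',
    status   => '5xx',
);
my $dead = strip_to_dead_labels(\%series, [qw(method status kubernetes_namespace_name)]);
print 'http_requests_total', format_labels($dead), "\n";
# prints http_requests_total{deadCounter="true",method="GET",status="5xx"}
```

Labels are printed in sorted order here, so deadCounter comes first; label order does not change a series' identity in Prometheus.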

I am a bit confused about why these labels do not appear. However, I've recently learned that the best practice for Prometheus counters is not to reset them; Prometheus can work this out by itself. So you can simply disable dead labels (config enableDeadLabels=0), and then these labels will appear forever. Alternatively, increase retentionSeconds to a much larger value.
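Why not resetting is safe: PromQL's rate() and increase() treat any drop in a counter's value as a reset. A simplified Perl sketch of that idea follows; counter_increase is a hypothetical name, and the real implementation also extrapolates to the edges of the query range, which is left out here.

```perl
use strict;
use warnings;

# Simplified sketch of reset-tolerant counter arithmetic: a drop in value is
# read as "the counter restarted from zero", so the new value counts in full.
# Prometheus additionally extrapolates to the range boundaries (omitted).
sub counter_increase {
    my @values = @_;
    my $total = 0;
    for my $i (1 .. $#values) {
        my ($prev, $cur) = @values[$i - 1, $i];
        # Normal case: add the delta. Reset case: everything since the reset.
        $total += $cur >= $prev ? $cur - $prev : $cur;
    }
    return $total;
}

# The exporter restarted between the 3rd and 4th sample (value fell from 30 to 5).
print counter_increase(10, 20, 30, 5, 15), "\n";   # prints 35
```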

technicalguru commented 5 years ago

OK, furthermore: without dead labels, the labels will be removed after about 40 days of inactivity.
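A retention check of this kind can be sketched as follows, assuming each metric keeps a values hash and a timestamps hash keyed by label string (as the ApExportMetrics.pm diff further down suggests). prune_expired is a hypothetical helper, not the exporter's code. For a 40-day window you would pass 40 * 86400 = 3456000 seconds.

```perl
use strict;
use warnings;

# Hypothetical retention check (not the exporter's code): delete every series
# whose last update is older than the retention window. Timestamps are in
# milliseconds, as in the metrics file; the window is given in seconds.
sub prune_expired {
    my ($metric, $now_ms, $retention_seconds) = @_;
    my $cutoff = $now_ms - $retention_seconds * 1000;
    my @removed;
    # The key list is built before the loop, so deleting inside it is safe.
    for my $labels (sort keys %{ $metric->{timestamps} }) {
        next if $metric->{timestamps}{$labels} >= $cutoff;
        delete $metric->{values}{$labels};
        delete $metric->{timestamps}{$labels};
        push @removed, $labels;
    }
    return @removed;
}

my %metric = (
    values     => { 'status="2xx"' => 907983,        'status="5xx"' => 1 },
    timestamps => { 'status="2xx"' => 1549255169000, 'status="5xx"' => 1549253617000 },
);
# With a 1000-second window, the 5xx series (last updated 1552 s earlier) expires.
my @gone = prune_expired(\%metric, 1549255169000, 1000);
print "@gone\n";   # prints status="5xx"
```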

ykyuen commented 5 years ago

Thanks for your update. I will try it when I return to the office next week~

ykyuen commented 5 years ago

Hi @technicalguru,

A few findings for your reference:

1. I checked the metrics collected in Prometheus but couldn't find any label named "deadCounter". It seems the dead counters don't work in my setup.
2. Last Friday, I set the retention period to retentionSeconds=3600000, so the metrics should be kept for 1000 hours (about 41.7 days) before being removed. But when I checked the graph this morning, the metrics seemed to have been reset after some time.

   [Screenshot: Prometheus graph, 2019-02-04 13:05:36]

3. I have disabled deadLabels in my setup. Let's see how it goes.
4. I suspect the broken line occurs because the server seldom responds with 5xx, so the timestamp of that series in the metrics file is not updated. For example, the 5xx count below keeps the timestamp 1549253617000 until a new 5xx response is captured:

# TYPE http_requests_total counter
# HELP http_requests_total Counts the requests that were logged by HTTP daemon
http_requests_total{status="5xx"} 1 1549253617000
http_requests_total{status="3xx"} 816447 1549255169000
http_requests_total{status="2xx"} 907983 1549255169000
http_requests_total{status="4xx"} 24437 1549255154000
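One plausible explanation for the gaps, sketched under simplifying assumptions: when a line carries an explicit timestamp, Prometheus stores the sample at that time rather than at scrape time, so an unchanged 5xx line adds no new points. An instant query only finds a series whose newest sample lies within the lookback window (5 minutes by default), so the series drops out of the graph about 5 minutes after its last change. visible_at is a hypothetical helper; staleness markers and other details are ignored.

```perl
use strict;
use warnings;

# Simplified model of a PromQL instant query: a series has a value at the query
# time only if its newest sample is at most the lookback delta old (5 minutes
# by default). Staleness markers and other details are deliberately ignored.
use constant LOOKBACK_MS => 5 * 60 * 1000;

sub visible_at {
    my ($sample_ms, $query_ms) = @_;
    return $sample_ms <= $query_ms && $query_ms - $sample_ms <= LOOKBACK_MS;
}

my $query_ms = 1549255169000;   # newest timestamp in the metrics file above
my %last = ('5xx' => 1549253617000, '4xx' => 1549255154000);
for my $status (sort keys %last) {
    printf "%s: %d ms old -> %s\n", $status, $query_ms - $last{$status},
        visible_at($last{$status}, $query_ms) ? 'visible' : 'gap';
}
```

With the timestamps from the file above, the 5xx sample is 1552000 ms (about 26 minutes) old at the newest timestamp, well past the window, while the 4xx sample is only 15000 ms old.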

For point 4, is it possible to align all the timestamps on each update? I.e.:

# TYPE http_requests_total counter
# HELP http_requests_total Counts the requests that were logged by HTTP daemon
http_requests_total{status="5xx"} 1 1549255169000
http_requests_total{status="3xx"} 816447 1549255169000
http_requests_total{status="2xx"} 907983 1549255169000
http_requests_total{status="4xx"} 24437 1549255169000
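The requested output can be produced by writing every series with the newest timestamp of the update. A minimal Perl sketch, assuming hashes keyed by label string; render_aligned is a hypothetical name, not part of the exporter.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical renderer: write every series with the newest timestamp of this
# update instead of its own last-change time. Hashes are keyed by label string.
sub render_aligned {
    my ($name, $values, $timestamps) = @_;
    my $latest = max(values %$timestamps);
    return map { sprintf("%s{%s} %s %s\n", $name, $_, $values->{$_}, $latest) }
        sort keys %$values;
}

my %values     = ('status="5xx"' => 1,             'status="2xx"' => 907983);
my %timestamps = ('status="5xx"' => 1549253617000, 'status="2xx"' => 1549255169000);
print render_aligned('http_requests_total', \%values, \%timestamps);
```

For a counter this is arguably harmless, since an unchanged value is still the current value at the newer time; the trade-off is that the per-series last-change time, which the dead-labels feature relies on, is no longer visible in the file.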

Would there be any impact with this change?

ykyuen commented 5 years ago

I also confirmed that even with dead labels disabled, the metrics still reset themselves:

[General]
metricsFile=/var/www/html/httpd-exporter/metrics
addLabels=
addStatusGroupLabel=status
collectBytesTransferred=bytes_sent
retentionSeconds=3600000
enableDeadLabels=0
deadLabels=

[LogFormats]
# LC ATG Apache log format
%{NOTSPACE:forward} %{IP:clientip} %{NOTSPACE:rlogname} %{NOTSPACE:user} \[%{HTTPDATE:timestamp}\] %{HOSTNAME:virtualHost} %{NOTSPACE:sslprotocol} %{NOTSPACE:sslcipher} "%{REQUEST_LINE}" %{INT:status} %{INT:bytes_sent} (%{QS:referrer}|-) (%{QS:agent}|-) (%{QS:jsessonid}|-) %{INT:timespent}

[/var/log/httpd/row_prod_estore_access_log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}

[/var/log/httpd/hk_prod_estore_access_log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}

[/var/log/httpd/cn_prod_estore_access_log]
type=httpd
labels={instance_ip="${HOSTIP}",instance_hostname="${HOSTNAME}"}

Graph shown in Prometheus: [Screenshot: 2019-02-04 14:13:06]

technicalguru commented 5 years ago

Hmmm, perhaps we need to update the timestamp or omit it completely. However, this would make the deadLabels feature impossible (it was a test feature anyway). I will open a branch and change the code accordingly. Then we can see whether this works better.

ykyuen commented 5 years ago

Maybe we could consider this:

Case (1): if deadLabels is enabled, each metric keeps its own timestamp. Nothing changes.

Case (2): if deadLabels is disabled, all metrics follow the same latest timestamp.

The downside is that the two cases behave inconsistently, which might confuse users.
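The two proposed cases amount to one branch on the config flag. A hypothetical Perl sketch: $enable_dead_labels stands in for the enableDeadLabels setting, and export_timestamp is not an existing function in the exporter.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Case (1): dead labels enabled  -> each series keeps its own timestamp.
# Case (2): dead labels disabled -> every series uses the latest timestamp.
sub export_timestamp {
    my ($timestamps, $labels, $enable_dead_labels) = @_;
    return $timestamps->{$labels} if $enable_dead_labels;
    return max(values %$timestamps);
}

my %ts = ('status="5xx"' => 1549253617000, 'status="2xx"' => 1549255169000);
print export_timestamp(\%ts, 'status="5xx"', 1), "\n";   # prints 1549253617000
print export_timestamp(\%ts, 'status="5xx"', 0), "\n";   # prints 1549255169000
```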

I made the following changes so that all metrics share the latest timestamp. I am not familiar with Perl, though, so I am not sure this is the proper way to do it:

diff --git a/modules/ApExportMetrics.pm b/modules/ApExportMetrics.pm
index b7ad30b..b4310bd 100644
--- a/modules/ApExportMetrics.pm
+++ b/modules/ApExportMetrics.pm
@@ -128,6 +128,9 @@ sub set {
        if ($labels) {
                $self->{values}->{$labels}     = $value;
                $self->{timestamps}->{$labels} = $timestamp;
+               foreach my $key (keys %{$self->{timestamps}}) {
+                       $self->{timestamps}->{$key} = $timestamp;
+               }
        } else {
                $self->{value}     = $value;
                $self->{timestamp} = $timestamp;
@@ -151,6 +154,9 @@ sub inc {
                        $self->{values}->{$labels} = 1;
                }
                $self->{timestamps}->{$labels} = $timestamp;
+               foreach my $key (keys %{$self->{timestamps}}) {
+                       $self->{timestamps}->{$key} = $timestamp;
+               }
        } else {
                if (exists($self->{value})) {
                        $self->{value}++;
@@ -178,6 +184,9 @@ sub dec {
                        $self->{values}->{$labels} = -1;
                }
                $self->{timestamps}->{$labels} = $timestamp;
+               foreach my $key (keys %{$self->{timestamps}}) {
+                       $self->{timestamps}->{$key} = $timestamp;
+               }
        } else {
                if (exists($self->{value})) {
                        $self->{value}--;
@@ -207,6 +216,9 @@ sub add {
                        $self->{values}->{$labels} = $value;
                }
                $self->{timestamps}->{$labels} = $timestamp;
+               foreach my $key (keys %{$self->{timestamps}}) {
+                       $self->{timestamps}->{$key} = $timestamp;
+               }
        } else {
                if (exists($self->{value})) {
                        $self->{value} += $value;
@@ -236,6 +248,9 @@ sub sub {
                        $self->{values}->{$labels} = 0-$value;
                }
                $self->{timestamps}->{$labels} = $timestamp;
+               foreach my $key (keys %{$self->{timestamps}}) {
+                       $self->{timestamps}->{$key} = $timestamp;
+               }
        } else {
                if (exists($self->{value})) {
                        $self->{value} -= $value;
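Since the same loop is pasted into set, inc, dec, add and sub, it could be factored into one helper. The Perl sketch below uses a hypothetical stand-in for the metric object (the package name SketchMetric and the inc signature are assumptions, since the full ApExportMetrics.pm is not shown); only inc is implemented to keep it short.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the metric object in ApExportMetrics.pm; only the
# values/timestamps hashes from the diff are modelled.
package SketchMetric;

sub new {
    my ($class) = @_;
    return bless { values => {}, timestamps => {} }, $class;
}

# One helper instead of pasting the same foreach loop into set/inc/dec/add/sub.
sub _touch_all_timestamps {
    my ($self, $timestamp) = @_;
    $self->{timestamps}{$_} = $timestamp for keys %{ $self->{timestamps} };
    return;
}

sub inc {
    my ($self, $labels, $timestamp) = @_;
    $self->{values}{$labels}++;                 # starts at 1 for a new label set
    $self->{timestamps}{$labels} = $timestamp;
    $self->_touch_all_timestamps($timestamp);
    return;
}

package main;

my $metric = SketchMetric->new;
$metric->inc('status="5xx"', 1549253617000);
$metric->inc('status="2xx"', 1549255169000);
print "$_ $metric->{timestamps}{$_}\n" for sort keys %{ $metric->{timestamps} };
```

Another option, equally hypothetical, is to store a single last-update timestamp on the object and use it when writing the file, which avoids touching every key on every log line.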

If this sounds OK to you, I will add a deadLabels check so that it follows the cases I mentioned above, and then open a pull request.

Please let me know if you think there is a better way to do this; I would love to take this chance to play with Perl.

Thanks. =D

ykyuen commented 5 years ago

For your reference: after making all metrics follow the latest timestamp, the lines for infrequent responses are no longer broken.

[Screenshot: Prometheus graph, 2019-02-11 10:40:39]

technicalguru commented 5 years ago

Hi,

Sorry for the late response. I was thinking of removing the timestamps from the metrics (Prometheus adds them automatically) and removing the deadLabels feature completely. As Prometheus suggests, it can detect a counter reset by itself. So regularly deleting the metrics file completely makes more sense, or a reset could be implemented.

For the moment, I would simply disable the retention check and not export the timestamp value in the metrics file.
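Exporting without timestamps means writing each sample as just name, labels and value. The timestamp field is optional in the Prometheus text format, and when it is absent Prometheus records the sample at scrape time. A minimal Perl sketch, with render_counter as a hypothetical name and the HELP text taken from the metrics file above:

```perl
use strict;
use warnings;

# Hypothetical renderer: write a counter without the optional timestamp field,
# so Prometheus assigns the scrape time to each sample.
sub render_counter {
    my ($name, $help, $values) = @_;
    my @out = ("# HELP $name $help\n", "# TYPE $name counter\n");
    push @out, sprintf("%s{%s} %s\n", $name, $_, $values->{$_}) for sort keys %$values;
    return join '', @out;
}

print render_counter('http_requests_total',
    'Counts the requests that were logged by HTTP daemon',
    { 'status="2xx"' => 907983, 'status="5xx"' => 1 });
```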

ykyuen commented 5 years ago

That sounds like an even simpler and better approach. Thanks for the changes.

I will try the master branch with your latest commit and let you know the result later~ Thanks 🙏

ykyuen commented 5 years ago

Hi @technicalguru, everything works well. Thanks~ =D

technicalguru / httpd-exporter

Missing data point #2