prometheus-community / stackdriver_exporter

Google Stackdriver Prometheus exporter
Apache License 2.0

Inconsistent metrics from storage.googleapis.com #41

Open arcenik opened 5 years ago

arcenik commented 5 years ago

Hello,

I'm having trouble collecting the storage metrics. Sometimes I get all of the metrics, sometimes only some of them, and sometimes none.

I started the exporter with only the storage prefix and verbose output:

STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES=storage.googleapis.com/

And observed the following logs:

testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Listing Google Stackdriver Monitoring metric descriptors starting with `storage.googleapis.com/`..." source="monitoring_collector.go:213"
testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Retrieving Google Stackdriver Monitoring metrics for descriptor `storage.googleapis.com/network/received_bytes_count`..." source="monitoring_collector.go:169"
testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Retrieving Google Stackdriver Monitoring metrics for descriptor `storage.googleapis.com/network/sent_bytes_count`..." source="monitoring_collector.go:169"
testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Retrieving Google Stackdriver Monitoring metrics for descriptor `storage.googleapis.com/storage/object_count`..." source="monitoring_collector.go:169"
testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Retrieving Google Stackdriver Monitoring metrics for descriptor `storage.googleapis.com/api/request_count`..." source="monitoring_collector.go:169"
testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Retrieving Google Stackdriver Monitoring metrics for descriptor `storage.googleapis.com/storage/total_byte_seconds`..." source="monitoring_collector.go:169"
testsd_1  | time="2018-11-15T08:27:11Z" level=debug msg="Retrieving Google Stackdriver Monitoring metrics for descriptor `storage.googleapis.com/storage/total_bytes`..." source="monitoring_collector.go:169"

But only one metric is returned

# curl -vs localhost:9256/metrics 2>&1 | grep gcs_bucket | sed 's/{.*/{.../'
# HELP stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count Total number of objects per bucket, grouped by storage class. Values are measured once per day.
# TYPE stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count gauge
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...

Or two

# curl -vs localhost:9256/metrics 2>&1 | grep gcs_bucket | sed 's/{.*/{.../'
# HELP stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count Total number of objects per bucket, grouped by storage class. Values are measured once per day.
# TYPE stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count gauge
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{...
# HELP stackdriver_gcs_bucket_storage_googleapis_com_storage_total_byte_seconds Total daily storage in byte*seconds used by the bucket, grouped by storage class. * Connection #0 to host localhost left intact
# TYPE stackdriver_gcs_bucket_storage_googleapis_com_storage_total_byte_seconds gauge
stackdriver_gcs_bucket_storage_googleapis_com_storage_total_byte_seconds{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_total_byte_seconds{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_total_byte_seconds{...
stackdriver_gcs_bucket_storage_googleapis_com_storage_total_byte_seconds{...

Any idea why not all metrics (like stackdriver_gcs_bucket_storage_googleapis_com_storage_total_bytes) are returned each time?
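To compare scrapes without eyeballing the grep output, the metric families present in a given response can be listed from its `# TYPE` lines. The following is only a minimal sketch: the function name `metric_families`, the sample text, and the `bucket_name="example"` label are made up for illustration and do not come from the exporter.

```python
def metric_families(exposition_text, prefix="stackdriver_gcs_bucket"):
    """Return the sorted metric family names declared by '# TYPE' lines."""
    families = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        # A TYPE line has the form: "# TYPE <name> <type>"
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            if parts[2].startswith(prefix):
                families.add(parts[2])
    return sorted(families)


# Hypothetical scrape output; the label and values are invented.
sample = """\
# HELP stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count Total number of objects per bucket.
# TYPE stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count gauge
stackdriver_gcs_bucket_storage_googleapis_com_storage_object_count{bucket_name="example"} 4
# TYPE go_goroutines gauge
go_goroutines 12
"""

print(metric_families(sample))
```

Running this against the saved output of several consecutive scrapes shows which families come and go between them.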

Thanks.

ipstatic commented 5 years ago

We are seeing this too with stackdriver_gcs_bucket_storage_googleapis_com_storage_total_bytes being returned inconsistently. It appears every 5 minutes then disappears again.

[Screenshot, 2018-12-14 9:33 AM]

Could this be some sort of API limit and the exporter is just not caching the results?

ipstatic commented 5 years ago

Ahh, straight from https://cloud.google.com/monitoring/api/metrics_gcp#gcp-storage:

Total size of all objects in the bucket, grouped by storage class. Values are measured once per day. Sampled every 300 seconds. After sampling, data is not visible for up to 600 seconds.

So we need to cache this result as it will not be available for up to 10 minutes.
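The caching idea could look roughly like the sketch below: remember the last value seen for each metric and keep serving it for a bounded time when a later query comes back empty. This is a hypothetical illustration, not how the exporter is implemented; the class name `LastValueCache`, the 600-second TTL, and the injectable clock are all assumptions.

```python
import time


class LastValueCache:
    """Keep the most recent sample per metric for up to ttl_seconds."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._entries = {}   # metric name -> (value, time stored)

    def update(self, name, value):
        """Record a fresh sample returned by the API."""
        self._entries[name] = (value, self._clock())

    def get(self, name):
        """Return the cached value, or None if missing or older than the TTL."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[name]
            return None
        return value


# Usage with a fake clock (seconds), so no real waiting is needed.
now = [0.0]
cache = LastValueCache(ttl_seconds=600, clock=lambda: now[0])
cache.update("storage/total_bytes", 1024)
now[0] = 300.0
print(cache.get("storage/total_bytes"))  # still within the TTL
now[0] = 601.0
print(cache.get("storage/total_bytes"))  # expired
```

The TTL bounds how long a stale value is re-exposed, which matters because a cached gauge looks identical to a fresh one on the Prometheus side.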

rogierlommers commented 5 years ago

Hey, I'm facing exactly the same problem. It also seems that the sampling doesn't work: I see changes in storage_total_bytes only once a day. I also don't understand the Stackdriver documentation. For example, the docs about storage say:

storage/total_bytes: Total size of all objects in the bucket, grouped by storage class.
Values are measured once per day. Sampled every 300 seconds. After sampling, data
is not visible for up to 600 seconds.

What exactly does this mean? I interpret this as:

Now please see my Stackdriver console output below. It shows the past 7 days, and the graph is only updated once a day. I don't see the sampling data changing the graphs at all. This basically makes the metrics useless (since they can be up to 24h old).

[Image: storage-values]

jrluis commented 5 years ago

Hi @frodenas,

I'm also having the same issue with another metric.

I found this documentation regarding the "not visible for up to" wording: https://cloud.google.com/monitoring/api/metrics#metadata

My interpretation of "Sampled every 300 seconds." is that every 300 seconds Google goes into the bucket and counts the number of objects, then sends the count to some storage. "After sampling, data is not visible for up to 600 seconds." is the time the count takes to become consistent in that storage. Only once it is consistent can we read it through the API.

This has a big impact on metric collection: to get a non-empty value, we need to be able to configure some delay on the metric timestamp.

E.g. for object count, when the stackdriver exporter collection runs, it should fetch the metric value from at least 600 seconds ago. Otherwise it will get empty values, because the data is not visible until the 600 seconds have passed.

Regards

jrluis commented 5 years ago

The Stackdriver exporter already supports an option to set an offset on the metrics timestamp.

See: https://github.com/frodenas/stackdriver_exporter/blob/master/collectors/monitoring_collector.go#L179

Tracing back the code, it can be configured at the cli with the switch: --monitoring.metrics-offset=610s
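To make the effect of such an offset concrete, here is a minimal sketch of how a query window shifts when the offset is applied. It assumes the window is computed as `end = now - offset` and `start = end - interval`; the function name `query_window` and the 5-minute interval are illustrative assumptions, and the timestamp is taken from the log lines earlier in the thread.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, interval, offset):
    """Return (start, end) of a query window shifted back by `offset`."""
    end = now - offset        # skip the period where data is not yet visible
    start = end - interval    # then look back one collection interval
    return start, end


now = datetime(2018, 11, 15, 8, 27, 11, tzinfo=timezone.utc)
start, end = query_window(
    now,
    interval=timedelta(minutes=5),   # assumed interval, for illustration only
    offset=timedelta(seconds=610),   # the 610s value from the switch above
)
print(start.isoformat(), end.isoformat())
# 2018-11-15T08:12:01+00:00 2018-11-15T08:17:01+00:00
```

With an offset slightly above 600 seconds, the whole window lies in the period where the sampled data should already be visible.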