prometheus-community / stackdriver_exporter

Google Stackdriver Prometheus exporter

panic: duplicate label names #85

Open acondrat opened 4 years ago

acondrat commented 4 years ago

Looks like the exporter crashes when a metric has duplicate label names.

time="2020-05-04T13:49:10Z" level=info msg="Starting stackdriver_exporter (version=0.7.0, branch=HEAD, revision=a339261e716271d77f6dc73d1998600d6d31089b)" source="stackdriver_exporter.go:136"
time="2020-05-04T13:49:10Z" level=info msg="Build context (go=go1.14.2, user=root@6bfda044714a, date=20200501-12:39:15)" source="stackdriver_exporter.go:137"
time="2020-05-04T13:49:10Z" level=info msg="Listening on :9255" source="stackdriver_exporter.go:163"
panic: duplicate label names

goroutine 162 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstHistogram(...)
    /app/vendor/github.com/prometheus/client_golang/prometheus/histogram.go:619
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstHistogram(0xc000377a18, 0xc0005fe280, 0x50, 0xc000a47600, 0xe, 0x10, 0xc0003e4870, 0xc000a51da0, 0xc000a47700, 0xe, ...)
    /app/collectors/monitoring_metrics.go:94 +0x19a
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeHistogramMetrics(0xc000377a18)
    /app/collectors/monitoring_metrics.go:186 +0x1c7
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000377a18)
    /app/collectors/monitoring_metrics.go:149 +0x39
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc0000b88f0, 0xc000940000, 0xc000356b00, 0xc0001a4f00, 0xc000940000, 0x0)
    /app/collectors/monitoring_collector.go:370 +0x10a3
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc000412640, 0xc0000b88f0, 0xe1193bb, 0xed6421353, 0x0, 0xe1193bb, 0xed642147f, 0x0, 0xc0003645a0, 0xc000356b00, ...)
    /app/collectors/monitoring_collector.go:223 +0x5e7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
    /app/collectors/monitoring_collector.go:197 +0x3f3

It seems like something introduced in v0.7.0, as I don't see the same issue in v0.6.0.

SuperQ commented 4 years ago

I wonder if this is a side effect of https://github.com/prometheus-community/stackdriver_exporter/pull/50

omerlh commented 4 years ago

I'm having the exact same issue here. With version 0.6.0 it is not reproducible, so it was probably introduced in either 0.7.0 or 0.8.0 (0.7.0 introduced #50).

SuperQ commented 4 years ago

Can you try building and running with #50 reverted?

omerlh commented 4 years ago

It still crashes with the same error:

panic: duplicate label names

goroutine 99 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
    /Users/omerlh/go/pkg/mod/github.com/prometheus/client_golang@v1.6.0/prometheus/value.go:106
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc000183a10, 0xc0005c8500, 0x4d, 0x34f92458, 0xed65d7041, 0x0, 0xc000285480, 0x8, 0x8, 0x2, ...)
    /Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:138 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc000183a10)
    /Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:179 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000183a10)
    /Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:160 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc000156000, 0xc00011e200, 0xc000140500, 0xc000128480, 0xc00011e200, 0x0)
    /Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:400 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc000606620, 0xc000156000, 0x34f92458, 0xed65d6f15, 0x0, 0x34f92458, 0xed65d7041, 0x0, 0xc000756540, 0xc000140500, ...)
    /Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:253 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
    /Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:227 +0x2bb

SuperQ commented 4 years ago

So I guess it's not #50 then, something else with the client_golang upgrade.
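
For context, the panic itself comes from client_golang: prometheus.NewDesc records an error when a Desc is built with duplicate variable label names, and the Must* constructors turn that stored error into a panic. A minimal reproducer of that behaviour (metric and label names here are made up, assuming the client_golang v1.x API):

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// NewDesc stores an error on the Desc because "cluster_name" appears
	// twice in the variable label names.
	desc := prometheus.NewDesc(
		"stackdriver_example_metric",
		"Example metric with a duplicated label name.",
		[]string{"project_id", "cluster_name", "cluster_name"},
		nil,
	)
	// MustNewConstMetric surfaces that stored error as a panic:
	//   panic: duplicate label names
	prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1,
		"my-project", "my-cluster", "my-cluster")
}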

SuperQ commented 4 years ago

Can you include more details? Like the flags you're using with the exporter?

acondrat commented 4 years ago

Please find my setup below. I was hitting the duplicate-labels panic with logging.googleapis.com/user; all other prefixes seem fine.

spec:
  containers:
  - command:
    - stackdriver_exporter
    env:
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES
      value: bigtable.googleapis.com/cluster,loadbalancing.googleapis.com/https/request_count,custom.googleapis.com,logging.googleapis.com/user
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_INTERVAL
      value: 5m
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET
      value: 0s
    - name: STACKDRIVER_EXPORTER_WEB_LISTEN_ADDRESS
      value: :9255
    - name: STACKDRIVER_EXPORTER_WEB_TELEMETRY_PATH
      value: /metrics
    - name: STACKDRIVER_EXPORTER_MAX_RETRIES
      value: "0"
    - name: STACKDRIVER_EXPORTER_HTTP_TIMEOUT
      value: 10s
    - name: STACKDRIVER_EXPORTER_MAX_BACKOFF_DURATION
      value: 5s
    - name: STACKDRIVER_EXPORTER_BACKODFF_JITTER_BASE
      value: 1s
    - name: STACKDRIVER_EXPORTER_RETRY_STATUSES
      value: "503"
    image: prometheuscommunity/stackdriver-exporter:v0.7.0

dgarcdu commented 4 years ago

Same issue here with v0.9.1:

level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:136 msg="Starting stackdriver_exporter" version="(version=0.9.1, branch=HEAD, revision=770b1be3d430ef9768f30a2a5d2e35557e464f3c)"
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:137 msg="Build context" build_context="(go=go1.14.4, user=root@faf330a7765b, date=20200602-12:12:58)"
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:158 msg="Listening on" address=:9255
panic: duplicate label names

goroutine 9602 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
    /app/vendor/github.com/prometheus/client_golang/prometheus/value.go:106
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc000e37a10, 0xc0004c2460, 0x4d, 0x12059df0, 0xed67954c8, 0x0, 0xc000871b00, 0xe, 0x10, 0x2, ...)
    /app/collectors/monitoring_metrics.go:139 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc000e37a10)
    /app/collectors/monitoring_metrics.go:180 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000e37a10)
    /app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc00087b080, 0xc000901a00, 0xc000d2fe00, 0xc00165ef00, 0xc000901a00, 0x0)
    /app/collectors/monitoring_collector.go:414 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc002207920, 0xc00087b080, 0x12059e1f, 0xed679539c, 0x0, 0x12059e1f, 0xed67954c8, 0x0, 0xc000fac720, 0xc000d2fe00, ...)
    /app/collectors/monitoring_collector.go:267 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
    /app/collectors/monitoring_collector.go:241 +0x3f3

Edit 2020/07/14: I can confirm that the issue is still present in v0.10.0:

goroutine 477 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
    /app/vendor/github.com/prometheus/client_golang/prometheus/value.go:107
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc0005cda10, 0xc0010ad7c0, 0x43, 0x1e63a5d8, 0xed69f2568, 0x0, 0xc001fc1580, 0x8, 0x8, 0x2, ...)
    /app/collectors/monitoring_metrics.go:139 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc0005cda10)
    /app/collectors/monitoring_metrics.go:180 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc0005cda10)
    /app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc0001e2540, 0xc001c10630, 0xc0005f4500, 0xc0003940c0, 0xc001c10630, 0x0)
    /app/collectors/monitoring_collector.go:406 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc0004f8f60, 0xc0001e2540, 0x1e63a902, 0xed69f24b4, 0x0, 0x1e63a902, 0xed69f25e0, 0x0, 0xc000720780, 0xc0005f4500, ...)
    /app/collectors/monitoring_collector.go:259 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
    /app/collectors/monitoring_collector.go:233 +0x3f3

jakubbujny commented 4 years ago

So I've debugged it, as I have the same case.

The root cause of the problem is having custom log-based metrics whose label extractors extract the same labels that GCP logging already injects by default.

Example: You are on GKE and have a custom log-based metric with an extractor that maps the field resource.labels.cluster_name into a label cluster_name. For custom metrics on GKE, cluster_name is already reported by default by GCP, so you end up with duplicated labels, which causes the panic.

Workaround: delete your custom extractors, which are technically not needed.

Edit: As far as I can see, project_id is also injected by default.
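
To make the collision concrete, here is a rough sketch of how the merged label set ends up with a duplicate (illustrative only, not the exporter's actual code; the label names are examples):

package main

import "fmt"

func main() {
	// Labels GCP injects for the monitored resource of a GKE log-based metric.
	resourceLabels := []string{"project_id", "location", "cluster_name"}
	// Labels defined on the metric itself, including a custom extractor that
	// re-extracts resource.labels.cluster_name into "cluster_name".
	metricLabels := []string{"severity", "cluster_name"}

	merged := append(append([]string{}, resourceLabels...), metricLabels...)

	seen := map[string]bool{}
	for _, name := range merged {
		if seen[name] {
			// This duplicate is what makes prometheus.NewDesc invalid and
			// MustNewConstMetric/MustNewConstHistogram panic.
			fmt.Println("duplicate label name:", name)
		}
		seen[name] = true
	}
}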

hanikesn commented 3 years ago

We had the exact same issue as above, having duplicated project_id ourselves. But we discovered another duplicate-labels issue after enabling audit logs for Spanner:

* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_log_entry_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"instance-east-1" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"production" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"1" > gauge:<value:10527 > timestamp_ms:1612880903770 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"instance-east-1" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"production" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:2.2907337e+07 > timestamp_ms:1612880903770 } was collected before with the same name and label values

I think it'd make sense to make the exporter more robust: just report duplicate labels on the CLI and export an error metric instead of failing.

EDIT: Same issue as in: #103
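
A rough sketch of that idea, using the error-returning prometheus.NewConstMetric instead of the Must variant and counting dropped metrics (this is only an illustration, not the exporter's current code nor the fix in any PR; the scrapeErrors counter and emitConstMetric helper are hypothetical):

package collectors

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
)

// scrapeErrors is a hypothetical counter for metrics the exporter had to drop;
// it would need to be registered and exposed alongside the other metrics.
var scrapeErrors = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "stackdriver_exporter_scrape_errors_total",
		Help: "Metrics dropped because they could not be built (e.g. duplicate label names).",
	},
	[]string{"reason"},
)

// emitConstMetric builds the metric with the error-returning constructor and,
// instead of panicking, logs the problem and increments the error counter.
func emitConstMetric(ch chan<- prometheus.Metric, desc *prometheus.Desc,
	valueType prometheus.ValueType, value float64, labelValues ...string) {
	m, err := prometheus.NewConstMetric(desc, valueType, value, labelValues...)
	if err != nil {
		log.Printf("dropping metric %s: %v", desc, err)
		scrapeErrors.WithLabelValues("invalid_metric").Inc()
		return
	}
	ch <- m
}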

dgarcdu commented 3 years ago

Following @jakubbujny's comment above, we finally solved this by going over all our log-based metrics. It took a while, as we have quite a few, but we removed the duplicate labels and have not had any problems since.

gidesh commented 2 years ago

I've opened a PR https://github.com/prometheus-community/stackdriver_exporter/pull/153 which should fix this. Can someone review it and merge it if possible?

JediNight commented 2 years ago

Still seeing this issue, though it does not show up as a panic in the container logs; it appears on the /metrics page instead. I tried @jakubbujny's suggestion of removing the label and label extractors, but that didn't work.

I'm trying to scrape a counter-type log-based metric for GKE human-initiated admin events, with this filter:

protoPayload.methodName=~"google.container.v1.ClusterManager.*" NOT protoPayload.methodName:"get" NOT protoPayload.methodName:"list" protoPayload.authenticationInfo.principalEmail:*