acondrat opened this issue 4 years ago
I wonder if this is a side effect of https://github.com/prometheus-community/stackdriver_exporter/pull/50
Having the exact same issue here. Testing version 0.6.0, it is not reproducible, so it was probably introduced in either 0.7.0 or 0.8.0 (0.7.0 introduced #50).
Can you try building and running with #50 reverted?
Still crashes with the same error:
panic: duplicate label names
goroutine 99 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
/Users/omerlh/go/pkg/mod/github.com/prometheus/client_golang@v1.6.0/prometheus/value.go:106
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc000183a10, 0xc0005c8500, 0x4d, 0x34f92458, 0xed65d7041, 0x0, 0xc000285480, 0x8, 0x8, 0x2, ...)
/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:138 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc000183a10)
/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:179 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000183a10)
/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_metrics.go:160 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc000156000, 0xc00011e200, 0xc000140500, 0xc000128480, 0xc00011e200, 0x0)
/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:400 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc000606620, 0xc000156000, 0x34f92458, 0xed65d6f15, 0x0, 0x34f92458, 0xed65d7041, 0x0, 0xc000756540, 0xc000140500, ...)
/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:253 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
/Users/omerlh/dev/stackdriver_exporter/collectors/monitoring_collector.go:227 +0x2bb
So I guess it's not #50 then; it must be something else in the client_golang upgrade.
Can you include more details? Like the flags you're using with the exporter?
Please find my setup below. I was having the duplicate-labels panic with the logging.googleapis.com/user prefix. All other prefixes seem fine.
spec:
  containers:
  - command:
    - stackdriver_exporter
    env:
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_TYPE_PREFIXES
      value: bigtable.googleapis.com/cluster,loadbalancing.googleapis.com/https/request_count,custom.googleapis.com,logging.googleapis.com/user
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_INTERVAL
      value: 5m
    - name: STACKDRIVER_EXPORTER_MONITORING_METRICS_OFFSET
      value: 0s
    - name: STACKDRIVER_EXPORTER_WEB_LISTEN_ADDRESS
      value: :9255
    - name: STACKDRIVER_EXPORTER_WEB_TELEMETRY_PATH
      value: /metrics
    - name: STACKDRIVER_EXPORTER_MAX_RETRIES
      value: "0"
    - name: STACKDRIVER_EXPORTER_HTTP_TIMEOUT
      value: 10s
    - name: STACKDRIVER_EXPORTER_MAX_BACKOFF_DURATION
      value: 5s
    - name: STACKDRIVER_EXPORTER_BACKODFF_JITTER_BASE
      value: 1s
    - name: STACKDRIVER_EXPORTER_RETRY_STATUSES
      value: "503"
    image: prometheuscommunity/stackdriver-exporter:v0.7.0
Same issue here with v0.9.1:
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:136 msg="Starting stackdriver_exporter" version="(version=0.9.1, branch=HEAD, revision=770b1be3d430ef9768f30a2a5d2e35557e464f3c)"
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:137 msg="Build context" build_context="(go=go1.14.4, user=root@faf330a7765b, date=20200602-12:12:58)"
level=info ts=2020-06-15T11:40:31.592Z caller=stackdriver_exporter.go:158 msg="Listening on" address=:9255
panic: duplicate label names
goroutine 9602 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
/app/vendor/github.com/prometheus/client_golang/prometheus/value.go:106
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc000e37a10, 0xc0004c2460, 0x4d, 0x12059df0, 0xed67954c8, 0x0, 0xc000871b00, 0xe, 0x10, 0x2, ...)
/app/collectors/monitoring_metrics.go:139 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc000e37a10)
/app/collectors/monitoring_metrics.go:180 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc000e37a10)
/app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc00087b080, 0xc000901a00, 0xc000d2fe00, 0xc00165ef00, 0xc000901a00, 0x0)
/app/collectors/monitoring_collector.go:414 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc002207920, 0xc00087b080, 0x12059e1f, 0xed679539c, 0x0, 0x12059e1f, 0xed67954c8, 0x0, 0xc000fac720, 0xc000d2fe00, ...)
/app/collectors/monitoring_collector.go:267 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
/app/collectors/monitoring_collector.go:241 +0x3f3
Edit 2020/07/14: I can confirm that the issue is still present in v0.10.0:
goroutine 477 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
/app/vendor/github.com/prometheus/client_golang/prometheus/value.go:107
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).newConstMetric(0xc0005cda10, 0xc0010ad7c0, 0x43, 0x1e63a5d8, 0xed69f2568, 0x0, 0xc001fc1580, 0x8, 0x8, 0x2, ...)
/app/collectors/monitoring_metrics.go:139 +0x204
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).completeConstMetrics(0xc0005cda10)
/app/collectors/monitoring_metrics.go:180 +0x1dd
github.com/prometheus-community/stackdriver_exporter/collectors.(*TimeSeriesMetrics).Complete(0xc0005cda10)
/app/collectors/monitoring_metrics.go:161 +0x2b
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportTimeSeriesMetrics(0xc0001e2540, 0xc001c10630, 0xc0005f4500, 0xc0003940c0, 0xc001c10630, 0x0)
/app/collectors/monitoring_collector.go:406 +0x13c4
github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1.1(0xc0004f8f60, 0xc0001e2540, 0x1e63a902, 0xed69f24b4, 0x0, 0x1e63a902, 0xed69f25e0, 0x0, 0xc000720780, 0xc0005f4500, ...)
/app/collectors/monitoring_collector.go:259 +0x6d7
created by github.com/prometheus-community/stackdriver_exporter/collectors.(*MonitoringCollector).reportMonitoringMetrics.func1
/app/collectors/monitoring_collector.go:233 +0x3f3
So I've debugged it, as I have the same case.
The root cause of the problem is having custom log-based metrics with label extractors that extract the same labels GCP logging already injects by default.
Example:
You are on GKE and you have a custom log-based metric with an extractor that reads the field resource.labels.cluster_name into a label cluster_name. For custom metrics on GKE, cluster_name is already reported by default by GCP, so you end up with duplicated labels, which causes the panic.
Workaround: delete your custom extractors, which are technically not needed.
Edit:
As far as I can see, project_id is also injected by default.
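For illustration, here is a minimal sketch of how such a label collision turns into the panic seen in the traces above. It assumes client_golang's usual behaviour (NewDesc records a "duplicate label names" error in the descriptor, and MustNewConstMetric panics with it); the metric and label names are made up for the example:

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// "cluster_name" appears twice: once from the GCP resource labels and
	// once from a user-defined label extractor on the log-based metric.
	desc := prometheus.NewDesc(
		"stackdriver_custom_googleapis_com_example", // illustrative name
		"Example metric with a colliding label",
		[]string{"project_id", "cluster_name", "cluster_name"},
		nil,
	)

	// NewDesc stores the "duplicate label names" error inside the Desc;
	// MustNewConstMetric then panics with it, producing the crash above.
	prometheus.MustNewConstMetric(desc, prometheus.GaugeValue, 1,
		"my-project", "my-cluster", "my-cluster")
}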
We had the exact same issue as above, having duplicated project_id ourselves. But we discovered another issue with duplicate labels after enabling audit logs for Spanner:
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_log_entry_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"instance-east-1" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"production" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"1" > gauge:<value:10527 > timestamp_ms:1612880903770 } was collected before with the same name and label values
* [from Gatherer #2] collected metric "stackdriver_spanner_instance_logging_googleapis_com_byte_count" { label:<name:"instance_config" value:"" > label:<name:"instance_id" value:"instance-east-1" > label:<name:"location" value:"us-east1" > label:<name:"log" value:"cloudaudit.googleapis.com/data_access" > label:<name:"project_id" value:"production" > label:<name:"severity" value:"INFO" > label:<name:"unit" value:"By" > gauge:<value:2.2907337e+07 > timestamp_ms:1612880903770 } was collected before with the same name and label values
I think it'd make sense to make the exporter more robust: report duplicate labels on the CLI and export an error metric instead of panicking.
EDIT: Same issue as in: #103
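As a hypothetical sketch of that suggestion (this is not the exporter's actual code; the function and counter names are made up), the collector could use NewConstMetric, which returns an error instead of panicking, log the failure, and count it in a self-metric:

package collectors // illustrative placement, not the exporter's real file

import (
	"log"

	"github.com/prometheus/client_golang/prometheus"
)

// droppedSamples is a hypothetical self-metric counting samples the exporter
// skipped instead of crashing on them.
var droppedSamples = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "stackdriver_exporter_dropped_samples_total",
	Help: "Samples dropped because the metric could not be built (e.g. duplicate label names).",
})

// sendConstMetric builds a const metric and, unlike MustNewConstMetric,
// logs and counts failures instead of panicking.
func sendConstMetric(ch chan<- prometheus.Metric, desc *prometheus.Desc,
	vt prometheus.ValueType, value float64, labelValues ...string) {
	m, err := prometheus.NewConstMetric(desc, vt, value, labelValues...)
	if err != nil {
		log.Printf("skipping sample for %s: %v", desc, err)
		droppedSamples.Inc()
		return
	}
	ch <- m
}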
We finally solved this by going over all our log-based metrics. Took a while, as we have quite a few, but we removed the duplicate labels and have not had any problem since.
I've opened a PR https://github.com/prometheus-community/stackdriver_exporter/pull/153 which should fix this. Can someone review it and merge it if possible?
Still seeing this issue, though not getting a panic in the container logs; it shows up on the /metrics page. I tried @jakubbujny's suggestion of removing the labels and label extractors, but that didn't work.
I'm trying to scrape a log-based metric (counter type) for GKE human-initiated admin events: protoPayload.methodName=~"google.container.v1.ClusterManager.*" NOT protoPayload.methodName:"get" NOT protoPayload.methodName:"list" protoPayload.authenticationInfo.principalEmail:*
Looks like the exporter crashes when a metric has duplicate label names.
It seems like something was introduced in v0.7.0, as I don't see the same issue in v0.6.0.