qonto / prometheus-rds-exporter

Prometheus exporter for AWS RDS
MIT License

application crashing due to invalid RDS tag format #120

Closed ns-jflorez closed 7 months ago

ns-jflorez commented 8 months ago

Describe the bug

The exporter pod does not initialize; the container crashes at this step:

{"time":"2024-02-07T23:19:27.975014236Z","level":"INFO","msg":"get RDS metrics"}

`panic: "tag_aws:cloudformation:logical_id" is not a valid label name for metric "rds_instance_tags"

goroutine 77 [running]:
github.com/prometheus/client_golang/prometheus.MustNewConstMetric(...)
    /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.18.0/prometheus/value.go:129
github.com/qonto/prometheus-rds-exporter/internal/app/exporter.(*rdsCollector).Collect(0xc0001c4500, 0xc000289f60?)
    /home/runner/work/prometheus-rds-exporter/prometheus-rds-exporter/internal/app/exporter/exporter.go:531 +0x2dd1
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
    /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.18.0/prometheus/registry.go:455 +0x102
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather in goroutine 75
    /home/runner/go/pkg/mod/github.com/prometheus/client_golang@v1.18.0/prometheus/registry.go:466 +0x568`

To Reproduce

Assign tags to an RDS instance whose keys use the format aws:cloudformation:*

Expected behavior

The application accepts this tag format or converts it to an acceptable metric label.

Additional context

These CloudFormation tags can't be deleted, so this issue blocks us from using the exporter.

vmercierfr commented 7 months ago

Thanks for reporting this issue!

The sanitize function that transforms RDS instance tags into Prometheus labels was incorrect and could cause a crash. I'm sorry about that; it implemented the Prometheus label name restrictions incorrectly.
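For illustration only, here is a minimal sketch of what such a sanitizer could look like. It is not the code from the PR. It assumes the Prometheus rule that label names must match `[a-zA-Z_][a-zA-Z0-9_]*` (colons are permitted in metric names but not in label names), and the helper name `sanitizeLabelName` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeLabelName is a hypothetical helper (not taken from the exporter).
// It replaces every character outside [a-zA-Z0-9_] with an underscore and
// prepends an underscore when the input starts with a digit, so the result
// matches [a-zA-Z_][a-zA-Z0-9_]*.
func sanitizeLabelName(s string) string {
	var b strings.Builder
	for i, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r == '_':
			b.WriteRune(r)
		case r >= '0' && r <= '9':
			if i == 0 {
				// A label name must not start with a digit.
				b.WriteByte('_')
			}
			b.WriteRune(r)
		default:
			// Covers ':', '-', '.', spaces, and any non-ASCII character.
			b.WriteByte('_')
		}
	}
	if b.Len() == 0 {
		return "_"
	}
	return b.String()
}

func main() {
	// The "tag_" prefix mirrors the label name shown in the panic message.
	fmt.Println("tag_" + sanitizeLabelName("aws:cloudformation:logical_id"))
	// prints "tag_aws_cloudformation_logical_id"
}
```

One caveat with this approach: distinct tag keys such as `a:b` and `a_b` collapse to the same label name, so a real implementation would also need to decide how to handle collisions.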

I prepared a PR (https://github.com/qonto/prometheus-rds-exporter/pull/135) to fix this issue. It will be merged this week as part of the upcoming 0.7.0 release.

vmercierfr commented 7 months ago

@ns-jflorez we have just released v0.7.0 which contains the patch for this issue. Can you upgrade your deployment and confirm that it's working now?

ns-jflorez commented 6 months ago

@vmercierfr Thanks for this fix. I just tested it and it is working now.