panic: inconsistent label cardinality:

theok-nice commented 1 year ago

I am testing the image with tag "23.4.0-beta0".

With the following query:

    costs:
        scrapeTime: 12h

        queries:
          - name: by_resourcetype
            help: Costs by ResourceGroupName and ResourceType
            dimensions: [ResourceGroupName,ResourceType]
            valueField: Cost
            timeFrames: [MonthToDate]

I am getting a panic:

panic: inconsistent label cardinality: expected 8 label values but got 6 in prometheus.Labels{"currency":"usd", "resourceGroup":"", "resourceType":"", "scope":"/subscriptions/xxxxxxxx", "subscriptionID":"xxxxxxxx", "timeframe":"MonthToDate"}

goroutine 107 [running]:
github.com/prometheus/client_golang/prometheus.(*GaugeVec).With(...)
        /go/pkg/mod/github.com/prometheus/client_golang@v1.14.0/prometheus/gauge.go:230
github.com/webdevops/go-common/prometheus.(*MetricList).GaugeSet(0xc000292e50?, 0x0?)
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230406214525-d56c4edd9624/prometheus/metrics_list.go:144 +0x9d
github.com/webdevops/go-common/prometheus/collector.(*Collector).collectRun(0xc00015eb60, 0x1)
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230406214525-d56c4edd9624/prometheus/collector/collector.go:304 +0x345
github.com/webdevops/go-common/prometheus/collector.(*Collector).run(0xc00015eb60)
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230406214525-d56c4edd9624/prometheus/collector/collector.go:223 +0x158
github.com/webdevops/go-common/prometheus/collector.(*Collector).Start.func1()
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230406214525-d56c4edd9624/prometheus/collector/collector.go:174 +0x50
created by github.com/webdevops/go-common/prometheus/collector.(*Collector).Start
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230406214525-d56c4edd9624/prometheus/collector/collector.go:166 +0x1a5

After that, my pod goes into CrashLoopBackOff status. Any recommendations for this issue?

mblaschke commented 1 year ago

Can you retry with 23.5.0-beta0? Do you have azure.resourceTags or azure.resourceGroupTags set?

theok-nice commented 1 year ago

I am getting the same behavior with the latest tag, 23.5.0-beta0. In my config I do set:

    azure:
      resourceTags: [creator, env-creation]
      resourceGroupTags: [creator, env-creation]

and I get the same error message:

panic: inconsistent label cardinality: expected 8 label values but got 6 in prometheus.Labels{"currency":"usd", "resourceGroup":"", "resourceType":"", "scope":"/subscriptions/xxxxx", "subscriptionID":"xxxx", "timeframe":"MonthToDate"}

goroutine 86 [running]:
github.com/prometheus/client_golang/prometheus.(*GaugeVec).With(...)
        /go/pkg/mod/github.com/prometheus/client_golang@v1.15.1/prometheus/gauge.go:250
github.com/webdevops/go-common/prometheus.(*MetricList).GaugeSet(0xc00026ebe0?, 0x0?)
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230513212717-8a2d16f8bb01/prometheus/metrics_list.go:144 +0x9d
github.com/webdevops/go-common/prometheus/collector.(*Collector).collectRun(0xc000476620, 0x1)
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230513212717-8a2d16f8bb01/prometheus/collector/collector.go:406 +0x34d
github.com/webdevops/go-common/prometheus/collector.(*Collector).run(0xc000476620)
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230513212717-8a2d16f8bb01/prometheus/collector/collector.go:320 +0x196
github.com/webdevops/go-common/prometheus/collector.(*Collector).Start.func1()
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230513212717-8a2d16f8bb01/prometheus/collector/collector.go:255 +0x325
created by github.com/webdevops/go-common/prometheus/collector.(*Collector).Start
        /go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230513212717-8a2d16f8bb01/prometheus/collector/collector.go:234 +0x1a5
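For readers unfamiliar with this panic: client_golang's `GaugeVec.With` panics when the number of supplied labels differs from the number of label names the vector was declared with. The stdlib-only sketch below mimics that count check and shows one possible mitigation, padding absent labels with empty strings. The helper names `missingLabels` and `padLabels` and the two `tag_*` label names are hypothetical; the thread does not show which labels the exporter actually declares or how the fix was implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// missingLabels lists the declared label names that have no entry in got.
// Hypothetical helper for illustration; not part of client_golang or go-common.
func missingLabels(declared []string, got map[string]string) []string {
	missing := []string{}
	for _, name := range declared {
		if _, ok := got[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

// padLabels returns a copy of got in which every declared name is present.
// Names absent from got are set to the empty string; got itself is not modified.
func padLabels(declared []string, got map[string]string) map[string]string {
	out := make(map[string]string, len(declared))
	for k, v := range got {
		out[k] = v
	}
	for _, name := range declared {
		if _, ok := out[name]; !ok {
			out[name] = ""
		}
	}
	return out
}

func main() {
	// The first six names come from the panic message above.
	// The two tag_* names are assumed placeholders for the two missing labels.
	declared := []string{
		"currency", "resourceGroup", "resourceType", "scope",
		"subscriptionID", "timeframe",
		"tag_a", "tag_b",
	}
	got := map[string]string{
		"currency":       "usd",
		"resourceGroup":  "",
		"resourceType":   "",
		"scope":          "/subscriptions/xxxx",
		"subscriptionID": "xxxx",
		"timeframe":      "MonthToDate",
	}

	// Same comparison that produces the "expected N label values but got M" text.
	if len(got) != len(declared) {
		fmt.Printf("expected %d label values but got %d\n", len(declared), len(got))
		fmt.Println("missing:", missingLabels(declared, got))
	}

	// After padding, the label set matches the declared cardinality.
	fmt.Println("after padding:", len(padLabels(declared, got)))
}
```

Padding with empty strings keeps the series count consistent with the declared labels, at the cost of emitting metrics whose tag labels are blank when the tag lookup fails.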

theok-nice commented 1 year ago

For what it's worth, the following config works:

    azure:
      resourceTags: [creator]
      resourceGroupTags: []

But it's not very helpful in my case, since the cost metrics won't have the label I need.

theok-nice commented 1 year ago

I changed my config and tried again:

I set the debug environment variable to true,
I set resourceGroupTags: [creator],
and I changed my query to include only dimensions: [ResourceGroupName].

This is what I can see in the logs:

{
  "level": "debug",
  "caller": "armclient/client.tags.go:186",
  "msg": "unable to fetch tagValue for resourceID \"/subscriptions/xxxx/resourcegroups/test\": resourceGroup \"test\" not found",
  "component": "armClientTagManager"
}

I checked the tags assigned to the test Resource Group in the UI, and the creator tag is set and populated.

mblaschke commented 1 year ago

I think I have an idea about the source of this issue. Let me fix it and create a new version.

mblaschke commented 1 year ago

Try 23.5.0-beta1; this should fix your issue.

theok-nice commented 1 year ago

23.5.0-beta1 did not solve the issue.

With the following config:

    azure:
      resourceGroupTags: [creator]

    collectors:
      costs:
        scrapeTime: 12h
        queries:
          - name: by_resourcegroup
            help: Costs by ResourceGroup
            dimensions: [ResourceGroupName]
            valueField: Cost
            timeFrames: [MonthToDate]

The pod crashes after logging the following messages.

It emits this debug message for every RG that has been deleted:

  "level": "debug",
  "caller": "armclient/client.tags.go:207",
  "msg": "unable to fetch tagValue for resourceID \"/subscriptions/xxxx/resourcegroups/yyyyyy\": resourceGroup \"yyyyyy\" not found",
  "component": "armClientTagManager"

Then the pod panics and crashes when it queries an existing RG:

panic: inconsistent label cardinality: expected 6 label values but got 5 in prometheus.Labels{"currency":"usd", "resourceGroup":"zzzzzzzz", "scope":"/subscriptions/xxxxx/resourceGroups/zzzzzzzz", "subscriptionID":"", "timeframe":"MonthToDate"}

I even tested setting a scope on my cost query, pointing to a single RG that I know exists and has the tag 'creator'. I get the same panic message in the pod.

mblaschke commented 1 year ago

Please try 23.5.0-beta2.

theok-nice commented 1 year ago

@mblaschke Thank you for the update. The pod stopped crashing with this image tag, and I am getting cost results for dimensions: [ResourceGroupName].

The debug messages still contain entries for deleted resources:

  "level": "debug",
  "caller": "armclient/client.tags.go:207",
  "msg": "unable to fetch tagValue for resourceID \"/subscriptions/xxxx/resourcegroups/yyyyyy\": resourceGroup \"yyyyyy\" not found",
  "component": "armClientTagManager"

We get billed for deleted resources if they existed within the payment period, but for some reason the reply we get from Azure doesn't contain the tag anymore. They shouldn't lose the tag; that's strange.

mblaschke commented 1 year ago

Thanks for reporting that the crash is solved :)

The problem with tags and historical data is a general issue which might be too difficult to solve with the exporter. The Azure API doesn't know the ResourceGroup anymore, so how should the exporter get the resource information if Azure doesn't have it anymore?

The only real solution might be to use the tag dimension from the Azure Cost Query Usage API but keep in mind that there are limitations from Azure.

theok-nice commented 1 year ago

I also think there might be a limitation on the MS side.

On the Azure Portal Cost Analysis page, you can get the accumulated cost (say, for the running month) and filter by tags. There we can see that MS stores the tag info for both existing and deleted RGs.

It seems to me that MS doesn't return the tag info for deleted RGs in the cost API that the exporter is using. In that case, only MS can fix this.

mblaschke commented 1 year ago

You can also use tags as dimensions:

    queries:
      - name: by_resourcetype
        help: Costs by ResourceGroupName and ResourceType
        dimensions: [resourcegroupname, "tag:owner"]
        valueField: Cost
        timeFrames: [MonthToDate]

webdevops / azure-resourcemanager-exporter

panic: inconsistent label cardinality: #40