Please try docker tag 22.12.0-beta4 and set the cost scrape time to hours instead of minutes. Azure limits requests to the costs/consumption API with a strict rate limit, which is set on tenant level and not on subscription level.
22.12.0-beta4 also implements a caching mechanism to prevent refetching the values when the exporter restarts: it restores the metrics from the cache and adjusts the next scrape time to the expiry date of the cached metrics, avoiding unwanted queries.
You can also set COSTS_REQUEST_DELAY to delay every request by e.g. 10s, or maybe 30s for bigger tenants, to relax the pressure on this API.
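For reference, a minimal sketch of such a setup (credentials, image tag and values are illustrative; SCRAPE_TIME_COSTS and COSTS_REQUEST_DELAY are the env vars used later in this thread):

```shell
# sketch only -- adjust credentials and values to your environment
docker run -d --name azure-rm-exporter \
  -e AZURE_TENANT_ID=xxxx -e AZURE_CLIENT_ID=xxxx -e AZURE_CLIENT_SECRET=xxxx \
  -e SCRAPE_TIME_COSTS=1h \
  -e COSTS_REQUEST_DELAY=30s \
  -p 8080:8080 \
  webdevops/azure-resourcemanager-exporter:22.12.0-beta4
```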
We are using exporter version 22.11 and increased the request interval to Azure by 5 minutes, which solved the problem: the container stays up and we no longer see "panic" errors in our logs.
Thanks for the support.
I recommend fetching the metrics every 12 hours, setting COSTS_REQUEST_DELAY to 30s, and enabling caching.
I will look into an Azure StorageAccount option for the cache backend, which would make it easier to store the data, see #21.
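As an env-file sketch of those recommendations (values as suggested above; how to persist the cache is covered a few comments below):

```shell
# vars.env -- recommended settings (sketch)
SCRAPE_TIME_COSTS=12h
COSTS_REQUEST_DELAY=30s
```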
Hi Markus. The customer uses specific tags to define the different environments in its structure (e.g. "Prod" and "Dev"). I'm trying to get this information through the azure.resource.tag parameter, but it's not working. How should I configure this parameter? The tag name is "environment" and its values are "Prod", "Dev" and "Backup". I'm setting the parameter as '--azure.resource.tag="environment"' in the run command. Is that correct?
Thanks in advance.
How do we enable the cache? After four days of working well, the problem reappeared. :( These settings don't cause gaps in the Grafana dashboards, correct?
As mentioned above, caching is implemented starting with container tag :2022.12.0-beta4.
I've just implemented a way to write/restore cache data to Azure StorageAccounts starting with :2022.12.0-beta6, see #21.
If you specify a file path, cache data is stored on local disk, so you have to make sure this storage is still available after a restart (in Kubernetes you have to attach a Persistent Volume).
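A minimal sketch of running with a persisted cache via a Docker volume. Note: the cache path option is shown here as `--cache.path`, which is an assumption on my side; check the `--help` output of your exporter version for the exact option name.

```shell
# sketch: keep the cost cache on a named volume so it survives container restarts
# ASSUMPTION: the cache location option is --cache.path -- verify with --help for your version
docker run -d --name azure-rm-exporter \
  --env-file vars.env \
  -v cost-cache:/data \
  -p 8080:8080 \
  webdevops/azure-resourcemanager-exporter:2022.12.0-beta6 \
  --cache.path=/data
# an Azure StorageAccount cache backend is also mentioned in #21; see the project README for details
```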
Hi Markus. Is it possible to extract metrics from Azure through the exporter such as meter, meter category and meter subcategory? For example: We need to show which is the most expensive artifact in the month.
Thanks in advance
Can you show me a screenshot of an example in the Azure portal?
I have to check how to get these values, thanks.
Ok, I'd appreciate that.
Kind regards.
Hi Markus. Is there any news about this issue?
Kind regards,
Still working on it and thinking about how to integrate more complex queries.
You could try COSTS_DIMENSION=Meter to get the Meter values by resource group.
I didn't find this "metric".
Hi Markus.
When I put only one value in the "cost.dimension" parameter (e.g. --cost.dimension='Meter'), it worked fine. But when I tried to add another value with a space as delimiter, I got the following error:
"Invalid query definition: Invalid dataset grouping: 'Meter MeterCategory ResourceId ConsumedService'; valid values: 'ResourceGroup','ResourceGroupName','ResourceLocation','ConsumedService','ResourceType','ResourceId','MeterId','BillingMonth','MeterCategory','MeterSubcategory','Meter','AccountName','DepartmentName','SubscriptionId','SubscriptionName','ServiceName','ServiceTier','EnrollmentAccountName','BillingAccountId','ResourceGuid','BillingPeriod','InvoiceNumber','ChargeType','PublisherType','ReservationId','ReservationName','Frequency','PartNumber','CostAllocationRuleName','MarkupRuleName','PricingModel','BenefitId','BenefitName',''.
Can you help us?
Thanks in advance.
We achieved this by using several "--cost.dimension" parameters, one for each desired dimension. The command line got pretty big, but that's no problem.
Thank you very much.
You can use space separation when using environment variables; on the command line you have to specify the flag multiple times.
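For example, a sketch of both forms (dimension names taken from the messages above; the image tag and env file are illustrative):

```shell
# environment variable: space-separated list (e.g. in vars.env)
COSTS_DIMENSION="Meter MeterCategory ResourceId"

# command line: repeat the flag once per dimension
docker run --env-file vars.env webdevops/azure-resourcemanager-exporter:22.12.0-beta4 \
  --cost.dimension=Meter --cost.dimension=MeterCategory --cost.dimension=ResourceId
```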
The next version (available with 23.0.0-beta2) will have a different approach for cost reporting by offering "query" support.
Queries can be defined using env vars:
COSTS_QUERY_by_resourcegroup="ResourceGroupName"
COSTS_QUERY_by_meter_and_resourcegroup="Meter,ResourceGroupName"
The metric name will be azurerm_costs_{queryName}:
COSTS_QUERY_by_resourcegroup --> azurerm_costs_by_resourcegroup
COSTS_QUERY_by_meter_and_resourcegroup --> azurerm_costs_by_meter_and_resourcegroup
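A quick sketch of how to check the resulting metric names (assumes credentials in vars.env and the default /metrics endpoint on the exporter's port 8080):

```shell
# run with one cost query defined
docker run -d --name azure-rm-exporter --env-file vars.env \
  -e COSTS_QUERY_by_resourcegroup=ResourceGroupName \
  -p 8080:8080 webdevops/azure-resourcemanager-exporter:23.0.0-beta2

# after the first costs scrape the query appears under its generated metric name
curl -s http://localhost:8080/metrics | grep '^azurerm_costs_by_resourcegroup'
```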
Hi Markus.
Thank you very much for your dedication to our requests.
We still have a problem with the tags. When using the "COSTS_QUERY_by" metrics, we were unable to get the tag metrics; only the "owner" tag is displayed in the queries. We put these settings in a "vars.env" configuration file and run the exporter with the following command:
"sudo docker run -d --name go1 --env-file vars.env -v go-volume:/go -p 8081:8080 webdevops/azure-resourcemanager-exporter:23.0.0-beta2 --azure.tenant="xxxx-xxxx-xxxx-xxxx" --log.debug --scrape.time=0 --scrape.time.costs=300s --scrape.time.resource=300s --azure.resource.tag=AMBIENTE"
Our last need is to get the cumulative values per tag, and all our problems are solved. :)
Thanks in advance.
Hi Markus.
We still have a problem with the tags. We're using the following command to run the exporter container:
"sudo docker run -d --name go1 --env-file vars.env -v go-volume:/go -p 8081:8080 webdevops/azure-resourcemanager-exporter:23.0.0-beta2 --azure.tenant="xxxx-xxxx-xxxx-xxxx" --log.debug --scrape.time=0 --scrape.time.costs=300s --scrape.time.resource=300s --azure.resource.tag=AMBIENTE"
But we still don't see the tags on the resources.
Our last need is to get the cumulative values per tag, and all our problems are solved. :)
Thanks in advance.
Hi Markus. Please help us with the tags issue.
Please post the env vars which you're using for the cost queries (the ones starting with COSTS_QUERY_).
These vars are in a variable file (vars.env).
AZURE_TENANT_ID=xxxx-xxxx-xxxx-xxxx
AZURE_CLIENT_ID=xxxx-xxxx-xxxx-xxxx
AZURE_CLIENT_SECRET=xxxx-xxxx-xxxx-xxxx
AZURE_SUBSCRIPTION_ID=xxxx-xxxx-xxxx-xxxx
AZURE_RESOURCE_TAG=AMBIENTE DEPARTAMENTO
COSTS_QUERY_by_MT_RSID_RG=Meter,ResourceId,ResourceGroupName
COSTS_QUERY_by_RSID_RG=ResourceId,ResourceGroup
COSTS_QUERY_by_MTC_MTSC_RSID_RG=MeterCategory,MeterSubcategory,ResourceId,ResourceGroup
To summarize: you're running a costs query which e.g. uses ResourceGroupName/ResourceId as dimension, and you expect the exporter to also add selected tags from the ResourceGroup (AZURE_RESOURCEGROUP_TAG) and/or the Resource (AZURE_RESOURCE_TAG) config?
Actually, I'd like to calculate the costs of the resources that have a specific tag. For example: what is the cost of the PROD environment resources?
We tried to relate data from azure-resourcemanager-exporter and azure-resourcegraph-exporter (which does expose tags), using ResourceId as the key, but the label is written differently in each exporter ("ResourceId" in azure-resourcemanager-exporter and "resourceID" in azure-resourcegraph-exporter), and this causes confusion when we join them.
With :23.2.0-beta1 you can e.g. try COSTS_QUERY_by_owner=tag:owner, which queries costs grouped by the tag "owner" from the Azure costs API.
COSTS_QUERY_PRD_owner=tag:owner
I'm receiving the following error:
{"collector":"Costs","file":"/go/src/github.com/webdevops/azure-resourcemanager-exporter/metrics_azurerm_costs.go:205","func":"main.(MetricsCollectorAzureRmCosts).collectSubscription","level":"info","msg":"fetching cost report for query prd_owner","subscriptionID":"xxxx-xxxx-xxxx-xxxx","subscriptionName":"Acesso ao Azure Active Directory(Converted to EA)"} {"collector":"Costs","costreport":"ActualCost","file":"/go/src/github.com/webdevops/azure-resourcemanager-exporter/metrics_azurerm_costs.go:351","func":"main.(MetricsCollectorAzureRmCosts).collectCostManagementMetrics","level":"panic","msg":"POST https://management.azure.com/subscriptions/xxxx-xxxx-xxxx-xxxx/providers/Microsoft.CostManagement/query\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: BadRequest\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"BadRequest\",\n \"message\": \"Invalid query definition: Invalid dataset grouping: 'owner'; valid values: 'ResourceGroup','ResourceGroupName','ResourceLocation','ConsumedService','ResourceType','ResourceId','MeterId','BillingMonth','MeterCategory','MeterSubcategory','Meter','AccountName','DepartmentName','SubscriptionId','SubscriptionName','ServiceName','ServiceTier','EnrollmentAccountName','BillingAccountId','ResourceGuid','BillingPeriod','InvoiceNumber','ChargeType','PublisherType','ReservationId','ReservationName','Frequency','PartNumber','CostAllocationRuleName','MarkupRuleName','PricingModel','BenefitId','BenefitName',''.\r\n\r\n (Request ID: e924ace7-6fdc-43e1-8368-49cdebcd26ae)\"\n }\n}\n--------------------------------------------------------------------------------\n","subscriptionID":"10d45ba9-1ad0-40f9-a0d3-e495ad07613a","subscriptionName":"Acesso ao Azure Active Directory(Converted to EA)"} {"collector":"Costs","file":"/go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230211175655-c2cb48b1ad72/prometheus/collector/collector.go:193","func":"collector.(*Collector).collectRun.func1.1","level":"error","msg":"panic occurred (panic threshold 1 of 5): POST https://management.azure.com/subscriptions/xxxx-xxxx-xxxx-xxxx/providers/Microsoft.CostManagement/query\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: BadRequest\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"BadRequest\",\n \"message\": \"Invalid query definition: Invalid dataset grouping: 'owner'; valid values: 'ResourceGroup','ResourceGroupName','ResourceLocation','ConsumedService','ResourceType','ResourceId','MeterId','BillingMonth','MeterCategory','MeterSubcategory','Meter','AccountName','DepartmentName','SubscriptionId','SubscriptionName','ServiceName','ServiceTier','EnrollmentAccountName','BillingAccountId','ResourceGuid','BillingPeriod','InvoiceNumber','ChargeType','PublisherType','ReservationId','ReservationName','Frequency','PartNumber','CostAllocationRuleName','MarkupRuleName','PricingModel','BenefitId','BenefitName',''.\r\n\r\n (Request ID: e924ace7-6fdc-43e1-8368-49cdebcd26ae)\"\n }\n}\n--------------------------------------------------------------------------------\n"}
Already working on a fix: the constant from the azure-sdk is not correct and the columns don't match. It will be released in a few hours.
Please try :23.2.0-beta2.
Hi, Markus.
Thank you very much for helping us. Just one detail: we have tag values that are written with an underscore "_", and the query ignores those records. Would it be possible to cover these cases as well?
Thanks in advance.
Do you have an example of the tags used? Feel free to post the JSON value of the tags from the resource ARM definition. Are you talking about tag values or the tag name?
Can you see these tags in Azure cost analysis when you select "group by tags"? The exporter is not ignoring these tags; it will take me some time to reproduce a cost report with a similar tag setup.
You are right. These tags do not appear in the portal either.
Are these tags on resources or on resource groups?
What I could do: add the resource group tags to cost reports when ResourceGroupName is used as grouping.
Yes, we asked the customer to apply this method, but we are not sure whether it will be applied.
how are the tags currently applied? on what scope?
We've determined five mandatory tags and these are being applied to resources.
Hi, Markus.
We are very pleased with your help. What you did works fine.
But there's just one small detail I'd like to report: currently, we're only able to fetch one tag per metric. Would it be possible to return multiple tags in a single metric line?
For example: I have tags 1 and 2. If I want to fetch both tags, I need to create two query definitions:
COSTS_QUERY_by_MT_MTC_RSID_RG_1=Meter,MeterCategory,ResourceId,ResourceGroup,tag:1
COSTS_QUERY_by_MT_MTC_RSID_RG_2=Meter,MeterCategory,ResourceId,ResourceGroup,tag:2
Wouldn't it be possible to have:
COSTS_QUERY_by_MT_MTC_RSID_RG_TAGS=Meter,MeterCategory,ResourceId,ResourceGroup,tag:1,tag:2?
Hi Markus. Do you have any news for us about the tags extraction?
Thanks in advance.
@aloysioc the problem here is that this query is sent to the Azure REST API: https://learn.microsoft.com/en-us/rest/api/cost-management/query/usage?tabs=HTTP
So this query is not executed by the exporter but by the backend systems in Azure, so you would have to raise this with Azure support.
Nevertheless, I've added AZURE_RESOURCEGROUP_TAG for the dimension ResourceGroup and AZURE_RESOURCE_TAG for the dimension ResourceId.
But that's all I can do here.
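As an illustration, a sketch of how those two settings combine with the query dimensions (env-file form; the tag name AMBIENTE is taken from the commands above, and it is an assumption on my side that the configured tags show up as additional labels on the matching cost metrics):

```shell
# sketch (env-file form): selected tags on cost metrics
AZURE_RESOURCEGROUP_TAG=AMBIENTE   # applied when the query groups by ResourceGroup/ResourceGroupName
AZURE_RESOURCE_TAG=AMBIENTE        # applied when the query groups by ResourceId
COSTS_QUERY_by_resourcegroup=ResourceGroupName
COSTS_QUERY_by_resourceid=ResourceId
```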
@mblaschke What is the latest working image tag?
I was using tag azure-resourcemanager-exporter:23.0.0-beta2 and this config:
- name: AZURE_RESOURCEGROUP_TAG
value: creator
- name: AZURE_RESOURCE_TAG
value: creator
- name: COSTS_QUERY_by_billingmonth
value: BillingMonth
- name: COSTS_QUERY_by_consumedservice
value: ConsumedService
- name: COSTS_QUERY_by_resourcegroup
value: ResourceGroupName
- name: COSTS_QUERY_by_resourcelocation
value: ResourceLocation
- name: COSTS_QUERY_by_resourcetype
value: ResourceGroupName,ResourceType
- name: COSTS_QUERY_by_resourcetype_resourceid
value: ResourceGroupName,ResourceType,ResourceId
- name: COSTS_QUERY_by_servicename
value: ServiceName
- name: COSTS_QUERY_by_servicename_resourceid
value: ServiceName,ResourceId
- name: COSTS_REQUEST_DELAY
value: 30s
- name: SCRAPE_TIME_COSTS
value: 1h
- name: SCRAPE_TIME_DEFENDER
value: "0"
- name: SCRAPE_TIME_GRAPH
value: "0"
- name: SCRAPE_TIME_IAM
value: "0"
- name: SCRAPE_TIME_QUOTA
value: "0"
- name: SCRAPE_TIME_RESOURCEHEALTH
value: "0"
- name: SCRAPE_TIME_SECURITY
value: "0"
It was working for a few days, but since yesterday I am hitting some request (?) limits.
Today I started using image tag 23.2.0-beta2, configured my pod to use the cache through a PVC and set COSTS_REQUEST_DELAY to 5m, but I am still hitting the same error:
{
"collector": "Costs",
"file": "/go/src/github.com/webdevops/azure-resourcemanager-exporter/metrics_azurerm_costs.go:218",
"func": "main.(*MetricsCollectorAzureRmCosts).collectSubscription",
"level": "info",
"msg": "fetching cost report for query by_resourcegroup",
"subscriptionID": "*****",
"subscriptionName": "******"
}
{
"collector": "Costs",
"costreport": "ActualCost",
"file": "/go/src/github.com/webdevops/azure-resourcemanager-exporter/metrics_azurerm_costs.go:381",
"func": "main.(*MetricsCollectorAzureRmCosts).collectCostManagementMetrics",
"level": "panic",
"msg": "POST https://management.azure.com/subscriptions/******/providers/Microsoft.CostManagement/query\n--------------------------------------------------------------------------------\nRESPONSE 429: 429 Too Many Requests\nERROR CODE: 429\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"429\",\n \"message\": \"Too many requests. Please retry.\"\n }\n}\n--------------------------------------------------------------------------------\n",
"subscriptionID": "***",
"subscriptionName": "****"
}
{
"collector": "Costs",
"file": "/go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230212164333-176c199fce96/prometheus/collector/collector.go:193",
"func": "collector.(*Collector).collectRun.func1.1",
"level": "error",
"msg": "panic occurred (panic threshold 1 of 5): POST https://management.azure.com/subscriptions/******/providers/Microsoft.CostManagement/query\n--------------------------------------------------------------------------------\nRESPONSE 429: 429 Too Many Requests\nERROR CODE: 429\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"429\",\n \"message\": \"Too many requests. Please retry.\"\n }\n}\n--------------------------------------------------------------------------------\n"
}
{
"collector": "Costs",
"file": "/go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230212164333-176c199fce96/prometheus/collector/cache.go:160",
"func": "collector.(*Collector).collectionSaveCache",
"level": "info",
"msg": "saved state to cache: file://data/costs.json (expiring 2023-03-28 10:47:35.939969587 +0000 UTC)"
}
{
"collector": "Costs",
"duration": 14.541016868,
"file": "/go/pkg/mod/github.com/webdevops/go-common@v0.0.0-20230212164333-176c199fce96/prometheus/collector/collector.go:325",
"func": "collector.(*Collector).collectionFinish",
"level": "info",
"msg": "finished metrics collection, next run in 1h0m0s",
"nextRun": "2023-03-28T10:47:35.944520782Z"
}
All images with 23.3.0-beta won't even start for me.
PS: Thank you for your work on this project.
Hi Theok-nice and Markus,
Thank you for your contributions. For the last two days, error 429 has been occurring with high frequency. Does anyone know if Microsoft changed the data extraction rules of the Azure portal? This problem began after March 25th. I've tried reducing the number of metrics we pull per exporter, but the problem persists.
Thanks in advance,
Look at the behaviour of the data extraction after March 25th:
Costs/consumption rate limits are AzureAD tenant limits. The more requests you send, the higher the chance of hitting the tenant-wide limit.
Set the costs scrape time to at least 12h or 24h; you might not need hourly cost reporting if you're in a big AzureAD instance. Increasing COSTS_REQUEST_DELAY is also a good idea for big tenants.
This is an Azure limit; it's not easy for the exporter to handle this situation without consuming more and more requests and blocking all other applications as well. This also includes every view/request on the cost analysis dashboard in the Azure portal, which "consumes" requests too.
But I made no changes to my config file; the number of metrics we define is the same. I've now tried reducing these metrics to solve the problem, but I'm not succeeding. :(
If you're right at the edge of the rate limit, it could mean you hit it tenant-wide as soon as someone uses the cost analysis module in the Azure portal or uses PowerShell to do some cost analysis. Azure doesn't care if someone does this in a development subscription, as these limits are tenant-wide.
Also, requests have a cost, which can also affect how early you hit the rate limit.
Maybe as an idea: check the Azure/AzureAD logs to see who is consuming the requests, and if it is the exporter, reduce the scrape frequency, for example.
I am just thinking:
I changed my config to have only one request
COSTS_REQUEST_DELAY: "10m"
COSTS_QUERY_by_billingmonth: "BillingMonth"
Then I deleted my PVC and the pod. When the exporter starts, I get a 429 instantly (within a few seconds).
If this fails for the simplest query, without even waiting the 10 minutes that COSTS_REQUEST_DELAY sets, I don't think the scrape time is the issue.
To my knowledge, my tests with the exporter are the only process querying the cost API for this subscription, so it doesn't make sense to be hitting any rate limit.
Hello everyone,
We're using your GitHub project "webdevops/azure-resourcemanager-exporter" to collect cost data from Azure Cost Management. In the past we already used an older version of this project, where the exporter was not packaged as a Docker container. Now, running the exporter as a container, we're receiving the following error, specifically during the cost data collection: "RESPONSE 429: 429 Too Many Requests\nERROR CODE: 429". We already increased the interval at which this data is collected to 5 minutes, but the problem persists. The container has stopped crashing, but the "panic" during collection still occurs, and when it happens it creates a gap in our dashboards. We'd like you to help us with this issue. Can we count on your help?