webdevops / azure-metrics-exporter

Azure Monitor metrics exporter for Prometheus with dimension support, template engine and ServiceDiscovery
MIT License

Add metrics to count how many HTTP request failures are happening towards Azure #17

Closed NissesSenap closed 2 years ago

NissesSenap commented 2 years ago

First of all, thanks for a great project. One thing I'm missing is a counter for how many requests actually return a 200, 400, etc. when talking to Azure.

It would probably also be good to add a count of how many requests you have performed in the last hour, to give your users a feeling for how much is left before they hit the API request limit: https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling

If this is something that you think is a good idea, I'm willing to create a PR for this.

mblaschke commented 2 years ago

You mean requests from the exporter?

NissesSenap commented 2 years ago

Yes precisely

mblaschke commented 2 years ago

I'm thinking about a solution for all Azure exporters, something like:

azurerm_apicalls{endpoint="management.azure.com",status="200"} xxx
azurerm_apicalls{endpoint="management.azure.com",status="501"} xxx
azurerm_apicalls{endpoint="management.azure.com",status="401"} xxx

Not sure how easy it would be to also integrate latency.

mblaschke commented 2 years ago

found a possible solution with HistogramVec:

# HELP azurerm_api_request AzureRM API requests
# TYPE azurerm_api_request histogram
azurerm_api_request_bucket{statusCode="200",le="0.01"} 4
azurerm_api_request_bucket{statusCode="200",le="0.025"} 4
azurerm_api_request_bucket{statusCode="200",le="0.05"} 4
azurerm_api_request_bucket{statusCode="200",le="0.1"} 4
azurerm_api_request_bucket{statusCode="200",le="0.25"} 4
azurerm_api_request_bucket{statusCode="200",le="0.5"} 4
azurerm_api_request_bucket{statusCode="200",le="1"} 12
azurerm_api_request_bucket{statusCode="200",le="2.5"} 13
azurerm_api_request_bucket{statusCode="200",le="5"} 13
azurerm_api_request_bucket{statusCode="200",le="10"} 13
azurerm_api_request_bucket{statusCode="200",le="30"} 13
azurerm_api_request_bucket{statusCode="200",le="60"} 13
azurerm_api_request_bucket{statusCode="200",le="+Inf"} 13
azurerm_api_request_sum{statusCode="200"} 7.583664
azurerm_api_request_count{statusCode="200"} 13
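
To make the bucket output above easier to read, here is a rough sketch of how cumulative `le` buckets, `_sum` and `_count` come about. It is a hand-rolled stand-in written with the standard library only, not the Prometheus client's HistogramVec; the `histogram` type and its methods are hypothetical, and the observed durations in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram is a minimal stand-in for a Prometheus histogram.
type histogram struct {
	bounds []float64 // upper bounds (le), ascending
	counts []uint64  // per-bucket counts; the last slot is the +Inf bucket
	sum    float64
	count  uint64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe records one request duration in seconds.
func (h *histogram) Observe(seconds float64) {
	// Index of the first bound >= seconds; len(bounds) means +Inf.
	i := sort.SearchFloat64s(h.bounds, seconds)
	h.counts[i]++
	h.sum += seconds
	h.count++
}

// Cumulative returns counts as exposed in the text format: each bucket
// includes every observation less than or equal to its bound.
func (h *histogram) Cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	// Bucket bounds copied from the output above; durations are invented.
	bounds := []float64{0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60}
	h := newHistogram(bounds)
	for _, d := range []float64{0.005, 0.7, 0.8, 1.2} {
		h.Observe(d)
	}
	cum := h.Cumulative()
	for i, b := range bounds {
		fmt.Printf("le=%g %d\n", b, cum[i])
	}
	fmt.Printf("le=+Inf %d\n", cum[len(bounds)])
	fmt.Printf("sum=%g count=%d\n", h.sum, h.count)
}
```

A vector of such histograms keyed by `statusCode` would give one series set per status, which is what the label in the output above represents. Dividing `_sum` by `_count` gives the mean latency, roughly 0.58 seconds for the 13 requests shown.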

NissesSenap commented 2 years ago

Looks like a great option.

mblaschke commented 2 years ago

# HELP azurerm_api_ratelimit AzureRM API ratelimit
# TYPE azurerm_api_ratelimit gauge
azurerm_api_ratelimit{endpoint="management.azure.com",scope="subscription",subscriptionID="xxxx-xxxx-xxxx-xxxx",type="read"} 11998
# HELP azurerm_api_request AzureRM API requests
# TYPE azurerm_api_request histogram
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="0.1"} 0
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="0.25"} 0
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="0.5"} 0
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="1"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="2.5"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="5"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="10"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="30"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="60"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",statusCode="200",le="+Inf"} 2
azurerm_api_request_sum{endpoint="management.azure.com",method="get",statusCode="200"} 1.606617
azurerm_api_request_count{endpoint="management.azure.com",method="get",statusCode="200"} 2

mblaschke commented 2 years ago

Implemented with 8e2dd63cbe2418e03a78e516a90f98ff384df57e; that should give you good visibility into Azure API calls:

# HELP azurerm_api_ratelimit AzureRM API ratelimit
# TYPE azurerm_api_ratelimit gauge
azurerm_api_ratelimit{endpoint="management.azure.com",scope="subscription",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",type="reads"} 11998
# HELP azurerm_api_request AzureRM API requests
# TYPE azurerm_api_request histogram
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="0.1"} 0
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="0.25"} 0
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="0.5"} 0
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="1"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="2.5"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="5"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="10"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="30"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="60"} 2
azurerm_api_request_bucket{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx",le="+Inf"} 2
azurerm_api_request_sum{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"} 1.3975738
azurerm_api_request_count{endpoint="management.azure.com",method="get",routingRegion="francesouth",statusCode="200",subscriptionID="xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"} 2

Because the rate limit values are only valid for a limited time, the azurerm_api_ratelimit metrics are reset once /metrics is accessed.

This also fixes the duplicate azurerm_ratelimit metrics reported in the Prometheus server logs (the metric is replaced by azurerm_api_ratelimit).

It will be available as quay.io/webdevops/azure-metrics-exporter:main in 1 or 2 hours. The new build uses musl libc (Alpine) instead of glibc and strips debug symbols, so it's smaller.

please share feedback :)

mblaschke commented 2 years ago

Documentation will be available here with the next version: https://github.com/webdevops/azure-metrics-exporter#azuretracing-metrics

It is now also controllable via env vars.

NissesSenap commented 2 years ago

Sorry for not coming back to you, @mblaschke. I haven't had time to update to the latest release yet; I hope to do that next week, but you never know. Thanks for taking the time to implement this feature.