Closed manvitha9347 closed 1 year ago
Also, for the same resourceType, I was not able to query Azure via the resourceGroup scope with the /resource endpoint. Any idea why this happens?
```yaml
- job_name: azure-metrics-example-mongodb
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/resource
  params:
    template:
      - 'azuremetricsexp_{metric}_{aggregation}_{unit}'
    cache:
      - 5s
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxxxx
    target:
      - /subscriptions/xxxxxxxxxxxxxxxxxxxxxxxxxxxx/resourceGroups/oscocosmosxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
    interval: ["PT1M"]
    timespan: ["PT1M"]
    aggregation:
      - average
    metricFilter:
      - DatabaseName eq 'osco' and CollectionName eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]
```
what mismatch do you get? can you give me more details?
(added code blocks to your posts)
I need to configure multiple dimensions for multiple metrics in a single job name. For example: for the ServerSideLatency metric the dimensions are DatabaseName, CollectionName, ConnectionMode, OperationType; for NormalizedRUConsumption the dimensions are DatabaseName, CollectionName, PartitionKey.
Everything needs to be configured under a single Prometheus job name. I tried like this:
```yaml
- job_name: azure-metrics-example-dimensions
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/list
  params:
    template:
      - 'azuremetricsexplist_{metric}_{aggregation}_{unit}'
    cache:
      - 5s
    subscription:
      - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    resourceType:
      - Microsoft.DocumentDB/databaseAccounts
    metric:
      - ServerSideLatency
      - NormalizedRUConsumption
    interval: ["PT1M"]
    timespan: ["PT1M"]
    aggregation:
      - average
    metricFilter:
      - DatabaseName eq 'mydb' and CollectionName eq '*' and OperationType eq '*' and ConnectionMode eq '*' and PartitionKey eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]
```
This is not working (I get no data). Also, if I try to configure it for only one metric (as mentioned in the previous comment), it shows wrong data (higher values due to duplication).
Is there a way to configure this? @mblaschke
which version are you using?
you should get a warning message in the console/container logs that the metric query wasn't possible:
for ServerSideLatency:

```
Metric: ServerSideLatency does not support requested dimension combination:
databasename,collectionname,operationtype,connectionmode,partitionkey,
supported ones are:
DatabaseName,CollectionName,Region,ConnectionMode,OperationType,PublicAPIType
```

for NormalizedRUConsumption:

```
Metric: NormalizedRUConsumption does not support requested dimension combination:
databasename,collectionname,operationtype,connectionmode,partitionkey,
supported ones are:
CollectionName,DatabaseName,Region,PartitionKeyRangeId,CollectionRid
```
As a hint: you can try executing queries with http://azure-metrics-exporter-url/query in your browser if you enable --development.webui (this will always be on with the next update).
Yes, the combinations were not possible because the metrics do not support the requested dimension combinations. To make it work I am using a different job for each metric and each combination (2 metrics, 2 jobs); if the combination is different, I need to write a new job.

For example, for metric ServerSideLatency I need 2 combinations:

1. DatabaseName eq 'osco' and CollectionName eq '*'
2. DatabaseName eq 'osco' and ConnectionMode eq '*'

For metric NormalizedRUConsumption I need 2 combinations:

1. DatabaseName eq 'osco' and ConnectionMode eq '*'
2. DatabaseName eq 'osco' and PartitionKey eq '*'

For these combinations I need to write 4 different jobs; only then does the data match. Is there a way to configure this in a single job, or do we have to use 4 different jobs? @mblaschke

The latest Docker image is used.
You need at least two jobs because the dimensions are different.

Suggestion: for metric ServerSideLatency, use

DatabaseName eq 'osco' and CollectionName eq '*' and ConnectionMode eq '*'

and for metric NormalizedRUConsumption, use

DatabaseName eq 'osco' and ConnectionMode eq '*' and PartitionKey eq '*'

On the Prometheus side you can combine the samples using sum(), avg(), or other functions.
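Those two suggested filters as config fragments (a sketch only; the job names are placeholders, and subscription, template, interval, timespan etc. would be filled in as in the earlier examples in this thread):

```yaml
# Sketch: one job per metric, each requesting only dimensions for that metric.
# Job names and the exporter target below are placeholders.
- job_name: azure-metrics-serversidelatency
  metrics_path: /probe/metrics/list
  params:
    # (subscription, template, resourceType, interval, timespan as above)
    metric: ["ServerSideLatency"]
    aggregation: ["average"]
    metricFilter:
      - DatabaseName eq 'osco' and CollectionName eq '*' and ConnectionMode eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]

- job_name: azure-metrics-normalizedruconsumption
  metrics_path: /probe/metrics/list
  params:
    # (subscription, template, resourceType, interval, timespan as above)
    metric: ["NormalizedRUConsumption"]
    aggregation: ["average"]
    metricFilter:
      - DatabaseName eq 'osco' and ConnectionMode eq '*' and PartitionKey eq '*'
  static_configs:
    - targets: ["172.17.0.2:8080"]
```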
azure-metrics-exporter itself is just a client to Azure Monitor API and doesn't do any additional transformations. It only fetches the metrics and provides them for Prometheus. So you can transform/combine them with PromQL.
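The Prometheus-side aggregation could be sketched as a recording rule like the one below. Both the series name (derived from the template `azuremetricsexp_{metric}_{aggregation}_{unit}`) and the label name are assumptions, not confirmed by the exporter; check your actual series in the Prometheus UI first.

```yaml
# Sketch of a Prometheus recording rule (not part of the exporter config).
# Series name and label name below are assumptions based on the template;
# verify them against the series your exporter actually emits.
groups:
  - name: cosmosdb-rollups
    rules:
      # Average the per-dimension samples back into one series per database,
      # to compare against the per-database chart in the Azure portal.
      - record: cosmosdb:server_side_latency:avg
        expr: avg by (databaseName) (azuremetricsexp_ServerSideLatency_average_milliseconds)
```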
DatabaseName eq 'osco' and CollectionName eq '*' and ConnectionMode eq '*'

this is actually giving wrong data. If I split it into two filters,

DatabaseName eq 'osco' and CollectionName eq '*'
DatabaseName eq 'osco' and ConnectionMode eq '*'

this gives correct data when compared to Azure. @mblaschke
Are you checking the metrics in Prometheus? And you don't get the combined metrics when you sum() the averages together, so that they match the metrics in Azure?
Hi @mblaschke, I am using the following Prometheus config:

```yaml
- job_name: azmetricsexp_ServerSideLatency
  scrape_interval: 5m
  scrape_timeout: 5m
  metrics_path: /probe/metrics/list
  params:
    template:
    metricFilter:
      - DatabaseName eq 'osco' or DatabaseName eq 'orderff' or DatabaseName eq 'auth' and CollectionName eq '*' and ConnectionMode eq '*' and OperationType eq '*'
  static_configs:
```
This is the error I see:

```
time="2022-06-14T12:33:12+05:30" level=warning msg="insights.MetricsClient#List: Failure responding to request: StatusCode=529 -- Original Error: autorest/azure: Service returned an error. Status=529 Code=\"Unknown\" Message=\"Unknown service error\" Details=[{\"cost\":0,\"interval\":\"PT5M\",\"namespace\":\"Microsoft.DocumentDb/databaseAccounts\",\"resourceregion\":\"westus\",\"timespan\":\"2022-06-14T06:57:12Z/2022-06-14T07:02:12Z\",\"value\":[{\"displayDescription\":\"Server Side Latency\",\"errorCode\":\"Throttled\",\"errorMessage\":\"Query was throttled with reason: ServerBusy. Requested Metric:CosmosDBCustomer|AzureMonitor|ServerSideLatency. Output Dimensions: collectionname,connectionmode,databasename,operationtype. Dimension Filters: . FirstOutputSamplingType: NullableAverage. Start time: 6/14/2022 6:57:12 AM End time: 6/14/2022 7:01:12 AM. Resolution: 00:05:00, Last Value Mode: False.
```
I also see a lot of gaps in the metrics when viewed in Grafana. I tried decreasing the scrape interval to 1m, but my jobs go down very quickly with "context deadline exceeded"; they work better with a 5m scrape interval, which can't be changed.
How can I resolve this?
The following error message is coming from the Azure API, not from the exporter itself. The Azure API is failing here, so you might want to approach your Azure support.

```
Error: autorest/azure: Service returned an error. Status=529 Code="Unknown" Message="Unknown service error" Details=[{"cost":0,"interval":"PT5M","namespace":"Microsoft.DocumentDb/databaseAccounts","resourceregion":"westus","timespan":"2022-06-14T06:57:12Z/2022-06-14T07:02:12Z","value":[{"displayDescription":"Server Side Latency","errorCode":"Throttled","errorMessage":"Query was throttled with reason: ServerBusy. Requested Metric:CosmosDBCustomer|AzureMonitor|ServerSideLatency. Output Dimensions: collectionname,connectionmode,databasename,operationtype. Dimension Filters: . FirstOutputSamplingType: NullableAverage. Start time: 6/14/2022 6:57:12 AM End time: 6/14/2022 7:01:12 AM. Resolution: 00:05:00, Last Value Mode: False.
```

Something in Azure is broken, it's not the exporter. The exporter cannot fix anything if the Azure API is down or not responding (the error message itself is also produced by autorest/azure, which is the azure-sdk-for-go).
For the gaps:
If, for any reason, the Azure API is failing (see the error message), the exporter cannot do anything and gaps will appear because the Azure API is not responding. Normally this does not happen often, but the Azure API can also be down; for outages check https://status.azure.com/en-us/status
For caching:
If caching is enabled in the exporter (e.g. via the env var ENABLE_CACHING=1), you should set

```yaml
scrape_interval: 1m
scrape_timeout: 1m
```

and set cache to the same time as the interval:

```yaml
cache: ["5m"]
```

Then the exporter will be queried every minute but will deliver the same metrics until the cache invalidates (after 5 minutes).
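Put together, a minimal sketch of the relevant job settings (the job name is a placeholder; the target and other params follow the earlier examples in this thread):

```yaml
# Sketch: Prometheus scrapes the exporter every minute, while the exporter
# caches the Azure Monitor response for 5 minutes, so the Azure API is
# queried far less often (which helps against ServerBusy throttling).
- job_name: azure-metrics-cached   # placeholder name
  scrape_interval: 1m
  scrape_timeout: 1m
  metrics_path: /probe/metrics/list
  params:
    # (subscription, template, resourceType, metric etc. as in earlier examples)
    cache: ["5m"]
    interval: ["PT5M"]
    timespan: ["PT5M"]
  static_configs:
    - targets: ["172.17.0.2:8080"]
```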
Hi @mblaschke, thank you for the consistent responses.

For this dimension filter:

DatabaseName eq 'osco' or DatabaseName eq 'orderff' or DatabaseName eq 'auth' and CollectionName eq '*' and ConnectionMode eq '*' and OperationType eq '*'

I am seeing data for osco but very little data for orderff and auth; I am only able to see 1-2 scrapes in a 1 hour interval. What could be the reason for this? I can see ample data in Azure.

Also, can you help me understand the difference between these key/value pairs: interval: ["PT5M"] and timespan: ["PT5M"]? My understanding: interval is the period of time between the gathering of two metric values, and timespan is the aggregation span (e.g. if it is 1m, there is one aggregation every 1m and the data is sent). Is this understanding correct?
If you use dimensions, you get the top N results from the API, see https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list (azure-metrics-exporter is just an Azure Monitor Metrics API client). If you don't specify metricTop: [10], then you get the top 10 results from the Azure Monitor API.
For interval and timespan also see https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list
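For instance, raising that limit could look like this in the probe params (a sketch; the value 50 is illustrative, not a recommendation):

```yaml
params:
  # Ask Azure Monitor for up to the top 50 dimension combinations
  # instead of the default 10 (value is illustrative).
  metricTop: ["50"]
```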
closed due to inactivity
Hi, I am working on a config to get metrics for multiple dimensions from a single resource type. My config looks like this:

When I use this config, there is a mismatch between the data in Azure and the exporter's data. Is there a way to specify all dimensions (CollectionName, OperationType, ConnectionMode) of a particular metric (ServerSideLatency) in a single job name? Can you help me with this? @mblaschke