tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License

Service Bus Aggregation missing one interval #1563

Open mtarantino opened 3 years ago

mtarantino commented 3 years ago

Report

Using the latest Docker image, I'm scraping Service Bus metrics and counting messages in a Queue/Topic. The aggregation is not computed as I expect. Details: I'm using a metricDefaults.aggregation.interval of 00:05:00 and a metricDefaults.scraping.schedule of 0 * * ? * *.

Expected Behavior

My queue has a constant size of 4 messages. I expected the aggregation to be:

Actual Behavior

My queue has a constant size of 4 messages. I see the following aggregation:

Steps to Reproduce the Problem

  1. Publish 4 messages in an Azure Service Bus queue (without consuming them)
  2. Configure Promitor to scrape metrics from the queue
  3. Run the docker image
    docker run -p 8080:8080 \
    --env-file az-mon-auth.creds \
    --volume /mypath/metrics-declaration.yaml:/config/metrics-declaration.yaml \
    --volume /mypath/runtime.yaml:/config/runtime.yaml \
    ghcr.io/tomkerkhove/promitor-agent-scraper:2.1.1

Component

Scraper

Version

2.1.1

Configuration

runtime.yaml

---
server:
  httpPort: 8080 # Optional. Default: 80
metricSinks:
  prometheusScrapingEndpoint:
    metricUnavailableValue: NaN # Optional. Default: NaN
    enableMetricTimestamps: false # Optional. Default: true
    baseUriPath: /metrics # Optional. Default: /metrics
metricsConfiguration:
  absolutePath: /config/metrics-declaration.yaml # Optional. Default: /config/metrics-declaration.yaml
telemetry:
  containerLogs:
    isEnabled: true # Optional. Default: true
    verbosity: trace # Optional. Default: N/A
  defaultVerbosity: Trace # Optional. Default: error
azureMonitor:
  logging:
    informationLevel: Basic # Optional. Default: Basic
    isEnabled: true # Optional. Default: false

metrics-declaration.yaml

metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    schedule: "0 * * * * *"
metrics:
  - name: azure_service_bus_messages
    description: "Count of messages in a Queue/Topic."
    resourceType: ServiceBusNamespace
    labels:
      env: dev
    azureMetricConfiguration:
      metricName: Messages
      aggregation:
        type: Average
    resources:
      - namespace: myNamespace
        topicName: myTopic

Logs

Unfortunately, despite enabling azureMonitor in my runtime configuration, I was not able to get the Azure Monitor API calls logged (not sure why...). For Average:

[11:40:00 INF] Scraping Azure Monitor - 03/17/2021 11:40:00 +00:00
[11:40:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:40:00 INF] Adding Prometheus sink to expose on /metrics
[11:40:00 DBG] Failed to locate the development https certificate at 'null'.
[11:40:00 INF] Now listening on: http://[::]:8080
[11:40:00 INF] Application started. Press Ctrl+C to shut down.
[11:40:00 INF] Hosting environment: Production
[11:40:00 INF] Content root path: /app
[11:40:02 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:41:00 INF] Scraping Azure Monitor - 03/17/2021 11:41:00 +00:00
[11:41:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:41:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:42:00 INF] Scraping Azure Monitor - 03/17/2021 11:42:00 +00:00
[11:42:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:42:00 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:43:00 INF] Scraping Azure Monitor - 03/17/2021 11:43:00 +00:00
[11:43:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:43:00 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:44:00 INF] Scraping Azure Monitor - 03/17/2021 11:44:00 +00:00
[11:44:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:44:00 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:45:00 INF] Scraping Azure Monitor - 03/17/2021 11:45:00 +00:00
[11:45:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:45:00 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:46:00 INF] Scraping Azure Monitor - 03/17/2021 11:46:00 +00:00
[11:46:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:46:00 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:47:00 INF] Scraping Azure Monitor - 03/17/2021 11:47:00 +00:00
[11:47:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:47:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:48:00 INF] Scraping Azure Monitor - 03/17/2021 11:48:00 +00:00
[11:48:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:48:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00

For Total

[11:16:05 INF] Scraping Azure Monitor - 03/17/2021 11:16:05 +00:00
[11:16:05 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:16:05 INF] Adding Prometheus sink to expose on /metrics
[11:16:05 DBG] Failed to locate the development https certificate at 'null'.
[11:16:05 INF] Now listening on: http://[::]:8080
[11:16:05 INF] Application started. Press Ctrl+C to shut down.
[11:16:05 INF] Hosting environment: Production
[11:16:05 INF] Content root path: /app
[11:16:06 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:17:00 INF] Scraping Azure Monitor - 03/17/2021 11:17:00 +00:00
[11:17:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:17:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:18:00 INF] Scraping Azure Monitor - 03/17/2021 11:18:00 +00:00
[11:18:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:18:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:19:00 INF] Scraping Azure Monitor - 03/17/2021 11:19:00 +00:00
[11:19:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:19:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:20:00 INF] Scraping Azure Monitor - 03/17/2021 11:20:00 +00:00
[11:20:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:20:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:21:00 INF] Scraping Azure Monitor - 03/17/2021 11:21:00 +00:00
[11:21:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:21:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:22:00 INF] Scraping Azure Monitor - 03/17/2021 11:22:00 +00:00
[11:22:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:22:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:23:00 INF] Scraping Azure Monitor - 03/17/2021 11:23:00 +00:00
[11:23:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:23:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:24:00 INF] Scraping Azure Monitor - 03/17/2021 11:24:00 +00:00
[11:24:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:24:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:25:00 INF] Scraping Azure Monitor - 03/17/2021 11:25:00 +00:00
[11:25:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:25:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:26:00 INF] Scraping Azure Monitor - 03/17/2021 11:26:00 +00:00
[11:26:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:26:01 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:27:00 INF] Scraping Azure Monitor - 03/17/2021 11:27:00 +00:00
[11:27:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:27:01 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:28:00 INF] Scraping Azure Monitor - 03/17/2021 11:28:00 +00:00
[11:28:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:28:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00
[11:29:00 INF] Scraping Azure Monitor - 03/17/2021 11:29:00 +00:00
[11:29:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[11:29:00 INF] Found value 16 for metric azure_service_bus_messages with aggregation interval 00:05:00

Platform

Other

Contact Details

tarantino.dev@gmail.com

tomkerkhove commented 3 years ago

Thanks for reporting. Is it possible to add a screenshot of the metrics explorer in the Azure portal for these metrics, please?

mtarantino commented 3 years ago

@tomkerkhove, attached is a screenshot of the metrics explorer in the Azure portal: promitor-1563-azure-portal-metrics

Another example with the following metricDefaults:

metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    schedule: "0 * * ? * *"

Logs:

[13:28:57 INF] Scraping Azure Monitor - 03/17/2021 13:28:57 +00:00
[13:28:57 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:28:57 INF] Adding Prometheus sink to expose on /metrics
[13:28:57 DBG] Failed to locate the development https certificate at 'null'.
[13:28:57 INF] Now listening on: http://[::]:8080
[13:28:57 INF] Application started. Press Ctrl+C to shut down.
[13:28:57 INF] Hosting environment: Production
[13:28:57 INF] Content root path: /app
[13:29:00 INF] Found value 4 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:29:00 INF] Scraping Azure Monitor - 03/17/2021 13:29:00 +00:00
[13:29:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:29:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:30:00 INF] Scraping Azure Monitor - 03/17/2021 13:30:00 +00:00
[13:30:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:30:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:31:00 INF] Scraping Azure Monitor - 03/17/2021 13:31:00 +00:00
[13:31:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:31:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:32:00 INF] Scraping Azure Monitor - 03/17/2021 13:32:00 +00:00
[13:32:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:32:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:33:00 INF] Scraping Azure Monitor - 03/17/2021 13:33:00 +00:00
[13:33:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:33:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:34:00 INF] Scraping Azure Monitor - 03/17/2021 13:34:00 +00:00
[13:34:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:34:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:35:00 INF] Scraping Azure Monitor - 03/17/2021 13:35:00 +00:00
[13:35:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:35:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00

As you can see:

[13:28:57 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
...
[13:29:00 INF] Found value 4 for metric azure_service_bus_messages with aggregation interval 00:05:00
[13:29:00 INF] Scraping Azure Monitor - 03/17/2021 13:29:00 +00:00
[13:29:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[13:29:01 INF] Found value 3.2 for metric azure_service_bus_messages with aggregation interval 00:05:00

The first scrape returns the proper value of 4, and then it returns 3.2.

I thought it might be because the two API calls are too close together, but when trying with a fixed time in the schedule (so that the time between the first scrape, which I guess is part of some kind of initialisation, and the next one is greater than 3 minutes), the result is the same.

tomkerkhove commented 3 years ago

Hm odd, is the metric explorer also using 00:05:00 as aggregation?

Frankly, we only scrape the API with the given configuration; there is no data manipulation on our side.

mtarantino commented 3 years ago

Hm odd, is the metric explorer also using 00:05:00 as aggregation?

This is the metrics configuration:

metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    schedule: "0 * * ? * *"
metrics:
  - name: azure_service_bus_messages
    description: "Count of messages in a Queue/Topic."
    resourceType: ServiceBusNamespace
    labels:
      env: dev
    azureMetricConfiguration:
      metricName: Messages
      aggregation:
        type: Average
    resources:
      - namespace: myNamespace
        topicName: myTopic

I would need to dig into the API call. I saw in runtime/scraper#azure-monitor that it's possible to log the Azure Monitor API call, but I didn't manage to make it work with my configuration. Is there something wrong with the following config?

server:
  httpPort: 8080 # Optional. Default: 80
metricSinks:
  prometheusScrapingEndpoint:
    metricUnavailableValue: NaN # Optional. Default: NaN
    enableMetricTimestamps: false # Optional. Default: true
    baseUriPath: /metrics # Optional. Default: /metrics
metricsConfiguration:
  absolutePath: /config/metrics-declaration.yaml # Optional. Default: /config/metrics-declaration.yaml
telemetry:
  containerLogs:
    isEnabled: true # Optional. Default: true
    verbosity: trace # Optional. Default: N/A
  defaultVerbosity: Trace # Optional. Default: error
azureMonitor:
  logging:
    informationLevel: BodyAndHeaders # Optional. Default: Basic
    isEnabled: true # Optional. Default: false

tomkerkhove commented 3 years ago

I was more referring to the aggregation in the metric explorer of the Azure Portal, can you check please?

mtarantino commented 3 years ago

I was more referring to the aggregation in the metric explorer of the Azure Portal, can you check please?

@tomkerkhove: Ah, yes, the aggregation in the Azure Portal metric explorer (called "time granularity" in the interface) is set to 5 minutes.

I've done some additional tests and let Promitor run overnight with the following metrics configuration:

metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    schedule: "0 * * * * *"
metrics:
  - name: azure_service_bus_messages
    description: "Count of messages in a Queue/Topic."
    resourceType: ServiceBusNamespace
    labels:
      env: dev
    azureMetricConfiguration:
      metricName: Messages
      aggregation:
        type: Total
    resources:
      - namespace: myNamespace
        topicName: myTopic

This is the Metric Explorer screenshot: promitor-1563-azure-portal-metrics-20210318

And these are the logs; we can see that the total oscillates between 20 and 25 even though the number of messages is fixed at 5 (the proper total should be 25):


[07:20:00 INF] Scraping Azure Monitor - 03/18/2021 07:20:00 +00:00
[07:20:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:20:00 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:21:00 INF] Scraping Azure Monitor - 03/18/2021 07:21:00 +00:00
[07:21:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:21:00 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:22:00 INF] Scraping Azure Monitor - 03/18/2021 07:22:00 +00:00
[07:22:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:22:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:23:00 INF] Scraping Azure Monitor - 03/18/2021 07:23:00 +00:00
[07:23:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:23:01 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:24:00 INF] Scraping Azure Monitor - 03/18/2021 07:24:00 +00:00
[07:24:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:24:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:25:00 INF] Scraping Azure Monitor - 03/18/2021 07:25:00 +00:00
[07:25:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:25:00 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:26:00 INF] Scraping Azure Monitor - 03/18/2021 07:26:00 +00:00
[07:26:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:26:00 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:27:00 INF] Scraping Azure Monitor - 03/18/2021 07:27:00 +00:00
[07:27:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:27:01 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:28:00 INF] Scraping Azure Monitor - 03/18/2021 07:28:00 +00:00
[07:28:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:28:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:29:00 INF] Scraping Azure Monitor - 03/18/2021 07:29:00 +00:00
[07:29:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:29:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:30:00 INF] Scraping Azure Monitor - 03/18/2021 07:30:00 +00:00
[07:30:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:30:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:31:00 INF] Scraping Azure Monitor - 03/18/2021 07:31:00 +00:00
[07:31:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:31:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:32:00 INF] Scraping Azure Monitor - 03/18/2021 07:32:00 +00:00
[07:32:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:32:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:33:00 INF] Scraping Azure Monitor - 03/18/2021 07:33:00 +00:00
[07:33:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:33:01 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:34:00 INF] Scraping Azure Monitor - 03/18/2021 07:34:00 +00:00
[07:34:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:34:01 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:35:00 INF] Scraping Azure Monitor - 03/18/2021 07:35:00 +00:00
[07:35:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:35:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:36:00 INF] Scraping Azure Monitor - 03/18/2021 07:36:00 +00:00
[07:36:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:36:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:37:00 INF] Scraping Azure Monitor - 03/18/2021 07:37:00 +00:00
[07:37:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:37:00 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:38:00 INF] Scraping Azure Monitor - 03/18/2021 07:38:00 +00:00
[07:38:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:38:00 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:39:00 INF] Scraping Azure Monitor - 03/18/2021 07:39:00 +00:00
[07:39:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:39:00 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:40:00 INF] Scraping Azure Monitor - 03/18/2021 07:40:00 +00:00
[07:40:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:40:00 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:41:00 INF] Scraping Azure Monitor - 03/18/2021 07:41:00 +00:00
[07:41:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:41:01 INF] Found value 25 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:42:00 INF] Scraping Azure Monitor - 03/18/2021 07:42:00 +00:00
[07:42:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:42:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00
[07:43:00 INF] Scraping Azure Monitor - 03/18/2021 07:43:00 +00:00
[07:43:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[07:43:01 INF] Found value 20 for metric azure_service_bus_messages with aggregation interval 00:05:00

tomkerkhove commented 3 years ago

That is true, but the metrics explorer is using average, while you are scraping totals?

Next to that, can you compare it without filter on the subscription? Maybe that's causing an issue that I need to fix.

mtarantino commented 3 years ago

That is true, but the metrics explorer is using average, while you are scraping totals?

@tomkerkhove: The issue is the same with Average, but I wanted to try different aggregation types (in the explorer only Average is available, whereas you can retrieve Maximum, Total, and Minimum from Promitor).

As I mentioned earlier, when trying the other aggregations the results are also wrong, e.g. with 4 messages in the queue:

Next to that, can you compare it without filter on the subscription? Maybe that's causing an issue that I need to fix.

I guess you mean something like removing the filter on the topic/queue, like:

metrics:
  - name: azure_service_bus_messages
    description: "Count of messages in a Queue/Topic."
    resourceType: ServiceBusNamespace
    labels:
      env: dev
    azureMetricConfiguration:
      metricName: Messages
      dimension:
        name: EntityName
      aggregation:
        type: Average
    resources:
      - namespace: myNamespace

The issue is the same (I have 5 messages in the topic):

[08:04:35 INF] Scraping Azure Monitor - 03/18/2021 08:04:35 +00:00
[08:04:35 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[08:04:35 INF] Adding Prometheus sink to expose on /metrics
[08:04:35 DBG] Failed to locate the development https certificate at 'null'.
[08:04:35 INF] Now listening on: http://[::]:8080
[08:04:35 INF] Application started. Press Ctrl+C to shut down.
[08:04:35 INF] Hosting environment: Production
[08:04:35 INF] Content root path: /app
...
[08:04:39 INF] Found value 5 for metric azure_service_bus_messages with dimension myTopic as part of entity_name dimension with aggregation interval 00:05:00
[08:05:00 INF] Scraping Azure Monitor - 03/18/2021 08:05:00 +00:00
[08:05:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
...
[08:05:03 INF] Found value 4 for metric azure_service_bus_messages with dimension myTopic as part of entity_name dimension with aggregation interval 00:05:00
[08:06:00 INF] Scraping Azure Monitor - 03/18/2021 08:06:00 +00:00
[08:06:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
...
[08:06:02 INF] Found value 4 for metric azure_service_bus_messages with dimension myTopic as part of entity_name dimension with aggregation interval 00:05:00
[08:07:00 INF] Scraping Azure Monitor - 03/18/2021 08:07:00 +00:00
[08:07:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
...
[08:07:02 INF] Found value 4 for metric azure_service_bus_messages with dimension myTopic as part of entity_name dimension with aggregation interval 00:05:00
[08:08:00 INF] Scraping Azure Monitor - 03/18/2021 08:08:00 +00:00
[08:08:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
...
[08:08:02 INF] Found value 4 for metric azure_service_bus_messages with dimension myTopic as part of entity_name dimension with aggregation interval 00:05:00
[08:09:00 INF] Scraping Azure Monitor - 03/18/2021 08:09:00 +00:00
[08:09:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
...
[08:09:01 INF] Found value 5 for metric azure_service_bus_messages with dimension myTopic as part of entity_name dimension with aggregation interval 00:05:00

It would be interesting to see the API calls to Azure Monitor; this should be the configuration that prints the calls to stdout, right?

server:
  httpPort: 8080 # Optional. Default: 80
metricSinks:
  prometheusScrapingEndpoint:
    metricUnavailableValue: NaN # Optional. Default: NaN
    enableMetricTimestamps: false # Optional. Default: true
    baseUriPath: /metrics # Optional. Default: /metrics
metricsConfiguration:
  absolutePath: /config/metrics-declaration.yaml # Optional. Default: /config/metrics-declaration.yaml
telemetry:
  containerLogs:
    isEnabled: true # Optional. Default: true
    verbosity: trace # Optional. Default: N/A
  defaultVerbosity: Trace # Optional. Default: error
azureMonitor:
  logging:
    informationLevel: BodyAndHeaders # Optional. Default: Basic
    isEnabled: true # Optional. Default: false

mtarantino commented 3 years ago

I also noticed that for the other queues and topics in my namespace, all values are randomly oscillating at the same time between two states:

I didn't find any pattern in the oscillation; it seems random:

state1
state2
state1
state1
state1
state2
state1
state2
state1
state1

tomkerkhove commented 3 years ago

I'll have to take a look, sorry!

Just FYI, you should only rely on Average as that's the only supported aggregation: https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-supported#microsoftservicebusnamespaces

mtarantino commented 3 years ago

Just FYI, you should only rely on Average as that's the only supported aggregation: https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-supported#microsoftservicebusnamespaces

Sure, this is what I initially did, but as the values were not what I expected, I tested the other aggregations to understand what could be wrong.

I'll have to take a look, sorry!

Thanks :)

mtarantino commented 3 years ago

I did a bit more investigation at the Azure Monitor API level and ran some tests using https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list#code-try-0.

I'm assuming the aggregation interval you're using is in the following format:

Those are the parameters I used:

If I change the timespan to 2021-03-17T11:05:00.000Z/2021-03-17T11:09:59.000Z, I only receive 4 results.

Thus I was wondering if the issue could be related to AzureMonitorClient#GetClosestAggregationInterval(), which sometimes provides a wrong closest aggregation time, so that instead of 5 results there are only 4, or something like that.
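
If that's what happens, a tiny illustration (my own sketch, assuming Azure Monitor only returns 1-minute buckets that fit entirely inside the requested timespan; this is not Promitor's actual code) shows how a window that is one second short of 5 minutes loses a bucket:

using System;

class TimespanBucketCount
{
    // Count the 1-minute buckets that fit entirely inside [start, end).
    static int CountFullMinuteBuckets(DateTime start, DateTime end)
    {
        var count = 0;
        for (var t = start; t.AddMinutes(1) <= end; t = t.AddMinutes(1))
        {
            count++;
        }
        return count;
    }

    static void Main()
    {
        var start = new DateTime(2021, 3, 17, 11, 5, 0, DateTimeKind.Utc);

        // Exactly 5 minutes: 11:05:00 -> 11:10:00 contains 5 complete buckets.
        Console.WriteLine(CountFullMinuteBuckets(start, start.AddMinutes(5)));                // 5

        // One second short: 11:05:00 -> 11:09:59 contains only 4 complete buckets.
        Console.WriteLine(CountFullMinuteBuckets(start, start.AddMinutes(5).AddSeconds(-1))); // 4
    }
}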

I also performed a test using the following configuration:

metricDefaults:
  aggregation:
    interval: 00:01:00
  scraping:
    schedule: "0 * * * * *"

The results are also showing unexpected metrics:

[16:12:45 INF] Content root path: /app
[16:12:49 INF] Found value 8 for metric azure_service_bus_messages with aggregation interval 00:01:00
[16:13:00 INF] Scraping Azure Monitor - 03/18/2021 16:13:00 +00:00
[16:13:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[16:13:01 INF] Found value 0 for metric azure_service_bus_messages with aggregation interval 00:01:00
[16:14:00 INF] Scraping Azure Monitor - 03/18/2021 16:14:00 +00:00
[16:14:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[16:14:01 INF] Found value 0 for metric azure_service_bus_messages with aggregation interval 00:01:00
[16:15:00 INF] Scraping Azure Monitor - 03/18/2021 16:15:00 +00:00
[16:15:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[16:15:02 INF] Found value 0 for metric azure_service_bus_messages with aggregation interval 00:01:00
[16:16:00 INF] Scraping Azure Monitor - 03/18/2021 16:16:00 +00:00
[16:16:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[16:16:03 INF] Found value 0 for metric azure_service_bus_messages with aggregation interval 00:01:00
[16:17:00 INF] Scraping Azure Monitor - 03/18/2021 16:17:00 +00:00
[16:17:00 INF] Scraping azure_service_bus_messages for resource type ServiceBusNamespace
[16:17:02 INF] Found value 0 for metric azure_service_bus_messages with aggregation interval 00:01:00

tomkerkhove commented 3 years ago

Thanks for digging into this!

Thus I was wondering if the issue could be related to AzureMonitorClient#GetClosestAggregationInterval(), which sometimes provides a wrong closest aggregation time, so that instead of 5 results there are only 4, or something like that.

Can you talk a bit more about that please?

You're suggesting that the issue might be in https://github.com/tomkerkhove/promitor/blob/master/src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs#L122?

mtarantino commented 3 years ago

I noticed the following behaviour for an aggregation interval of 5 min with a scraping schedule of 1 min, for a queue with a constant number of messages (in our case, let's take 4).

When querying for a metric with an aggregation interval of 5 min, I expect Promitor to send, for example, the timespan 2021-03-17T11:05:00.000Z/2021-03-17T11:10:00.000Z and receive the following data:

{
      "id": "/subscriptions/XXXX/resourceGroups/xxx/providers/Microsoft.ServiceBus/namespaces/xxx/providers/Microsoft.Insights/metrics/Messages",
      "type": "Microsoft.Insights/metrics",
      "name": {
        "value": "Messages",
        "localizedValue": "Count of messages in a Queue/Topic."
      },
      "displayDescription": "Count of messages in a Queue/Topic.",
      "unit": "Count",
      "timeseries": [
        {
          "metadatavalues": [
            {
              "name": {
                "value": "entityname",
                "localizedValue": "entityname"
              },
              "value": "myTopic"
            }
          ],
          "data": [
            {
              "timeStamp": "2021-03-17T11:05:00Z",
              "average": 4
            },
            {
              "timeStamp": "2021-03-17T11:06:00Z",
              "average": 4
            },
            {
              "timeStamp": "2021-03-17T11:07:00Z",
              "average": 4
            },
            {
              "timeStamp": "2021-03-17T11:08:00Z",
              "average": 4
            },
            {
              "timeStamp": "2021-03-17T11:09:00Z",
              "average": 4
            }
          ]
        }
      ],
      "errorCode": "Success"
    }
  ],

To summarise: an array T1 = [4, 4, 4, 4, 4], which is then aggregated according to the aggregation type (e.g. sum(T1), avg(T1), min(T1), max(T1)).

My assumption is that somehow it's sending the timespan 2021-03-17T11:05:00.000Z/2021-03-17T11:09:59.000Z (i.e. a timespan shorter than 5 minutes), and because of that Azure Monitor only returns 4 values. So the array becomes something like T2 = [4, 4, 4, 4, null]; maybe the null is then replaced by 0, I don't know exactly, and then all the computations use this array to calculate the result.
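
If that assumption holds, treating the missing bucket as 0 reproduces the values from my logs exactly (a quick sanity check, not Promitor code):

using System;
using System.Linq;

class MissingBucketArithmetic
{
    static void Main()
    {
        // Buckets for a queue with a constant 4 messages, where the fifth
        // (most recent) interval is missing and treated as 0.
        double[] t2 = { 4, 4, 4, 4, 0 };

        Console.WriteLine(t2.Average()); // 3.2 -> the value logged with aggregation type Average
        Console.WriteLine(t2.Sum());     // 16  -> the value logged with aggregation type Total
    }
}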

That's why, to confirm the behaviour, it would be good to be able to see the API calls in the console using the configuration below, but this is not working properly :( and I'm not a .NET developer, so I can't easily run the project on my machine and debug it.

azureMonitor:
  logging:
    informationLevel: BodyAndHeaders # Optional. Default: Basic
    isEnabled: true # Optional. Default: false

mtarantino commented 3 years ago

@tomkerkhove I've set up the project to debug the application in depth, and I think I found the issue. The problem lies in the last interval retrieved by the Azure Monitor call. Azure Monitor computes the last minute on a best-effort basis. As you can see in the diagram below, the last minute drops to 0 because Azure Monitor didn't have time to compute the last interval: promitor-1563-azure-portal-metrics-20210319

Looking for the issue on the internet, I found the page azure-monitor/metrics-troubleshoot#chart-shows-unexpected-drop-in-values, which states:

Depending on the service, the latency of processing metrics can be within a couple minutes range. For charts showing a recent time range with a 1- or 5-minute granularity, a drop of the value over the last few minutes becomes more noticeable:

And this is by design.

As a solution, there are two ways:

Both solutions might not work for all metrics since, as written in azure-monitor/metrics-troubleshoot#chart-shows-unexpected-drop-in-values, the latency of processing metrics can be within a couple of minutes, and here we're only giving less than 1 minute.
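
For illustration only, here is a minimal sketch of the kind of mitigation I have in mind (my own assumption, not an existing Promitor option): shift the query window back by an assumed ingestion latency so the newest, possibly incomplete bucket is never part of the aggregation.

using System;

class OffsetQueryWindow
{
    // Build a timespan whose end lies a couple of minutes in the past, so the
    // newest (possibly still incomplete) bucket never takes part in the aggregation.
    static (DateTime Start, DateTime End) BuildTimespan(TimeSpan aggregationInterval)
    {
        var ingestionLatency = TimeSpan.FromMinutes(2); // assumed value, based on the "couple minutes" in the Azure docs
        var end = DateTime.UtcNow - ingestionLatency;
        return (end - aggregationInterval, end);
    }

    static void Main()
    {
        var (start, end) = BuildTimespan(TimeSpan.FromMinutes(5));
        Console.WriteLine($"{start:O}/{end:O}"); // timespan parameter for the Azure Monitor metrics query
    }
}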

tomkerkhove commented 3 years ago

I'm still looking into this but have to finish #444 first.

In the meantime, I've checked and the Azure Monitor insights logging works, but it requires trace verbosity; I've verified it and the docs state this: https://promitor.io/configuration/v2.x/runtime/scraper#azure-monitor

tomkerkhove commented 3 years ago

This relates to https://github.com/tomkerkhove/promitor/issues/1290 as well. I would stick with an aggregation of 5 minutes due to Azure Monitor, but I'll think about this issue, where you could configure that you want to ignore a time series (but when should we do that, and what if it's really 0?).

tomkerkhove commented 2 years ago

Are you still seeing this?