Unavailable metrics for supported resources

nc-pnan commented 1 month ago

Report

Hi,

I have Promitor deployed in my Azure Kubernetes cluster, and it is successfully sending metrics to my Prometheus deployment. I receive most metrics from a Data Factory and a PostgreSQL Flexible Server, but a few metrics from both resources are not coming through correctly. Both resource types are supported according to the documentation.

For Data Factory, I am trying to scrape the integration runtime CPU and memory metrics, which are available in the associated Azure Monitor. I have tried several different configurations, but I get no metrics and variations of the same error: "BadRequest: Metric: IntegrationRuntimeCpuPercentage does not support requested dimension combination: name, supported ones are: IntegrationRuntimeName,NodeName, "

For the PostgreSQL server, I do receive the metric "cpu_credits_remaining" in Prometheus, but its value shows up as unavailable (-1).

I am not sure if I am defining these metrics correctly or if they are even supported. I have not been able to find anything regarding this in the documentation. I hope you can help or let me know if/how I should be able to get these metrics. :)

Thanks for your time.

Expected Behavior

I expected all metric values from supported Azure resources to appear in Prometheus. However, these three metrics do not report usable values:

IntegrationRuntimeCpuPercentage (ADF)
IntegrationRuntimeAvailableMemory (ADF)
cpu_credits_remaining (PostgreSQL Flexible Server)

Actual Behavior

I keep getting no metrics, and variations of the same error: "BadRequest: Metric: IntegrationRuntimeCpuPercentage does not support requested dimension combination: name, supported ones are: IntegrationRuntimeName,NodeName, "

Steps to Reproduce the Problem

Deploy AKS in Azure. Deploy Promitor and integrate with Prometheus.
Write the Promitor metric config with the metrics listed above
Write the resource discovery groups config
Open Prometheus/Promitor and search for the metrics
All my other configured metrics are working, but the two Data Factory metrics do not show up.

Component

Scraper

Version

Helm chart 2.11.0 (app version 2.10.1)

Configuration

metrics:
    - name: promitor_azure_data_factory_integration_runtime_available_cpu
      description: "The CPU usage (%) of ADF integration runtime"
      resourceType: DataFactory
      azureMetricConfiguration:
        metricName: IntegrationRuntimeCpuPercentage
        aggregation:
          type: Average
          interval: 00:01:00
      resourceDiscoveryGroups:
      - name: data-factory-landscape
    - name: promitor_azure_data_factory_integration_runtime_available_memory
      description: "The available memory of the integration runtime"
      resourceType: DataFactory
      azureMetricConfiguration:
        metricName: IntegrationRuntimeAvailableMemory
        aggregation:
          type: Average
          interval: 00:01:00
    - name: promitor_azure_postgresql_cpu_credits_remaining
      description: "Returns the amount of CPU credits remaining for burstable CPU resource"
      resourceType: PostgreSql
      azureMetricConfiguration:
        metricName: cpu_credits_remaining
        aggregation:
          type: Average
          interval: 00:01:00

resourceDiscoveryGroups:
  - name: postgres-database-landscape
    type: PostgreSql
  - name: data-factory-landscape
    type: DataFactory
  - name: kubernetes-service-landscape
    type: KubernetesService
  - name: storage-account-landscape
    type: StorageAccount
  - name: container-registry-landscape
    type: ContainerRegistry

Logs

 `"BadRequest: Metric: IntegrationRuntimeCpuPercentage does not support requested dimension combination: name, supported ones are: IntegrationRuntimeName,NodeName, "`
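
The error message names the dimensions Azure Monitor accepts for this metric (IntegrationRuntimeName and NodeName). A minimal sketch of a declaration that requests one of them follows. It assumes a `dimension` field under `azureMetricConfiguration`, which is not used anywhere in this thread; the field name, and whether it resolves the error on this Promitor version, are unverified assumptions and should be checked against the Promitor metric declaration docs.

```yaml
metrics:
    # Sketch only: the same CPU metric as in the configuration above,
    # with an explicit Azure Monitor dimension added.
    - name: promitor_azure_data_factory_integration_runtime_available_cpu
      description: "The CPU usage (%) of ADF integration runtime"
      resourceType: DataFactory
      azureMetricConfiguration:
        metricName: IntegrationRuntimeCpuPercentage
        # Assumption: `dimension.name` selects the dimension to split on.
        # IntegrationRuntimeName is taken from the "supported ones" list
        # in the error message; NodeName is the other listed option.
        dimension:
          name: IntegrationRuntimeName
        aggregation:
          type: Average
          interval: 00:01:00
      resourceDiscoveryGroups:
      - name: data-factory-landscape
```

If the field is accepted, the resulting Prometheus series would presumably carry the dimension value as a label, one series per integration runtime.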

Platform

Microsoft Azure

Contact Details

pnan@netcompany.com

github-actions[bot] commented 1 month ago

Thank you for opening an issue! We rely on the community to maintain Promitor.

Is this something you want to contribute?

nc-pnan commented 1 month ago

I figured out why the cpu_credits_remaining metric was received as -1 (a null value). After querying Azure directly for the metric, I can see that the most recent 10 minutes of data points are reported as null. Simply changing the aggregation interval to 20 minutes resolved this issue for me.

    - name: promitor_azure_postgresql_cpu_credits_remaining
      description: "Returns the amount of CPU credits remaining for burstable CPU resource"
      resourceType: PostgreSql
      azureMetricConfiguration:
        metricName: cpu_credits_remaining
        aggregation:
          type: Average
          interval: 00:20:00

tomkerkhove commented 1 month ago

Nice find, thanks for reporting!
