tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License

System.IndexOutOfRangeException error on scraping Azure Postgres Flexible Server instances #2406

Open kmoppel-cognite opened 8 months ago

kmoppel-cognite commented 8 months ago

Report

After running for weeks, metric collection quite often stops with the error below. Restarting the scraper helps. The resource discovery agent seems to be fine and logs no errors.

[10:54:01 ERR] Failed to scrape metric azure_postgres_memory_percent for resource files-Flexible.
  System.IndexOutOfRangeException: Index was outside the bounds of the array.
     at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
     at Promitor.Agents.Scraper.Scheduling.ResourcesScrapingJob.ScrapeMetric(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Agents.Scraper/ResourcesScrapingJob.cs:line 294
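
Since restarting the scraper clears the problem, one stopgap is to watch the scraper logs for this error line and react to it. The snippet below is only a minimal sketch: the regular expression is modelled on the single log line quoted above, and `find_scrape_failures` and `ScrapeFailure` are hypothetical names that are not part of Promitor. Other versions or log sinks may format the line differently.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern derived from the one log line in this report (assumption:
# the "[HH:MM:SS ERR]" prefix and the message wording stay the same).
SCRAPE_FAILURE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2}) ERR\] "
    r"Failed to scrape metric (?P<metric>\S+) for resource (?P<resource>\S+)\."
)


class ScrapeFailure(NamedTuple):
    """One parsed 'Failed to scrape metric' log line."""
    time: str
    metric: str
    resource: str


def find_scrape_failures(lines: Iterable[str]) -> Iterator[ScrapeFailure]:
    """Yield a record for every log line that matches the failure pattern."""
    for line in lines:
        match = SCRAPE_FAILURE.search(line)
        if match:
            yield ScrapeFailure(match["time"], match["metric"], match["resource"])
```

Feeding the scraper's container log lines into `find_scrape_failures` gives the affected metric and resource names, which a wrapper could use to alert or trigger a restart. This only detects the symptom; it does not address the exception itself.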

Expected Behavior

No errors, metrics

Actual Behavior

No metrics, errors

Steps to Reproduce the Problem

  1. ...

Component

Scraper

Version

v2.11.0

Configuration

Configuration: we're fetching almost all of the DB metrics listed at https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-monitoring

metrics-declaration.yaml

azureMetadata:
  cloud: Global
  resourceGroupName: promitor
  subscriptionId: xyz
  tenantId: xyz
metricDefaults:
  aggregation:
    interval: "00:05:00"
  scraping:
    schedule: 0 * * ? * *
metrics:
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: UsedCapacity
  description: The average capacity in bytes used in the storage account
  name: azure_storage_account_used_capacity_bytes
  resourceDiscoveryGroups:
  - name: storage-accounts
  resourceType: StorageAccount
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: active_connections
  description: Average active connection used by an Azure Postgre instance
  name: azure_postgres_active_connections
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: backup_storage_used
  description: Average backup storage used in bytes used by an Azure Postgre instance
  name: azure_postgres_backup_storage_used
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Count
    metricName: connections_failed
  description: Average failed active connection used by an Azure Postgre instance
  name: azure_postgres_connections_failed
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: cpu_percent
  description: Average CPU used by an Azure Postgre instance
  name: azure_postgres_cpu_percent
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: memory_percent
  description: Average memory used by an Azure Postgre instance
  name: azure_postgres_memory_percent
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: network_bytes_egress
  description: Average outgoing traffic in bytes used by an Azure Postgre instance
  name: azure_postgres_network_bytes_egress
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: network_bytes_ingress
  description: Average incoming traffic in bytes used by an Azure Postgre instance
  name: azure_postgres_network_bytes_ingress
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Maximum
    metricName: storage_percent
  description: Average storage percent used by an Azure Postgre instance
  name: azure_postgres_storage_percent
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Maximum
    metricName: storage_used
  description: Average storage used in bytes used by an Azure Postgre instance
  name: azure_postgres_storage_used
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: iops
  description: Average IOPS used by an Azure Postgre instance
  name: azure_postgres_iops
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: disk_iops_consumed_percentage
  description: Average IOPS used percentage
  name: azure_postgres_iops_consumed_percentage
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: read_iops
  description: Average Read IOPS used by an Azure Postgre instance
  name: azure_postgres_read_iops
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: read_throughput
  description: Average bytes read per second from disk.
  name: azure_postgres_read_throughput
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Minimum
    metricName: storage_free
  description: Minimum amount of storage space that's available.
  name: azure_postgres_storage_free
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: write_iops
  description: Average Write IOPS used by an Azure Postgre instance
  name: azure_postgres_write_iops
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: txlogs_storage_used
  description: Average WAL files used by an Azure Postgre instance
  name: azure_postgres_txlogs_storage_used
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: write_throughput
  description: Average bytes written to disk per second.
  name: azure_postgres_write_throughput
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
- azureMetricConfiguration:
    aggregation:
      type: Average
    metricName: is_db_alive
  description: Status database alive Azure Postgre instance
  name: azure_postgres_is_db_alive
  resourceDiscoveryGroups:
  - name: postgres-databases
  resourceType: PostgreSql
version: v1

runtime.yaml

server:
  httpPort: "5000"
authentication:
  mode: ServicePrincipal
resourceDiscovery:
  host: "promitor-agent-resource-discovery"
  port: 8889
metricSinks:
  prometheusScrapingEndpoint:
    metricUnavailableValue: "-1"
    enableMetricTimestamps: "false"
    baseUriPath: "/metrics"
    labels:
      transformation: "None"
metricsConfiguration:
  absolutePath: /config/metrics-declaration.yaml
telemetry:
  applicationInsights:
    isEnabled: "false"
  containerLogs:
    isEnabled: "true"
  defaultVerbosity: "information"

Logs

example

Platform

Microsoft Azure

Contact Details

No response

tomkerkhove commented 8 months ago

Thanks for the report, are you willing to contribute a fix?

kmoppel-cognite commented 8 months ago

Probably too tough and too time-consuming for me... I don't have a command of the .NET ecosystem currently :/