tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License

Azure Cosmos DB collection metrics are null #1621

Open SudhakarNandigam-TomTom opened 3 years ago

SudhakarNandigam-TomTom commented 3 years ago

Report

The document count metric shows correctly for some Cosmos DB collections but is null for others, even though the Azure portal shows the metrics correctly for all collections.

Expected Behavior

The metrics reported by Promitor and by Azure Monitor should match.

Actual Behavior

The metrics reported by Promitor and by Azure Monitor do not match. In Promitor, I see null values for document count and index usage for some Cosmos DB collections.

Steps to Reproduce the Problem

  1. Configure metric declaration file with SP and Cosmos DB details
  2. Install the Promitor helm chart on AKS

Component

Scraper

Version

2.1.1

Configuration

Configuration:

version: v1
azureMetadata:
  tenantId: xxxx-xxx-xxxx
  subscriptionId: xxx-xxx-xxx-xxx
  resourceGroupName: promitor
metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    schedule: "* * * * *"
metrics:
- azureMetricConfiguration:
    aggregation:
      type: Total
    dimension:
      name: CollectionName
    metricName: DocumentCount
  description: Total document count reported at 5 minutes granularity CollectionName
  name: azure_cosmos_db_document_count_collection_name
  resourceType: CosmosDb
  resources:
  - dbName: xxx-xxx-xxx-xxx
    resourceGroupName: xxxx-xxxx-xxxx

Logs

Promitor logs:

[09:45:00 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb
[09:45:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00
[09:45:01 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00
[09:45:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension __Empty as part of CollectionName dimension with aggregation interval 00:05:00
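To check whether a given collection is always null or only intermittently so, the "Found value" lines can be tallied per dimension. The following is a minimal sketch that assumes the log line format shown above; the `summarize` helper and its regular expression are illustrative and not part of Promitor.

```python
import re
from collections import defaultdict

# Matches the "Found value ..." lines in the scraper logs shown above.
# The pattern is an assumption based on those sample lines only.
LINE_RE = re.compile(
    r"Found value (?P<value>\S+) for metric (?P<metric>\S+) "
    r"with dimension (?P<dimension>\S+) as part of (?P<dim_name>\S+) dimension"
)


def summarize(log_lines):
    """Return {dimension: {"null": count, "values": [floats]}} per dimension seen."""
    summary = defaultdict(lambda: {"null": 0, "values": []})
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines such as "Scraping ... for resource type ..."
        entry = summary[match["dimension"]]
        if match["value"] == "null":
            entry["null"] += 1
        else:
            entry["values"].append(float(match["value"]))
    return dict(summary)


sample = [
    "[09:45:00 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb",
    "[09:45:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00",
    "[09:45:01 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00",
]

for dimension, stats in summarize(sample).items():
    print(dimension, stats)
```

Running this over a longer log window shows whether a dimension such as no_tile_area ever reports a value at all.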

Azure monitor metrics: image

Platform

Microsoft Azure

Contact Details

No response

tomkerkhove commented 3 years ago

What is the aggregation that you use in the portal?

tomkerkhove commented 3 years ago

Note that the portal uses "Average", while you have configured "Total" in the Promitor configuration.

SudhakarNandigam-TomTom commented 3 years ago

> What is the aggregation that you use in the portal?

5 mins

SudhakarNandigam-TomTom commented 3 years ago

> Note that the portal uses "Average", while you have configured "Total" in the Promitor configuration.

As per the documentation, it's Total, but in the portal I don't see a Total aggregation.

tomkerkhove commented 3 years ago

Just for the sake of testing, can you please align them to see if the outcome matches?

SudhakarNandigam-TomTom commented 3 years ago

> Just for the sake of testing, can you please align them to see if the outcome matches?

I changed the aggregation from Total to Average, but the result is the same.
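One possible reading of why switching from Total to Average changes nothing: if a time window contains no reported samples at all, neither aggregation has anything to work with. The toy sketch below illustrates only that arithmetic; the `total` and `average` helpers are hypothetical and are not how Azure Monitor or Promitor compute aggregates.

```python
from typing import Optional, Sequence


def total(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the reported samples; return None when the window has no data."""
    reported = [s for s in samples if s is not None]
    return sum(reported) if reported else None


def average(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Average the reported samples; return None when the window has no data."""
    reported = [s for s in samples if s is not None]
    return sum(reported) / len(reported) if reported else None


# A window in which nothing was reported (None marks a missing sample).
empty_window = [None, None, None, None, None]
print(total(empty_window), average(empty_window))
```

Under this assumption, a null result points at missing data points for that collection rather than at the choice of aggregation.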

tomkerkhove commented 3 years ago

Hm, odd. I am unable to reproduce this:

# HELP azure_cosmos_db_total_docs Demo cosmos query
# TYPE azure_cosmos_db_total_docs gauge
azure_cosmos_db_total_docs{resource_group="promitor-sources",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",resource_uri="subscriptions/0f9d7fea-99e8-4768-8672-06a28514f77e/resourceGroups/promitor-sources/providers/Microsoft.DocumentDB/databaseAccounts/promitor-cosmos-db",instance_name="promitor-cosmos-db",geo="china",environment="dev"} 4 1620305400570
# HELP promitor_ratelimit_arm Indication how many calls are still available before Azure Resource Manager is going to throttle us.
# TYPE promitor_ratelimit_arm gauge
promitor_ratelimit_arm{tenant_id="c8819874-9e56-4e3f-b1a8-1c0325138f27",subscription_id="0f9d7fea-99e8-4768-8672-06a28514f77e",app_id="ceb249a3-44ce-4c90-8863-6776336f5b7e"} 11950 1620305400535

Config:

version: v1
azureMetadata:
  tenantId: c8819874-9e56-4e3f-b1a8-1c0325138f27
  subscriptionId: 0f9d7fea-99e8-4768-8672-06a28514f77e
  resourceGroupName: promitor
metricDefaults:
  aggregation:
    interval: 00:05:00
  limit: 10
  labels:
    geo: china
    environment: dev
  scraping:
    # Every minute
    schedule: "* * * * *"
metrics:
- name: azure_cosmos_db_total_docs
  description: "Demo cosmos query"
  resourceType: CosmosDb
  azureMetricConfiguration:
    metricName: DocumentCount
    aggregation:
      type: Total
  resources:
  - dbName: promitor-cosmos-db
    resourceGroupName: promitor-sources

Did I miss anything that you have?

tomkerkhove commented 3 years ago

Based on the docs, I see that it's a 5-minute aggregate. Is the null constant, or are you getting 4x null followed by an actual value?

Would you mind sharing your logs please?

tomkerkhove commented 3 years ago

Or it might be related to https://github.com/tomkerkhove/promitor/issues/1290

SudhakarNandigam-TomTom commented 3 years ago

I set both the scraping schedule and the aggregation interval to 5 minutes. Now I get 0 instead of 80 for the no_tile_area collection. I am wondering why it's not working for only this one collection. I have the same Cosmos DB across multiple environments, and I don't see the metric for the no_tile_area collection in any of them.

[16:10:18 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb
[16:10:20 INF] Found value 0 for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00
[16:10:20 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00
[16:10:20 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension __Empty as part of CollectionName dimension with aggregation interval 00:05:00
[16:15:00 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb
[16:15:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00
[16:15:01 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00
[16:15:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension __Empty as part of CollectionName dimension with aggregation interval 00:05:00
[16:20:01 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb
[16:20:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00
[16:20:01 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00
[16:20:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension __Empty as part of CollectionName dimension with aggregation interval 00:05:00
[16:25:00 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb
[16:25:01 INF] Found value 0 for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00
[16:25:01 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00
[16:25:01 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension __Empty as part of CollectionName dimension with aggregation interval 00:05:00
[16:30:00 INF] Scraping azure_cosmos_db_document_count_collection_name for resource type CosmosDb
[16:30:02 INF] Found value 0 for metric azure_cosmos_db_document_count_collection_name with dimension no_tile_area as part of CollectionName dimension with aggregation interval 00:05:00
[16:30:02 INF] Found value 18 for metric azure_cosmos_db_document_count_collection_name with dimension tile_metadata as part of CollectionName dimension with aggregation interval 00:05:00
[16:30:02 INF] Found value null for metric azure_cosmos_db_document_count_collection_name with dimension __Empty as part of CollectionName dimension with aggregation interval 00:05:00

tomkerkhove commented 3 years ago

Based on my experience yesterday, I think there is a lag in Azure Cosmos DB reporting the metrics in Azure Monitor which creates data gaps as per #1290.

The Azure portal probably ignores such gaps when rendering the chart, but the raw data suggests that the gaps are there. In #1290 you'll see that my proposal is to allow you to ignore data gaps and use the last reported value instead, if you want to.
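The idea of ignoring data gaps can be sketched as walking back through the scraped samples until one with a value is found. This is only an illustration of the concept, not the implementation proposed in #1290; the `last_reported_value` helper and the sample layout are assumptions.

```python
from typing import Optional, Sequence, Tuple

# (timestamp, value); a value of None marks a data gap.
Sample = Tuple[str, Optional[float]]


def last_reported_value(samples: Sequence[Sample]) -> Optional[Sample]:
    """Return the most recent sample that has a value, or None if all are gaps."""
    for timestamp, value in reversed(samples):
        if value is not None:
            return timestamp, value
    return None


# Values for the no_tile_area dimension, taken from the log lines above.
no_tile_area = [("16:10", 0.0), ("16:15", None), ("16:20", None)]
print(last_reported_value(no_tile_area))
```

With the gaps skipped, the 16:15 and 16:20 scrapes would fall back to the value reported at 16:10 instead of null.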

In this case, is it just an empty collection? I'll add one of these as well.

SudhakarNandigam-TomTom commented 3 years ago

I will use the option to ignore data gaps, as it looks good. The collection is not empty: it contains 70 documents, with index usage of 34 KB and data usage of 160 KB. I am getting null or 0 values for all of them.

tomkerkhove commented 3 years ago

Can you enable the Azure Monitor logs please? https://promitor.io/configuration/v2.x/runtime/scraper#azure-monitor

SudhakarNandigam-TomTom commented 3 years ago

I don't see any Azure Monitor logs. Below is the config:

server:
  httpPort: "88"
metricSinks:
  prometheusScrapingEndpoint:
    metricUnavailableValue: "NaN"
    enableMetricTimestamps: "true"
    baseUriPath: "/metrics"
metricsConfiguration:
  absolutePath: /config/metrics-declaration.yaml
telemetry:
  applicationInsights:
    isEnabled: "false"
  containerLogs:
    isEnabled: "true"
  defaultVerbosity: "Error"
azureMonitor:
  logging:
    informationLevel: "BodyAndHeaders"
    isEnabled: "true"