tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License
248 stars 91 forks source link

metrics[*].scraping.schedule is not optional #1634

Open ResDiaryLewis opened 3 years ago

ResDiaryLewis commented 3 years ago

Report

In the docs here, it mentions this:

Additionally, the following fields are optional: ...

  • scraping.schedule - A scraping schedule for the individual metric; overrides the the one specified in metricDefaults

However I receive various when this field is omitted.

Expected Behavior

If a metric is missing a scraping schedule, the metric default schedule should be used.

Actual Behavior

If the scraping block is omitted entirely from a metric, this error is thrown:

System.ArgumentNullException: [scraping] cannot be Null. (Parameter 'scraping')
   at GuardNet.Guard.For[TException](Func`1 predicate, TException exception)
   at GuardNet.Guard.NotNull[TParam,TException](TParam param, TException exception)
   at GuardNet.Guard.NotNull[TParam](TParam param, String paramName, String message)
   at GuardNet.Guard.NotNull[TParam](TParam param, String paramName)
   at Promitor.Core.Scraping.Configuration.Model.Metrics.ScrapeDefinition`1..ctor(AzureMetricConfiguration azureMetricConfiguration, PrometheusMetricDefinition prometheusMetricDefinition, Scraping scraping, TResourceDefinition resource, String subscriptionId, String resourceGroupName) in /src/Promitor.Core.Scraping/Model/Metrics/ScrapeDefinition.cs:line 34
   at Promitor.Core.Scraping.Configuration.Model.Metrics.MetricDefinition.CreateScrapeDefinition(IAzureResourceDefinition resource, AzureMetadata azureMetadata) in /src/Promitor.Core.Scraping/Model/Metrics/MetricDefinition.cs:line 74
   at Microsoft.Extensions.DependencyInjection.SchedulingExtensions.ScheduleResourceScraping(IAzureResourceDefinition resource, AzureMetadata azureMetadata, MetricDefinition metric, AzureMonitorClientFactory azureMonitorClientFactory, MetricSinkWriter metricSinkWriter, IRuntimeMetricsCollector runtimeMetricCollector, IConfiguration configuration, IOptions`1 azureMonitorLoggingConfiguration, ILoggerFactory loggerFactory, ILogger`1 logger, IServiceCollection services) in /src/Promitor.Agents.Scraper/SchedulingExtensions.cs:line 69
   at Microsoft.Extensions.DependencyInjection.SchedulingExtensions.ScheduleMetricScraping(IServiceCollection services) in /src/Promitor.Agents.Scraper/SchedulingExtensions.cs:line 34
   at Promitor.Agents.Scraper.Startup.ConfigureServices(IServiceCollection services) in /src/Promitor.Agents.Scraper/Startup.cs:line 55
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.InvokeCore(Object instance, IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.<>c__DisplayClass9_0.<Invoke>g__Startup|0(IServiceCollection serviceCollection)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.Invoke(Object instance, IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.<>c__DisplayClass8_0.<Build>b__0(IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.UseStartup(Type startupType, HostBuilderContext context, IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.<>c__DisplayClass12_0.<UseStartup>b__0(HostBuilderContext context, IServiceCollection services)
   at Microsoft.Extensions.Hosting.HostBuilder.CreateServiceProvider()
   at Microsoft.Extensions.Hosting.HostBuilder.Build()
   at Promitor.Agents.Scraper.Program.Main(String[] args) in /src/Promitor.Agents.Scraper/Program.cs:line 35

I've not found a way to omit this, instead I'm just duplicating the default to each metric like this:

    version: v1
    azureMetadata:
      tenantId: "tenant"
      subscriptionId: "sub"
      resourceGroupName: "group"
    metricDefaults:
      aggregation:
        interval: "00:05:00"
      scraping:
        schedule: "*/1 * * * *" # DUPLICATED
    metrics:
    # Loadbalancer metrics, for each aggregation and metric combination
    - name: "name"
      description: "desc"
      scraping:
        schedule: "*/1 * * * *" # DUPLICATED
      resourceType: "Generic"
      labels:
        a: "b"
      azureMetricConfiguration:
        metricName: "foo"
        aggregation:
          type: "Average"
      resources:
      - resourceGroupName: "group"
        resourceUri: "uri"

I've tried to set scraping to ~ or {} too.

Steps to Reproduce the Problem

  1. Use a metrics-declaration.yaml config file.
  2. Omit the scraping field from a metric, e.g.:
    version: v1
    azureMetadata:
      tenantId: "tenant"
      subscriptionId: "sub"
      resourceGroupName: "group"
    metricDefaults:
      aggregation:
        interval: "00:05:00"
      scraping:
        schedule: "*/1 * * * *" # DUPLICATED
    metrics:
    # Loadbalancer metrics, for each aggregation and metric combination
    - name: "name"
      description: "desc"
      #scraping:
      # schedule: "*/1 * * * *" # DUPLICATED
      resourceType: "Generic"
      labels:
        a: "b"
      azureMetricConfiguration:
        metricName: "foo"
        aggregation:
          type: "Average"
      resources:
      - resourceGroupName: "group"
        resourceUri: "uri"

Component

Scraper

Version

2.3.0

Configuration

Configuration:

# Add your scraping configuration here

Logs

example

Platform

Microsoft Azure

Contact Details

lewis.jackson@resdiary.com

tomkerkhove commented 3 years ago

Odd because I don't use it myself: https://github.com/tomkerkhove/promitor/blob/master/config/promitor/scraper/metrics.yaml

I'll try to reproduce and see what is causing this.

ResDiaryLewis commented 3 years ago

Odd because I don't use it myself: https://github.com/tomkerkhove/promitor/blob/master/config/promitor/scraper/metrics.yaml

I'll try to reproduce and see what is causing this.

Thanks! I'll actually get back to you with some more info soon - I'm in the process of upgrading from version 1.x and I've found that only specific metrics throw this error, so I should be able to isolate it.

ResDiaryLewis commented 3 years ago

We're using a custom Helm configuration because we wrote our own when we first adopted Promitor a while ago. The way that it builds up metric definitions can result in definitions where resources is an empty array []. I think that's what causes this error message. Here's some examples:

No empty resources

Promitor starts as expected:

version: v1
azureMetadata:
    tenantId: "tenant"
    subscriptionId: "sub"
    resourceGroupName: "group"
metricDefaults:
    aggregation:
    interval: "00:05:00"
    scraping:
    schedule: "*/1 * * * *"
metrics:
# SQL Database metrics, for each aggregation and metric combination
- name: "azure_sql_cpu_percent_average"
    description: "'cpu_percent' with aggregation 'Average'"
    resourceType: "Generic"
    labels:
    component: "sql-database"
    azureMetricConfiguration:
    metricName: "cpu_percent"
    aggregation:
        type: "Average"
    resources:
    # Ignore vCoreModel databases for the dtu_consumption_percent metric
    - resourceGroupName: "group"
      resourceUri: "Microsoft.Sql/servers/server/databases/db"

# Loadbalancer metrics, for each aggregation and metric combination
- name: "azure_lb_VipAvailability_average"
    description: "'VipAvailability' with aggregation 'Average'"
    resourceType: "Generic"
    labels:
    component: "load-balancer"
    azureMetricConfiguration:
    metricName: "VipAvailability"
    aggregation:
        type: "Average"
    resources:
    - resourceGroupName: "group"
      resourceUri: "Microsoft.Network/loadBalancers/lb"

All empty resources

We get a useful error message: Error 25:3: Either 'resources' or 'resourceDiscoveryGroups' must be specified.

version: v1
azureMetadata:
    tenantId: "tenant"
    subscriptionId: "sub"
    resourceGroupName: "group"
metricDefaults:
    aggregation:
    interval: "00:05:00"
    scraping:
    schedule: "*/1 * * * *"
metrics:
# SQL Database metrics, for each aggregation and metric combination
- name: "azure_sql_cpu_percent_average"
    description: "'cpu_percent' with aggregation 'Average'"
    resourceType: "Generic"
    labels:
    component: "sql-database"
    azureMetricConfiguration:
    metricName: "cpu_percent"
    aggregation:
        type: "Average"
    resources: []

# Loadbalancer metrics, for each aggregation and metric combination
- name: "azure_lb_VipAvailability_average"
    description: "'VipAvailability' with aggregation 'Average'"
    resourceType: "Generic"
    labels:
    component: "load-balancer"
    azureMetricConfiguration:
    metricName: "VipAvailability"
    aggregation:
        type: "Average"
    resources: []

One metric has resources, another does not

Promitor gives the error message mentioned above in the OP.

version: v1
azureMetadata:
    tenantId: "tenant"
    subscriptionId: "sub"
    resourceGroupName: "group"
metricDefaults:
    aggregation:
    interval: "00:05:00"
    scraping:
    schedule: "*/1 * * * *"
metrics:
# SQL Database metrics, for each aggregation and metric combination
- name: "azure_sql_cpu_percent_average"
    description: "'cpu_percent' with aggregation 'Average'"
    resourceType: "Generic"
    labels:
    component: "sql-database"
    azureMetricConfiguration:
    metricName: "cpu_percent"
    aggregation:
        type: "Average"
    resources:
    # Ignore vCoreModel databases for the dtu_consumption_percent metric
    - resourceGroupName: "group"
      resourceUri: "Microsoft.Sql/servers/server/databases/db"

# Loadbalancer metrics, for each aggregation and metric combination
- name: "azure_lb_VipAvailability_average"
    description: "'VipAvailability' with aggregation 'Average'"
    resourceType: "Generic"
    labels:
    component: "load-balancer"
    azureMetricConfiguration:
    metricName: "VipAvailability"
    aggregation:
        type: "Average"
    resources: []
ResDiaryLewis commented 3 years ago

So this is more of my fault than yours Tom, maybe there's an argument for the error message not being very helpful but that's an aside! I'll leave you to close this issue if you want to. Thanks for bearing with me!

tomkerkhove commented 3 years ago

Ha interesting, but I don't really understand what caused it, because the exception message is indeed not really helpful.

Can you clarify the cause of it please?

ResDiaryLewis commented 3 years ago

If it helps, as I was migrating to 2.x I saw this error again after forgetting to change ServiceBusQueue to ServiceBusNamespace for a declaration.

Ha interesting, but I don't really understand what caused it, because the exception message is indeed not really helpful.

Can you clarify the cause of it please?

I guess that having an empty resources list counts as a malformed configuration, but I'm unsure why the error message changes when one metric declaration is fine and another is not.

ResDiaryLewis commented 3 years ago

If it helps, as I was migrating to 2.x I saw this error again after forgetting to change ServiceBusQueue to ServiceBusNamespace for a declaration.

I can reproduce this again by having one "good" metric declaration and another "bad" one (where the resource type isn't valid as it's ServiceBusQueue). So I'm guessing that the validation isn't evaluating every declaration.

tomkerkhove commented 3 years ago

Very odd, thanks for letting me know - I'll look at it!

tomkerkhove commented 2 years ago

Another case that triggered this:

metrics:
  - name: promitor_demo_cdn_requests_declared
    description: "Amount of requests at the origin of our Azure CDN"
    resourceType: Cdn
    azureMetricConfiguration:
      metricName: OriginRequestCount
      aggregation:
        type: Total
    resources:
-    - name: promitor-testing-resource-us-cdn
+    - cdnName: promitor-testing-resource-us-cdn
      resourceGroupName: promitor-testing-infrastructure-us