tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License

Intermittent fatal errors: InvalidOperationException: Sequence contains no matching element #2331

Closed. gburton1 closed this issue 11 months ago.

gburton1 commented 1 year ago

Report

We have some metrics that scrape successfully more than 99% of the time but hit a fatal (FTL) error in the scraper code a few times per day. It is only a couple of failed scrapes here and there, but we would like to know what condition causes this and, ideally, how to prevent it.

Example configuration that works >99% of the time

          - name: azure_redis_respond_errortype_errors
            description: "Errors."
            resourceType: RedisCache
            azureMetricConfiguration:
              metricName: errors    
              aggregation:
                type: Maximum
              dimension:
                name: ErrorType
            resourceDiscoveryGroups:
            - name: our-resource-group

Here's the error that gets logged when the scrape fails:

[00:25:01 FTL] Failed to scrape resource for metric 'azure_redis_respond_errortype_errors'
System.InvalidOperationException: Sequence contains no matching element
   at System.Linq.ThrowHelper.ThrowNoMatchException()
   at Promitor.Core.Metrics.MeasuredMetric.CreateForDimension(Nullable`1 value, String dimensionName, TimeSeriesElement timeseries) in /src/Promitor.Core/MeasuredMetric.cs:line 68
   at Promitor.Integrations.AzureMonitor.AzureMonitorClient.QueryMetricAsync(String metricName, String metricDimension, AggregationType aggregationType, TimeSpan aggregationInterval, String resourceId, String metricFilter, Nullable`1 metricLimit) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 117
   at Promitor.Core.Scraping.AzureMonitorScraper`1.ScrapeResourceAsync(String subscriptionId, ScrapeDefinition`1 scrapeDefinition, TResourceDefinition resourceDefinition, AggregationType aggregationType, TimeSpan aggregationInterval) in /src/Promitor.Core.Scraping/AzureMonitorScraper.cs:line 72
   at Promitor.Core.Scraping.Scraper`1.ScrapeAsync(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Core.Scraping/Scraper.cs:line 103

Expected Behavior

Properly defined metrics should work consistently when Azure Monitor itself is in good condition.

Actual Behavior

Properly defined metrics intermittently hit fatal errors during scrapes, consistently a few per day.

Steps to Reproduce the Problem

  1. Use the metric definition provided in the Report section above.
  2. You could stand up an Azure Redis Cache to run that metric configuration against. Unfortunately, you would have to wait 24 hours to catch a few occurrences of the error.

Component

Scraper

Version

v2.9.1

Platform

Microsoft Azure

tomkerkhove commented 1 year ago

Thanks for the report. Are you willing to contribute a fix?

gburton1 commented 1 year ago

We would be willing to contribute (I'm at Axon, and we have contributed before). We just wanted to get your take on what is happening and confirm that you consider it a bug, as opposed to a normal, expected error caused by misconfiguration or misuse.

gburton1 commented 1 year ago

@tomkerkhove let me know your initial thoughts. I will schedule someone from my team to dig into it with any direction you provide. You already know @locmai. Others on my team could investigate it as well.

tomkerkhove commented 1 year ago

Please give me some time as I maintain more than just Promitor so I cannot guarantee a response within 1-2 days :)

I think the exception is fairly clear and the CreateForDimension method needs to be improved.
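
For context, `Sequence contains no matching element` is the message LINQ's `First`, `Single`, and `Last` throw when they are called with a predicate that no element satisfies. The stack trace does not show which call `CreateForDimension` uses, or why a time series occasionally comes back without the requested dimension, so the following is only a minimal sketch of the failure mode and one possible way to tolerate it. It assumes the dimension metadata can be treated as a list of name/value pairs; `StrictLookup`, `TolerantLookup`, the `"unknown"` fallback, and the sample value `"Failover"` are hypothetical and not taken from Promitor's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in: dimension metadata modelled as (Name, Value) tuples,
// not the Azure SDK's TimeSeriesElement type.

// Strict lookup: throws InvalidOperationException
// ("Sequence contains no matching element") when no entry matches.
static string StrictLookup(IEnumerable<(string Name, string Value)> metadata, string dimensionName) =>
    metadata.First(m => m.Name == dimensionName).Value;

// Tolerant lookup: returns a fallback label instead of throwing.
// The fallback value is an assumption made for this sketch.
static string TolerantLookup(
    IEnumerable<(string Name, string Value)> metadata,
    string dimensionName,
    string fallback = "unknown") =>
    metadata.Where(m => m.Name == dimensionName)
            .Select(m => m.Value)
            .FirstOrDefault() ?? fallback;

// A time series that carries the requested dimension.
var withDimension = new List<(string Name, string Value)> { ("ErrorType", "Failover") };

// A time series that came back without any dimension metadata.
var withoutDimension = new List<(string Name, string Value)>();

Console.WriteLine(TolerantLookup(withDimension, "ErrorType"));
Console.WriteLine(TolerantLookup(withoutDimension, "ErrorType"));

try
{
    StrictLookup(withoutDimension, "ErrorType");
}
catch (InvalidOperationException ex)
{
    // Same exception type as in the reported log.
    Console.WriteLine($"Strict lookup failed: {ex.Message}");
}
```

Whether a fallback label, a skipped time series, or a logged warning is the right behavior is a design decision for the maintainers; the sketch only shows that the lookup itself can be made non-throwing.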

gburton1 commented 1 year ago

We will start looking into this over the next two weeks to see if we can improve the handling in this method.