tenutensing opened this issue 2 years ago
Would you mind trying our latest version and sharing the outcome, please?
Also, it's hard to tell without Promitor's configuration; can you share it, please?
Sure, I can try with the latest version and check. I just wanted to understand the root cause of this behavior first, and whether it was caused by a configuration mistake on my end, since this was working previously when we did the POC.
As requested, I'm attaching the configuration.
FYI: I created a parent chart that depends on "promitor-agent-resource-discovery" and "promitor-agent-scraper", and installed both charts in a single step.
Parent chart: monitoring-promitor
Dependent charts: promitor-agent-scraper, promitor-agent-resource-discovery
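For reference, here is a minimal sketch of what the parent chart's Chart.yaml looks like in this setup; the chart versions and the repository URL below are illustrative placeholders rather than copies from the actual files:

```yaml
# Chart.yaml of the parent chart (monitoring-promitor)
# Versions and the repository URL are illustrative placeholders.
apiVersion: v2
name: monitoring-promitor
version: 0.1.0
dependencies:
  - name: promitor-agent-scraper
    version: "2.5.1"
    repository: "https://charts.promitor.io/"
  - name: promitor-agent-resource-discovery
    version: "0.1.0"
    repository: "https://charts.promitor.io/"
```

Both dependencies are pulled in with `helm dependency update` and then installed together via a single `helm install` of the parent chart.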
I tried scraping a single metric from a single PaaS resource to analyze the behavior, and I don't see broken scrape points in that scenario.
Whereas when the scraper is configured to collect multiple metrics from multiple PaaS resources, the scrape points are discontinuous.
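To illustrate the multi-metric setup, here is a trimmed-down sketch along the lines of the metrics declaration in use; the resource names, resource group, and schedule are placeholders, and the real declaration covers more resource types than the storage accounts shown here:

```yaml
version: v1
azureMetadata:
  tenantId: xxx
  subscriptionId: xxx
  resourceGroupName: my-resource-group   # placeholder
metricDefaults:
  aggregation:
    interval: 00:05:00
  scraping:
    schedule: "0 */5 * ? * *"             # illustrative schedule
metrics:
  # Multiple metrics across multiple resources; only storage accounts shown here.
  - name: storageaccount_availability
    description: "Availability of the storage account"
    resourceType: StorageAccount
    azureMetricConfiguration:
      metricName: Availability
      aggregation:
        type: Average
    resources:
      - accountName: storageaccount01     # placeholder
      - accountName: storageaccount02     # placeholder
  - name: storageaccount_transactions
    description: "Transactions against the storage account"
    resourceType: StorageAccount
    azureMetricConfiguration:
      metricName: Transactions
      aggregation:
        type: Total
    resources:
      - accountName: storageaccount01     # placeholder
      - accountName: storageaccount02     # placeholder
```

With a single metric and a single resource the series is continuous; with several entries like the above, gaps appear.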
Pod logs attached along for reference.
all_metrics0.txt single_metrics0.txt
Sample traces in pod logs:
[11:16:42 FTL] Failed to scrape resource for metric 'storageaccount_availability'
System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
 ---> System.Net.Sockets.SocketException (125): Operation canceled
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Net.Security.SslStream.…(…`1 task, Int32 min, Int32 initial)
   at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at System.Net.Http.HttpConnection.FillAsync()
   at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed)
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Microsoft.Rest.RetryDelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.ResourceManager.Fluent.Core.ProviderRegistrationDelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Promitor.Agents.Core.RequestHandlers.ThrottlingRequestHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /src/Promitor.Agents.Core/ThrottlingRequestHandler.cs:line 48
   at Microsoft.Azure.Management.ResourceManager.Fluent.Core.UserAgentDelegatingHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperations.ListWithHttpMessagesAsync(String resourceUri, String metricnamespace, Dictionary`2 customHeaders, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsOperationsExtensions.ListAsync(IMetricDefinitionsOperations operations, String resourceUri, String metricnamespace, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
   at Microsoft.Azure.Management.Monitor.Fluent.MetricDefinitionsImpl.Microsoft.Azure.Management.Monitor.Fluent.IMetricDefinitions.ListByResourceAsync(String resourceId, CancellationToken cancellationToken)
   at Promitor.Integrations.AzureMonitor.AzureMonitorClient.QueryMetricAsync(String metricName, String metricDimension, AggregationType aggregationType, TimeSpan aggregationInterval, String resourceId, String metricFilter, Nullable`1 metricLimit) in /src/Promitor.Integrations.AzureMonitor/AzureMonitorClient.cs:line 74
   at Promitor.Core.Scraping.AzureMonitorScraper`1.ScrapeResourceAsync(String subscriptionId, ScrapeDefinition`1 scrapeDefinition, TResourceDefinition resourceDefinition, AggregationType aggregationType, TimeSpan aggregationInterval) in /src/Promitor.Core.Scraping/AzureMonitorScraper.cs:line 72
   at Promitor.Core.Scraping.Scraper`1.ScrapeAsync(ScrapeDefinition`1 scrapeDefinition) in /src/Promitor.Core.Scraping/Scraper.cs:line 103
Thanks for testing!
I'd appreciate it if you could help us understand the root cause. Our objective is to monitor multiple PaaS resources and metrics via the Promitor exporter; with these missing scrape points, we're unable to promote this feature to production.
I will take a look, but it looks like the requests are timing out, so I'd first make sure that the network is fine.
Also, did you check the system metrics with regard to throttling?
Here are the throttling metrics for your reference:

# HELP promitor_ratelimit_arm Indication how many calls are still available before Azure Resource Manager (ARM) is going to throttle us.
# TYPE promitor_ratelimit_arm gauge
promitor_ratelimit_arm{tenant_id="xxx",subscription_id="xxx",app_id="xxx"} 11983 1660032355059
# HELP promitor_ratelimit_arm_throttled Indication concerning Azure Resource Manager are being throttled. (1 = yes, 0 = no).
# TYPE promitor_ratelimit_arm_throttled gauge
promitor_ratelimit_arm_throttled{tenant_id="xxx",subscription_id="xxx",app_id="xxx"} 0 1660032355059
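In case it is useful for catching this, here is a rough sketch of Prometheus alerting rules on top of these two gauges; the group/alert names and the headroom threshold are arbitrary placeholders, not part of my setup:

```yaml
# Illustrative Prometheus alerting rules; names and thresholds are placeholders.
groups:
  - name: promitor-arm-throttling
    rules:
      - alert: PromitorArmThrottled
        # Fires when Azure Resource Manager reports that Promitor's calls are throttled.
        expr: promitor_ratelimit_arm_throttled == 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Promitor is being throttled by Azure Resource Manager"
      - alert: PromitorArmHeadroomLow
        # Fires when the remaining ARM call budget drops below an arbitrary threshold.
        expr: promitor_ratelimit_arm < 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Fewer than 1000 ARM calls remaining before throttling"
```

Based on the values above, throttling does not look like the cause here: the throttled gauge stays at 0 and plenty of call headroom remains.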
[UPDATE] I've opened a case with Microsoft regarding this issue, to figure out whether this behavior is due to any throttling happening at the Azure Monitor APIs used in the back end of the Promitor exporter.
Below are MS support team's comments:
+We picked the metric 'memory percent' and plotted it via Metrics Explorer for the last hour; we could see a continuous time series.
+We checked a longer time frame as well (24 hours) and could also see a continuous time series.
+You also mentioned that you tried to scrape a single metric from a single PaaS resource to analyze the behavior and don't see broken scrape points in that scenario, whereas when the scraper is configured to collect multiple metrics from multiple PaaS resources, the scrape points are discontinuous and the issue is seen.
+So, we plotted multiple metrics (CPU percent and memory percent), and they look fine as well.
+Also, since you mentioned that Promitor uses the REST API to fetch metrics, we tried hitting the Azure Monitor API via https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list?tabs=HTTP&tryIt=true&source=docs#code-try-0, which gives the expected output.
They also confirmed from the back end that there are no broken lines in the graph.
Hm, this is hard to figure out, given that there are Promitor end-users scraping hundreds of resources 🤔
Report
Hi there,
I've been trying to pull Azure PaaS metrics via the Promitor exporter. The metrics are successfully scraped and pushed to the Grafana UI, but unfortunately the metric series shown is not continuous and is broken intermittently.
Has anyone faced this issue before? I'm using a somewhat older version of the charts. Screenshot attached for reference.
Kindly help.
Regards, Tenu
Expected Behavior
Expecting the metric series to be continuous, without missing scrape points. Screenshot attached.
This screenshot was taken during the proof-of-concept task, when it was working fine.
Actual Behavior
Metric series are broken intermittently.
Steps to Reproduce the Problem
Chart versions:

| Component | Version |
| --------- | ------- |
| Scraper   | 2.5.1   |

Configuration
Logs
Platform
Microsoft Azure
Contact Details
tenutensing@gmail.com