microsoft / ApplicationInsights-dotnet


Support percentiles for aggregated metrics #1226

Closed. andyvig closed this issue 1 year ago

andyvig commented 4 years ago

Per this StackOverflow answer, it’s not possible to compute percentiles on aggregated metrics sent through AppInsights: https://stackoverflow.com/questions/58124268/how-to-do-percentiles-on-custom-metrics-in-azure-appinsights

The request is to support this in some form, since it seems like a significant miss relative to other platforms like Prometheus. Is there any workaround other than sending telemetry for every metric measurement (since that won’t scale at all)?

I would love not to have to set up Prometheus/Grafana infrastructure to support this. Thanks!
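For context, the kind of client-side pre-aggregation being asked about can be sketched as a fixed-bucket histogram: each measurement increments one counter, and only the counters are sent per interval, so telemetry volume does not grow with event volume. This is a minimal Python sketch; the class name, bucket bounds, and flush shape are all hypothetical and not part of any Application Insights API.

```python
import bisect
import math


class BucketHistogram:
    """Counts observations into fixed buckets (hypothetical helper)."""

    def __init__(self, upper_bounds):
        # A trailing +Inf bucket catches values above the largest bound.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)

    def observe(self, value):
        # bisect_left finds the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1

    def flush(self):
        """Return (upper_bound, count) pairs for the interval and reset."""
        snapshot = list(zip(self.bounds, self.counts))
        self.counts = [0] * len(self.bounds)
        return snapshot


hist = BucketHistogram([0.1, 0.5, 1.0])  # bounds in seconds, chosen arbitrarily
for duration in (0.05, 0.1, 0.3, 2.0):
    hist.observe(duration)
print(hist.flush())
```

The counts here are per bucket, not cumulative. A backend would still need a way to store and query them, which is what this issue asks for.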

cijothomas commented 4 years ago

@vgorbenko Is this on the Metrics roadmap?

andyvig commented 4 years ago

Any indication of how close this might be?
For high-volume scenarios it makes AppInsights unusable for metrics (since simple averages won't cut it for production monitoring). If there's a solution AppInsights provides here that I'm missing please let me know (our plan is to track aggregate metrics for billions of events/day).

cijothomas commented 4 years ago

@andyvig This is not planned for 2019. I will check and report back on the plan for the next semester (2020). I also know that it's possible to write a custom aggregator and plug it into the rest of the metrics pipeline if you want to do percentiles. It's not documented, but if you want to take a look, here's where to start: https://github.com/microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Metrics/Extensibility/MetricSeriesAggregatorBase.cs

andyvig commented 4 years ago

Thanks @cijothomas, how would we then query that on the Log Analytics side? Does the percentile function support aggregate data?
I'm looking for something similar to this operation in Prometheus: "To calculate the 90th percentile of request durations over the last 10m"
    histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
From https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
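For readers unfamiliar with it, histogram_quantile estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following is a minimal Python sketch of that idea, not Prometheus's actual implementation. It assumes buckets are given as (upper_bound, cumulative_count) pairs ending in a +Inf bucket, assumes the lowest bucket starts at 0, and skips the rate() step. The bucket values in the usage lines are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from sorted (upper_bound, cumulative_count) pairs."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: fall back to the
                # highest finite upper bound, if there is one.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation within the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up cumulative counts of request durations in seconds.
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(histogram_quantile(0.9, buckets))  # about 0.5
```

The estimate is only as precise as the bucket boundaries, which is the usual trade-off of histogram-based percentiles.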

cijothomas commented 4 years ago

I don't think there is any native support, as the schema doesn't have anything for storing percentiles: https://github.com/microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Extensibility/Implementation/External/DataPoint_types.cs

You'd need to store quantiles as customProps and write custom queries to get them, as Analytics won't understand customProps.

@SergeyKanzhelev even if one authors their own aggregator, is there any way to store quantiles (.1, .5, .9, etc.) in the schema?
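The customProps approach described above could look roughly like the Python sketch below: compute quantiles for each aggregation interval on the client and attach them as string key/value pairs. The function names and property keys (p10, p50, p90) are hypothetical, and the values are kept as strings on the assumption that custom properties are string pairs. Nothing here calls the Application Insights SDK.

```python
import math


def nearest_rank(sorted_values, q):
    """Nearest-rank quantile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_interval(values, quantiles=(0.1, 0.5, 0.9)):
    """Build a dict of custom properties for one aggregation interval."""
    ordered = sorted(values)
    if not ordered:
        return {}
    props = {"count": str(len(ordered))}
    for q in quantiles:
        # Keys such as "p50" are illustrative names, not a known schema.
        props[f"p{round(q * 100)}"] = repr(nearest_rank(ordered, q))
    return props


print(summarize_interval([12, 7, 3, 25, 9, 14, 5, 18, 30, 11]))
```

One caveat with this design: quantiles computed per interval cannot be combined exactly into a quantile over a longer window, so queries over these properties only describe each interval on its own.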

RicardoNiepel commented 3 years ago

Any news / roadmap item / documentation / customer guidance of

  • publishing metrics as histograms to AppInsights
  • with the goal of using percentiles in Queries/Views/Alerts

to make AppInsights a good fit for SLOs?

cijothomas commented 3 years ago

No work is planned to add support for this in ApplicationInsights SDK.

Metrics support in OpenTelemetry is expected by the end of 2021 (Nov 2021): https://github.com/open-telemetry/opentelemetry-dotnet/issues/1501. After the OpenTelemetry part ships, there will be a supported way to export metrics to ApplicationInsights, but there are no solid dates for this. There is also no solid date for supporting percentiles/histograms in ApplicationInsights.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 300 days with no activity. Remove stale label or comment or this will be closed in 7 days.

andyvig commented 2 years ago

@cijothomas Checking in here...
Is this still a "maybe sometime in the future but all dates unknown" situation, or is there any more definition around if/when this might be supported? Thanks.

cijothomas commented 2 years ago

No firm dates that I can share. (The feature requires not just SDK support but also backend, UI, etc.) On the SDK side, this will likely come via the OpenTelemetry route, and not from this repo.

RicardoNiepel commented 1 year ago

We still don't have a clear statement on whether and how this will arrive.

If this feature is not coming to the Azure Monitor / AppInsights backend and the various SDKs, some guidance should be published on how these technologies can be used by someone who wants to follow SRE best practices:

mattmccleary commented 1 year ago

Hi Ricardo, Azure Managed Prometheus (Preview) was announced last month and is available with Azure Managed Grafana integration. This is compatible with Prom Client.

https://learn.microsoft.com/azure/azure-monitor/essentials/prometheus-metrics-overview

Additionally, we are working on supporting percentiles via the OpenTelemetry histogram API. Unfortunately, this work requires some major changes in how our backend works, so any release is likely 6+ months out.

CC: @vishiy

RicardoNiepel commented 1 year ago

Thanks a lot for the clarification and the details on workarounds and other possibilities.

RicardoNiepel commented 1 year ago

@mattmccleary can you provide an update on this? Thx

dennis-yemelyanov commented 9 months ago

@mattmccleary any update on percentile tracking support?

I'm trying to use 'Azure.Monitor.OpenTelemetry.Exporter' to collect and report on our application latency. It looks like it currently tracks things like the max value, but that's not very useful in practice, since the max can be influenced by many external factors and doesn't necessarily give an accurate view of how the app is doing. Ideally we want to track the 99th percentile of this latency value, but I can't figure out how to do that or whether it's supported at all.