microsoft / ApplicationInsights-SDK-Labs

Application Insights experimental projects repository
MIT License

New metrics API proposal #18

Open SergeyKanzhelev opened 8 years ago

SergeyKanzhelev commented 8 years ago

Here is a new metrics API proposal: https://github.com/Microsoft/ApplicationInsights-SDK-Labs/blob/master/AggregateMetrics/README.md#usage---second-api

The histogram and timer metric types are not yet implemented. There are also some differences in how the meter metric type works that may need some adjustments.

Problems that this API solves:

  1. Super-fast incrementing for simple metrics
  2. Full control over MetricTelemetry custom properties and the ability to set context properties
  3. Once a metric is created, I see its value constantly. If we ever charge per metric, this model makes more sense for the customer, as metrics are created explicitly

What this API doesn't solve:

  1. This API uses the standard MetricTelemetry, so it doesn't solve the problem of limiting the number of dimensions you can have for a single metric
  2. Batching metrics and sharing context may help reduce monitoring noise. This API doesn't attempt to address this problem

What is missing:

  1. Implement metric types derived from the RequestTelemetry object, like a Meter of failed requests or a Timer of average page execution time.
  2. We need to think about how we can share one metric between QuickPulse and regular telemetry, primarily for metrics derived from other telemetry types like those mentioned above.

SergeyKanzhelev commented 8 years ago

Questions from @vitalyf007:

In your mind, why do you see a difference between a regular counter (with an increment/decrement interface) and a "meter" (with a Mark() interface)? Overall it feels like a port of Metrics.NET to a degree (if we want to do the same, why port it, maybe just use it)? And do I understand correctly that all context properties become metric dimensions?

SergeyKanzhelev commented 8 years ago

The difference between Counter and Meter is in what is calculated for each. When a Counter is aggregated, we send an integer as the value. When a Meter is aggregated, we send a rate. The same goes for a Histogram metric: it will have aggregation logic that calculates min/max/stdDev, whereas for a simple Counter we do not calculate those locally.
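To illustrate the distinction, here is a minimal Python sketch. It is not the SDK's code: the class names `Counter` and `Histogram`, their methods, and the choice of population standard deviation are all assumptions made for illustration.

```python
import math


class Counter:
    """Hypothetical counter: aggregation just reports the current integer."""

    def __init__(self):
        self.value = 0

    def increment(self, by=1):
        self.value += by

    def decrement(self, by=1):
        self.value -= by

    def aggregate(self):
        # No local statistics: the integer itself is the value that is sent.
        return self.value


class Histogram:
    """Hypothetical histogram: aggregation computes statistics locally."""

    def __init__(self):
        self.samples = []

    def update(self, sample):
        self.samples.append(sample)

    def aggregate(self):
        n = len(self.samples)
        if n == 0:
            return None
        mean = sum(self.samples) / n
        # Population variance (an assumption; the thread does not say which).
        variance = sum((s - mean) ** 2 for s in self.samples) / n
        return {
            "count": n,
            "min": min(self.samples),
            "max": max(self.samples),
            "mean": mean,
            "std_dev": math.sqrt(variance),
        }
```

The counter stays cheap to update because nothing is computed until aggregation, and even then only a single integer is read.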

As I mentioned, I changed the Meter semantics compared to what Metrics.NET does. Metrics.NET calculates a lifetime rate and sliding-window rates over 1 minute, 5 minutes and 15 minutes. In terms of Application Insights, one metric type would generate three MetricTelemetry items. So I changed the semantics to the rate since the last Reset, and Reset is called at the end of every interval. With the default interval of 1 minute, the value is the rate over the last minute.
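A rough Python sketch of the "rate since the last Reset" semantics follows. The `Meter` class, its method names, and the injectable `clock` parameter are hypothetical and exist only to make the idea testable; the actual SDK implementation may differ.

```python
import time


class Meter:
    """Hypothetical meter reporting events per second since the last reset."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the interval can be simulated
        self._count = 0
        self._since = clock()        # start of the current interval

    def mark(self, n=1):
        self._count += n

    def rate(self):
        elapsed = self._clock() - self._since
        return self._count / elapsed if elapsed > 0 else 0.0

    def reset(self):
        self._count = 0
        self._since = self._clock()

    def flush(self):
        """Report the rate for the interval that just ended, then reset."""
        value = self.rate()
        self.reset()
        return value
```

Calling `flush()` once per interval yields a single value per interval, so one meter maps to one telemetry item rather than several.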

I didn't intend to port Metrics.NET; I just used similar metric type names. I believe we will recommend Metrics.NET to our customers, and it is a good idea to implement a metric reporting module for it. The main reason to have our own metrics aggregation logic is to implement aggregations for standard types, like calculating an average response time metric on the agent across all telemetry, not just the sampled examples on the server side.

In the proposed API, when you create a metric (like a Counter), it has a set of properties and a context, but it is a single integer. When it is reported, telemetry initializers may add more properties to the MetricTelemetry item. Ultimately, defining which of those properties will become dimensions is a backend job. I'll attempt to implement standard types aggregation and see whether it fits the model when we have a dimension like RequestTelemetry.Name and metrics aggregated by this dimension.

AlexBulankou commented 8 years ago

@SergeyKanzhelev can you clarify "which of those properties will become dimensions is a backend job"? With server-side aggregation it makes sense: there are predefined dimensions, and the customer can pick and pay for additional dimensions that will be used for aggregation. How does client aggregation work here?

SergeyKanzhelev commented 8 years ago

The SDK does not aggregate by dimensions in the proposed API. For instance, say you have a meter called failed requests, and you may have associated custom properties with it. Every minute this meter will generate one telemetry item of type MetricTelemetry with the name failed requests. This telemetry item will have the custom properties of the meter. It will also have the telemetry context and all properties set by telemetry initializers. For instance, the computer name will be set as Context.Cloud.RoleInstance.

So this API doesn't aggregate by dimensions - aggregation by dimensions happens on the backend.
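The flow described above can be sketched in Python as follows. This is illustrative only: `build_metric_telemetry`, `set_role_instance`, the dictionary layout, and the instance name `web-01` are hypothetical and are not part of the Application Insights SDK.

```python
def build_metric_telemetry(name, value, meter_properties, context, initializers):
    """Build ONE telemetry item per meter per interval (no per-dimension split)."""
    item = {
        "name": name,
        "value": value,
        "properties": dict(meter_properties),  # custom properties of the meter
        "context": dict(context),              # telemetry context of the meter
    }
    # Telemetry initializers may add more properties or context values.
    for initializer in initializers:
        initializer(item)
    return item


def set_role_instance(item):
    # Hypothetical initializer: stamps the computer name onto the context.
    item["context"].setdefault("Cloud.RoleInstance", "web-01")


item = build_metric_telemetry(
    name="failed requests",
    value=0.5,
    meter_properties={"region": "west"},
    context={},
    initializers=[set_role_instance],
)
```

The client emits the item as-is; deciding which of `properties` or `context` values act as dimensions is left to the backend.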

SergeyKanzhelev commented 8 years ago

BTW, Metrics.NET provides a mechanism to have multiple dimensions for the metrics. See this for instance.

I think a similar concept should be used for request telemetry aggregations. But I'm not sure - I need to try it out first.