tomkerkhove / promitor

Bringing Azure Monitor metrics where you need them.
https://promitor.io
MIT License

Query Azure Application Insights API #1090

Open Romiko opened 4 years ago

Romiko commented 4 years ago

As an SRE, I would like to query the Azure Application Insights API, which is located at: https://api.applicationinsights.io/v1/apps/$AppInsightsId/query
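A minimal sketch of building a request against that endpoint, assuming API-key authentication through the `x-api-key` header and a JSON body of the form `{"query": ...}`. The function name `build_query_request` and the placeholder application ID and key are hypothetical. The sketch only constructs the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_BASE = "https://api.applicationinsights.io/v1/apps"


def build_query_request(app_id, api_key, query):
    """Build (but do not send) a POST request for the query endpoint."""
    url = "{0}/{1}/query".format(API_BASE, app_id)
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder credentials (assumption): substitute a real application ID and API key.
request = build_query_request("my-app-id", "my-api-key", "requests | summarize count()")
# To execute it: json.load(urllib.request.urlopen(request))
```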

I would then like to expose these metrics in a format that Prometheus can scrape. This would make Promitor extremely powerful, since it could expose metrics for any resource or application that writes telemetry to Application Insights, including non-Azure resources.

The full source code is here: Application Insights Exporter for Prometheus (Python).

Example

customCollectors: servicelevelindicators:

tomkerkhove commented 4 years ago

We are planning KQL support for Log Analytics (which backs Application Insights) via #1076, and support for custom metrics from Azure Application Insights via #645.

This seems like what you are looking for?

Romiko commented 4 years ago

Indeed. What you could do is declare it in a config like this:

customCollectors: servicelevelindicators:

Romiko commented 4 years ago

The above uses a default summary by count.

def count(self, schema, query_string, time_range=None, customdimensions=None):
    # The custom dimension names double as label names; none are used if the caller passes nothing.
    summarize_column = ",".join(customdimensions or [])
    query_string = self._construct_query(schema, query_string, time_range=time_range, customdimensions=customdimensions)
    self.logger.info("Count with query: {0}".format(query_string))
    result = self._query_api(query_string)
    if not result:
        # Nothing came back from the API.
        return None
    metrics = []
    try:
        # The last column of each row is the count; the preceding columns are the label values.
        for row in result['tables'][0]['rows']:
            metric = AppInsightsMetric(summarize_column, row[-1], row[:-1])
            self.logger.info("Metric: Value {0} Labels {1}".format(metric.value, metric.labelvalues))
            metrics.append(metric)
    except Exception as e:
        self.logger.warning("Exception in count. Error: {0}".format(str(e)))
        return None
    return metrics

import datetime  # module-level import used below

def _construct_query(self, schema, query, time_range=None, customdimensions=None):
    if not time_range:
        time_range = self.exporter_interval
    self.logger.info("Constructing Query: {0} Schema: {1} Hours/Minutes/Seconds: {2}".format(query, schema, time_range))
    # Query a window that ends five minutes in the past (false_now), not at the current time.
    now = datetime.datetime.utcnow()
    false_now = now - datetime.timedelta(minutes=5)
    window = datetime.timedelta(hours=time_range.hours, minutes=time_range.minutes, seconds=time_range.seconds)
    before = false_now - window
    now_str = false_now.strftime("%Y-%m-%dT%H:%M:%SZ")
    before_str = before.strftime("%Y-%m-%dT%H:%M:%SZ")
    time_string = "where timestamp > datetime({0}) | where timestamp < datetime({1}) | order by timestamp desc".format(before_str, now_str)
    query_string = '{0} | {1} | {2}'.format(schema, time_string, query)
    summarize = '| summarize count()'
    if customdimensions:
        # Group by each custom dimension, cast to a string.
        dimensions = ['tostring(customDimensions["{0}"])'.format(cd) for cd in customdimensions]
        summarize += " by " + ",".join(dimensions)
    return query_string + summarize
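
The same construction can be sketched as a standalone function that takes the current time as a parameter, which makes the output reproducible. The name `build_count_query` and the `lag_minutes` parameter are hypothetical; the five-minute lag mirrors `false_now` in the code above and presumably allows for ingestion delay.

```python
import datetime


def build_count_query(schema, query, window, now, customdimensions=None, lag_minutes=5):
    """Return a count query over the window ending `lag_minutes` before `now`."""
    end = now - datetime.timedelta(minutes=lag_minutes)
    start = end - window
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    time_filter = (
        "where timestamp > datetime({0}) | where timestamp < datetime({1})"
        " | order by timestamp desc"
    ).format(start.strftime(fmt), end.strftime(fmt))
    query_string = "{0} | {1} | {2}".format(schema, time_filter, query)
    summarize = "| summarize count()"
    if customdimensions:
        # One grouping expression per custom dimension.
        dims = ['tostring(customDimensions["{0}"])'.format(cd) for cd in customdimensions]
        summarize += " by " + ",".join(dims)
    return query_string + summarize


example = build_count_query(
    "requests",
    'where resultCode startswith "2" or resultCode startswith "3"',
    datetime.timedelta(minutes=1),
    datetime.datetime(2020, 6, 17, 7, 0, 12),
    ["Kubernetes.ReplicaSet.Name", "AspNetCoreEnvironment"],
)
print(example)
```

Passing `now` explicitly, instead of reading the clock inside the function, is a design choice of this sketch that keeps it easy to test.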

Romiko commented 4 years ago

I guess the challenge for the spec is dealing with:

Schemas, e.g. requests, traces, exceptions, customMetrics

Labels: use customDimensions as a convention

The other challenge is providing a default query type, e.g. summarize by count, which is great for gauges.

Percentiles might be another option for others to use, but for our SLIs we just want the raw counts, and we calculate the percentiles further down the chain in an SLO operator.

SLO operator for k8s:

  serviceLevelIndicator:
    prometheus:
      address: http://myprometheus:9090
      totalQuery: sum(increase(http_request_total{host="awesome_service_io"}[2m]))
      errorQuery: sum(increase(http_request_total{host="awesome_service_io", code=~"5.."}[2m]))
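
To illustrate how the results of `totalQuery` and `errorQuery` might be combined, here is a minimal sketch that turns the two numbers into an availability ratio. The function name, the sample figures, the 95% target, and the handling of zero traffic are all assumptions for illustration, not the operator's actual logic.

```python
def availability(total, errors):
    """Fraction of requests that did not fail; None when there was no traffic."""
    if total <= 0:
        return None
    return 1.0 - (errors / total)


# Hypothetical window: 84 requests, 3 of which returned a 5xx code.
sli = availability(84, 3)
meets_slo = sli is not None and sli >= 0.95
```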

Romiko commented 4 years ago

Here is sample output:

# HELP appinsights_exporter_requests_success_total where resultCode startswith "2" or resultCode startswith "3"
# TYPE appinsights_exporter_requests_success_total gauge
appinsights_exporter_requests_success_total{AspNetCoreEnvironment="Production",Kubernetes.Deployment.Name="samples-dotnet-webapp",Kubernetes.ReplicaSet.Name="samples-dotnet-webapp-5b877ddf95"} 28.0
# HELP appinsights_exporter_requests_success_total where resultCode startswith "2" or resultCode startswith "3"
# TYPE appinsights_exporter_requests_success_total gauge
appinsights_exporter_requests_success_total{AspNetCoreEnvironment="Production",Kubernetes.Deployment.Name="samples-dotnet-webapp",Kubernetes.ReplicaSet.Name="samples-dotnet-webapp-8567bf9d5b"} 56.0
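
For reference, output like the sample above could be produced by a small renderer along these lines. This is a sketch under assumptions, not the exporter's code: the function names are hypothetical, and replacing dots in label names with underscores is a choice made here because label names traditionally match `[a-zA-Z_][a-zA-Z0-9_]*` in Prometheus, which dotted names such as `Kubernetes.Deployment.Name` do not.

```python
import re


def sanitize_label_name(name):
    """Replace characters outside [a-zA-Z0-9_] with underscores."""
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # A label name must not start with a digit (or be empty).
    return cleaned if re.match(r"[a-zA-Z_]", cleaned) else "_" + cleaned


def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render a gauge; `samples` is a list of (labels_dict, value) pairs."""
    escaped_help = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [
        "# HELP {0} {1}".format(name, escaped_help),
        "# TYPE {0} gauge".format(name),
    ]
    for labels, value in samples:
        pairs = ",".join(
            '{0}="{1}"'.format(sanitize_label_name(k), escape_label_value(str(v)))
            for k, v in labels.items()
        )
        lines.append("{0}{{{1}}} {2}".format(name, pairs, float(value)))
    return "\n".join(lines) + "\n"


text = render_gauge(
    "appinsights_exporter_requests_success_total",
    'where resultCode startswith "2" or resultCode startswith "3"',
    [({"AspNetCoreEnvironment": "Production",
       "Kubernetes.Deployment.Name": "samples-dotnet-webapp"}, 28)],
)
print(text)
```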

Romiko commented 4 years ago

Finally, this is the query that the current code generates:

requests | where timestamp > datetime(2020-06-17T06:54:12Z) | where timestamp < datetime(2020-06-17T06:55:12Z) | order by timestamp desc | where resultCode startswith "2" or resultCode startswith "3"| summarize count() by tostring(customDimensions["Kubernetes.ReplicaSet.Name"]),tostring(customDimensions["AspNetCoreEnvironment"])

The above is perfect for gauges.
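
A minimal sketch of turning the response to such a query into labelled gauge samples, assuming the response has the `{"tables": [{"rows": [...]}]}` shape that the `count` function reads, and that each row holds the grouping values first and the count last. The function name `rows_to_samples` and the sample response are hypothetical.

```python
def rows_to_samples(response, label_names):
    """Turn a query response into (labels, value) pairs.

    Assumes each row lists the label values first and the count last.
    """
    tables = response.get("tables") or []
    if not tables:
        return []
    samples = []
    for row in tables[0].get("rows", []):
        *label_values, count = row
        samples.append((dict(zip(label_names, label_values)), float(count)))
    return samples


# Hypothetical response body, for illustration only.
response = {"tables": [{"name": "PrimaryResult", "columns": [], "rows": [
    ["samples-dotnet-webapp-5b877ddf95", "Production", 28],
    ["samples-dotnet-webapp-8567bf9d5b", "Production", 56],
]}]}
samples = rows_to_samples(
    response, ["Kubernetes.ReplicaSet.Name", "AspNetCoreEnvironment"]
)
```

The label names are supplied by the caller, in the same order as the custom dimensions in the query, so the sketch does not depend on how the API names the result columns.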

tomkerkhove commented 4 years ago

What if you just provide the whole KQL statement and we output the result for you?

tomkerkhove commented 4 years ago

Feel free to let me know if KQL queries are good enough for you, @Romiko, and we can track it.