Open calebhailey opened 2 years ago
My part of this is to spec the MVP set of CloudFront (CF) metric names, as queried from CloudWatch (CW). To do that I analyzed Sumo Logic's existing CF dashboards, Telegraf's CF/CW support, Prometheus' CF/CW support, and DataDog's CF support.
Reference Links:
cloudwatch_exporter
aws_cloudfront_requests: The total number of viewer requests received by CloudFront, for all HTTP methods and for both HTTP and HTTPS requests.
aws_cloudfront_bytes_downloaded: The total number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests.
aws_cloudfront_bytes_uploaded: The total number of bytes that viewers uploaded to your origin with CloudFront, using POST and PUT requests.
aws_cloudfront_4xx_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 4xx.
aws_cloudfront_5xx_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 5xx.
aws_cloudfront_total_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 4xx or 5xx.
aws_cloudfront_401_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 401.
aws_cloudfront_403_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 403.
aws_cloudfront_404_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 404.
aws_cloudfront_502_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 502.
aws_cloudfront_503_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 503.
aws_cloudfront_504_error_rate: The percentage of all viewer requests for which the response’s HTTP status code is 504.
aws_cloudfront_origin_latency: The total time spent from when CloudFront receives a request to when it starts providing a response to the network, for requests that are served from the origin (not the CloudFront cache). This is also known as first byte latency or time-to-first-byte.
aws_cloudfront_cache_hit_rate: The percentage of all cacheable requests for which CloudFront served the content from its cache. HTTP POST and PUT requests (and errors) are not considered cacheable requests.
aws_cloudfront_function_invocations: The number of times the function was started in a given time period.
aws_cloudfront_function_validation_errors: The number of validation errors produced by the function in a given time period.
aws_cloudfront_function_execution_errors: The number of execution errors that occurred in a given time period.
aws_cloudfront_function_execution_time: The amount of time that the function took to run as a percentage of the maximum allowed time.

distribution_id: Derived from DistributionId dimension. Value is the ID of the CloudFront distribution.
function_name: Derived from FunctionName dimension. Value is the name of the function.
region: Derived from Region dimension. The value for Region is always Global, because CloudFront is a global service.

DataDog is the only product that gives CloudFront-specific metric names in their documentation (rather than just providing a generalized CloudWatch collector).
The list of metrics is here.
The naming convention differs from Prometheus/Telegraf in that it is less generalized to CloudWatch, and more specific to CloudFront. Metric names follow this structure: aws.cloudfront.<metric-name>. Notably, the concept of statistics is not included (e.g. no sum/average/etc. suffix is described).
For example, aws.cloudfront.requests.
Also, they include support for CloudFront's "Additional Metrics", and for Lambda@Edge.
Per their docs: "Each of the metrics retrieved from AWS are assigned the same tags that appear in the AWS console, including but not limited to aws_account, region, and distributionid."
The docs for the Prometheus cloudwatch_exporter don't mention CloudFront specifically, but give general information on CloudWatch metrics, with some examples.
It seems that they follow a naming convention like:
aws_<product-namespace>_<metric-name>_<statistic>
For example, the ELB RequestCount metric, using the Sum statistic, is produced as:
aws_elb_request_count_sum
Statistic suffixes are mapped as such in the code:
_sum
_sample_count
_minimum
_maximum
_average
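The naming pattern above can be sketched in a few lines of Python. This is only an illustration of the convention as described here, not the exporter's actual code: the function names `to_snake_case` and `prometheus_metric_name` are hypothetical, and the handling of the namespace (lowercase it and replace `/` with `_`) is an assumption inferred from the `aws_elb_...` example.

```python
import re

# Statistic suffixes as listed above (CloudWatch statistic -> name suffix).
STATISTIC_SUFFIXES = {
    "Sum": "sum",
    "SampleCount": "sample_count",
    "Minimum": "minimum",
    "Maximum": "maximum",
    "Average": "average",
}


def to_snake_case(name: str) -> str:
    """Insert "_" at each lowercase/digit -> uppercase boundary, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def prometheus_metric_name(namespace: str, metric: str, statistic: str) -> str:
    """Build aws_<product-namespace>_<metric-name>_<statistic>."""
    # Assumption: the namespace is lowercased and non-alphanumerics become "_".
    prefix = re.sub(r"[^a-z0-9]+", "_", namespace.lower())
    return f"{prefix}_{to_snake_case(metric)}_{STATISTIC_SUFFIXES[statistic]}"


# The ELB example from the text.
print(prometheus_metric_name("AWS/ELB", "RequestCount", "Sum"))
```

Under the same assumptions, the CloudFront `4xxErrorRate` metric with the `Average` statistic would come out as `aws_cloudfront_4xx_error_rate_average`.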
For CloudFront the namespace is AWS/CloudFront, so the prefix for Prometheus metric names would be aws_cloudfront_*.
Tags in cloudwatch_exporter are derived from what the AWS docs call dimensions, and vary by metric. They are converted to snake case before being used as tags.
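As a rough sketch of that dimension-to-tag conversion (the helper name `dimensions_to_tags` and the sample dimension values are hypothetical, and the regex is an assumption about how the snake-casing behaves, not a copy of the exporter's code):

```python
import re


def dimensions_to_tags(dimensions: dict[str, str]) -> dict[str, str]:
    """Convert CloudWatch dimension names to snake_case tag names."""
    return {
        # Insert "_" before each uppercase letter that follows a
        # lowercase letter or digit, then lowercase the whole name.
        re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower(): value
        for name, value in dimensions.items()
    }


# Hypothetical dimension values, for illustration only.
tags = dimensions_to_tags({"DistributionId": "EXAMPLEID", "Region": "Global"})
print(tags)
```

Only the tag names are rewritten; the dimension values are passed through unchanged.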
For CloudFront, it looks like the tags would be:
distribution_id: Derived from DistributionId dimension. Value is the ID of the CloudFront distribution.
function_name: Derived from FunctionName dimension. Value is the name of the function.
region: Derived from Region dimension. The value for Region is always Global, because CloudFront is a global service.

Telegraf provides a CloudWatch input plugin, and the metric names it produces are similar to the Prometheus cloudwatch_exporter. However, it uses an additional prefix of cloudwatch_*, so the example above would look like cloudwatch_aws_elb_request_count_sum rather than aws_elb_request_count_sum.
Otherwise, behaviour/metric names/tags seem the same as Prometheus.
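To make the relationship between the two naming schemes concrete, here is a small sketch. It assumes, as described above, that the only difference is the extra cloudwatch_ prefix; the helper names are hypothetical and this has not been checked against real Telegraf output.

```python
TELEGRAF_PREFIX = "cloudwatch_"


def to_telegraf_name(prometheus_name: str) -> str:
    """Prepend the extra prefix that Telegraf-style names carry."""
    return TELEGRAF_PREFIX + prometheus_name


def to_prometheus_name(telegraf_name: str) -> str:
    """Strip the Telegraf prefix if present (str.removeprefix needs Python 3.9+)."""
    return telegraf_name.removeprefix(TELEGRAF_PREFIX)


print(to_telegraf_name("aws_elb_request_count_sum"))
```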
The existing Sumo CloudFront dashboards work from logs, parsing out CSV data from individual log lines. I reviewed each observable to see if we might be able to replace those with a metrics-based approach, but unfortunately none of the current dashboard observables can be replaced by what is available from CloudWatch.
Rather, it's my suggestion that we create new metrics-based Sumo Logic dashboards for CloudFront that surface the metrics described above in three dashboards: "AWS CloudFront - Overview", "AWS CloudFront - Additional Metrics", and "AWS CloudFront - Lambda@Edge".
Since there is an existing CloudFront preset implemented, I reviewed that code to see what needs to change to get the desired output. As far as I can tell, the code currently outputs metric names similar to DataDog's, which means the change may be as simple as changing the . separators to _s. Unfortunately there is no example output provided in our documentation, and I don't have CloudWatch/CloudFront to test it against, so I have no way to verify the current metric naming.
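If the preset's output really is dot-separated, the rename would amount to something like the sketch below. The function name is hypothetical, and the assumption that the current names look like aws.cloudfront.<metric-name> is unverified, as noted above.

```python
def to_underscore_name(dotted_name: str) -> str:
    """Replace "." separators with "_" in a metric name."""
    return dotted_name.replace(".", "_")


# Assumed current output -> desired output.
print(to_underscore_name("aws.cloudfront.requests"))
```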
The measurement string in the CloudFront preset includes the MVP and Additional Metrics sets, but doesn't seem to include Lambda@Edge metrics.
See first comment for specs.