prometheus-community / yet-another-cloudwatch-exporter

Prometheus exporter for AWS CloudWatch - Discovers services through AWS tags, gets CloudWatch metrics data and provides them as Prometheus metrics with AWS tags as labels
Apache License 2.0

Reimplement PrometheusMetric to use slices for label pairs #1528

Open cristiangreco opened 1 month ago

cristiangreco commented 1 month ago

This refactoring stems from an attempt to optimise memory usage in BuildMetrics and createPrometheusLabels, where labels are copied across various maps. The new PrometheusMetric stores label pairs in slices and guarantees that labels are always sorted by key. The rationale is that slices might be more memory-efficient than maps for large preallocation sizes. Moreover, label keys are readily available without iterating over a map, which comes in handy in a number of places where we save additional allocations. Lastly, while yace now spends cycles on explicit sorting, this should save some comparisons when Prometheus sorts labels internally.
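As a rough illustration of the idea (not the code in this PR), a slice-backed label container that is sorted by key could look like the sketch below. The names `labelPair`, `sortedLabels`, `newLabels`, `Keys` and `Get` are hypothetical and chosen only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// labelPair is a hypothetical stand-in for a single label (key and value).
type labelPair struct {
	Key, Value string
}

// sortedLabels is a hypothetical slice-backed container, ordered by key.
type sortedLabels struct {
	pairs []labelPair
}

// newLabels copies the given pairs into one preallocated slice and sorts
// them by key once, so later reads need no map iteration.
func newLabels(pairs ...labelPair) sortedLabels {
	out := make([]labelPair, len(pairs))
	copy(out, pairs)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return sortedLabels{pairs: out}
}

// Keys returns the label keys in sorted order.
func (l sortedLabels) Keys() []string {
	keys := make([]string, len(l.pairs))
	for i, p := range l.pairs {
		keys[i] = p.Key
	}
	return keys
}

// Get looks a key up with a binary search over the sorted slice.
func (l sortedLabels) Get(key string) (string, bool) {
	i := sort.Search(len(l.pairs), func(i int) bool { return l.pairs[i].Key >= key })
	if i < len(l.pairs) && l.pairs[i].Key == key {
		return l.pairs[i].Value, true
	}
	return "", false
}

func main() {
	ls := newLabels(
		labelPair{Key: "region", Value: "eu-west-1"},
		labelPair{Key: "account_id", Value: "123"},
		labelPair{Key: "name", Value: "queue"},
	)
	fmt.Println(ls.Keys())
	fmt.Println(ls.Get("name"))
}
```

Note that, unlike a map, this container does not collapse repeated keys on its own.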

The refactoring also comes with a reimplementation of the label signature, since the Prometheus models only work with maps.
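A minimal sketch of what a signature over an already-sorted slice of label pairs might look like follows. It assumes an FNV-1a hash with a separator byte between keys and values, which is similar in spirit to the map-based signature in the Prometheus model package but is not claimed to produce identical values. `labelPair` and `signature` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// labelPair is a hypothetical stand-in for a single label (key and value).
type labelPair struct {
	Key, Value string
}

// signature hashes label pairs that are assumed to be sorted by key already,
// so no intermediate map or extra sorting step is needed.
func signature(sorted []labelPair) uint64 {
	h := fnv.New64a()
	sep := []byte{0xff} // separator keeps ("ab","c") distinct from ("a","bc")
	for _, p := range sorted {
		h.Write([]byte(p.Key))
		h.Write(sep)
		h.Write([]byte(p.Value))
		h.Write(sep)
	}
	return h.Sum64()
}

func main() {
	a := []labelPair{{Key: "name", Value: "queue"}, {Key: "region", Value: "eu-west-1"}}
	b := []labelPair{{Key: "name", Value: "queue"}, {Key: "region", Value: "eu-west-1"}}
	fmt.Println(signature(a) == signature(b))
}
```

Because the input is sorted by key, two metrics with the same label set hash to the same value regardless of the order in which the labels were originally collected.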

I've added a number of benchmarks for specific methods. They show that the change is sometimes noticeable and sometimes not (the overall impact is hard to judge from synthetic benchmarks, given the variety of input one can get at runtime from large AWS responses).

Benchmark_EnsureLabelConsistencyAndRemoveDuplicates:

                                              │  before.txt  │              after.txt              │
                                              │    sec/op    │   sec/op     vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   14.203µ ± 2%   9.115µ ± 1%  -35.82% (p=0.000 n=10)

                                              │ before.txt │             after.txt              │
                                              │    B/op    │    B/op     vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   448.0 ± 0%   256.0 ± 0%  -42.86% (p=0.000 n=10)

                                              │ before.txt  │             after.txt              │
                                              │  allocs/op  │ allocs/op   vs base                │
_EnsureLabelConsistencyAndRemoveDuplicates-12   17.000 ± 0%   9.000 ± 0%  -47.06% (p=0.000 n=10)

Benchmark_createPrometheusLabels:

                           │ before.txt  │           after.txt           │
                           │   sec/op    │   sec/op     vs base          │
_createPrometheusLabels-12   41.86m ± 5%   41.40m ± 9%  ~ (p=0.481 n=10)

                           │  before.txt  │              after.txt               │
                           │     B/op     │     B/op      vs base                │
_createPrometheusLabels-12   2.867Mi ± 0%   1.531Mi ± 0%  -46.59% (p=0.000 n=10)

                           │ before.txt  │             after.txt              │
                           │  allocs/op  │  allocs/op   vs base               │
_createPrometheusLabels-12   40.00k ± 0%   40.00k ± 0%  -0.00% (p=0.000 n=10)

Benchmark_BuildMetrics:

                 │ before.txt  │             after.txt              │
                 │   sec/op    │   sec/op     vs base               │
_BuildMetrics-12   110.4µ ± 1%   114.1µ ± 1%  +3.35% (p=0.000 n=10)

                 │  before.txt  │              after.txt               │
                 │     B/op     │     B/op      vs base                │
_BuildMetrics-12   4.344Ki ± 0%   3.797Ki ± 0%  -12.59% (p=0.000 n=10)

                 │ before.txt │             after.txt             │
                 │ allocs/op  │ allocs/op   vs base               │
_BuildMetrics-12   95.00 ± 0%   99.00 ± 0%  +4.21% (p=0.000 n=10)

Benchmark_NewPrometheusCollector:

                           │ before.txt  │             after.txt              │
                           │   sec/op    │   sec/op     vs base               │
_NewPrometheusCollector-12   154.8µ ± 1%   143.5µ ± 1%  -7.26% (p=0.000 n=10)

                           │  before.txt  │              after.txt              │
                           │     B/op     │     B/op      vs base               │
_NewPrometheusCollector-12   4.516Ki ± 0%   4.281Ki ± 0%  -5.19% (p=0.000 n=10)

                           │ before.txt │             after.txt              │
                           │ allocs/op  │ allocs/op   vs base                │
_NewPrometheusCollector-12   142.0 ± 0%   127.0 ± 0%  -10.56% (p=0.000 n=10)

cristiangreco commented 1 month ago

  1. Does this approach leave us open to duplicate labels that the map-based implementation was hiding, and will Prometheus error/panic on duplicate labels?

Yes, this edge case is quite possible, e.g. if CloudWatch returns a duplicate dimension. I've pushed a change that validates the length of labels within EnsureLabelConsistencyAndRemoveDuplicates.
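The pushed change itself is not shown here. As a hedged sketch of one way duplicates could be caught on a key-sorted slice, the example below drops adjacent pairs with the same key and reports whether the length changed. `labelPair`, `dedupeSorted` and `checkNoDuplicates` are hypothetical names, and the adjacent-key scan is an assumption rather than the check used in EnsureLabelConsistencyAndRemoveDuplicates.

```go
package main

import (
	"fmt"
)

// labelPair is a hypothetical stand-in for a single label (key and value).
type labelPair struct {
	Key, Value string
}

// dedupeSorted removes pairs whose key repeats the previous pair's key.
// It assumes the input is sorted by key and keeps the first occurrence.
// The input slice is reused as backing storage for the result.
func dedupeSorted(pairs []labelPair) []labelPair {
	if len(pairs) < 2 {
		return pairs
	}
	out := pairs[:1]
	for _, p := range pairs[1:] {
		if p.Key != out[len(out)-1].Key {
			out = append(out, p)
		}
	}
	return out
}

// checkNoDuplicates returns an error if deduplication shortened the slice,
// i.e. if at least one key appeared more than once.
func checkNoDuplicates(pairs []labelPair) error {
	before := len(pairs)
	after := len(dedupeSorted(pairs))
	if after != before {
		return fmt.Errorf("found %d duplicate label key(s)", before-after)
	}
	return nil
}

func main() {
	labels := []labelPair{
		{Key: "dimension_QueueName", Value: "a"},
		{Key: "dimension_QueueName", Value: "a"},
		{Key: "region", Value: "eu-west-1"},
	}
	fmt.Println(checkNoDuplicates(labels))
}
```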