thanos-io / promql-engine

Multi-threaded PromQL engine implementation based on the Volcano paper.
Apache License 2.0
141 stars, 54 forks

Add sample statistics to operator telemetry #419

Closed pedro-stanaka closed 6 months ago

pedro-stanaka commented 6 months ago

Summary

It would be nice to get an idea of how many samples are loaded to answer queries. In this PR I added information about loaded samples for each operator, using the existing Prometheus `stats.QuerySamples` model. This will allow `compatibilityQuery` to play nicely with the upstream API and implement at least part of the `Stats()` method.
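To make the idea of per-operator sample statistics concrete, here is a minimal sketch of a counter attached to each node of an operator tree. The `SampleCounter` and `Operator` types and the `Describe` helper are hypothetical names invented for illustration; they are not the engine's telemetry types and not the Prometheus `stats.QuerySamples` type.

```go
package main

import (
	"fmt"
	"strings"
)

// SampleCounter is a hypothetical per-operator counter. It is an
// illustrative stand-in, not the Prometheus stats.QuerySamples type.
type SampleCounter struct {
	Total int64
}

// Add records n samples handled by the owning operator.
func (c *SampleCounter) Add(n int64) { c.Total += n }

// Operator is a hypothetical node in an operator tree; each node
// keeps its own sample counter.
type Operator struct {
	Name     string
	Samples  SampleCounter
	Children []*Operator
}

// Describe renders the tree with each operator's own sample count,
// indenting children by two spaces per level.
func Describe(op *Operator, depth int) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "%s%s: %d samples\n", strings.Repeat("  ", depth), op.Name, op.Samples.Total)
	for _, child := range op.Children {
		sb.WriteString(Describe(child, depth+1))
	}
	return sb.String()
}

func main() {
	selector := &Operator{Name: "vectorSelector"}
	sum := &Operator{Name: "aggregate(sum)", Children: []*Operator{selector}}

	// Pretend the selector loaded two batches and the aggregation
	// produced two output samples.
	selector.Samples.Add(100)
	selector.Samples.Add(50)
	sum.Samples.Add(2)

	fmt.Print(Describe(sum, 0))
}
```

Keeping the counter on every node, rather than only on the leaf selectors, is what lets a report attribute samples to the operator that handled them.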

Bench results (against main)

`new.out` is `main`

```
goos: darwin
goarch: arm64
pkg: github.com/thanos-io/promql-engine/engine
                                                        │ benchmarks/new.out │     benchmarks/new_samples.out      │
                                                        │       sec/op       │    sec/op      vs base              │
RangeQuery/vector_selector-11                           12.05m ± ∞ ¹   11.54m ± ∞ ¹         ~ (p=0.095 n=5)
RangeQuery/sum-11                                       7.985m ± ∞ ¹   8.367m ± ∞ ¹         ~ (p=0.690 n=5)
RangeQuery/sum_by_pod-11                                13.27m ± ∞ ¹   12.91m ± ∞ ¹         ~ (p=0.222 n=5)
RangeQuery/topk-11                                      8.286m ± ∞ ¹   8.013m ± ∞ ¹    -3.30% (p=0.016 n=5)
RangeQuery/bottomk-11                                   7.959m ± ∞ ¹   8.168m ± ∞ ¹    +2.62% (p=0.008 n=5)
RangeQuery/rate-11                                      13.57m ± ∞ ¹   13.75m ± ∞ ¹         ~ (p=0.310 n=5)
RangeQuery/subquery-11                                  31.20m ± ∞ ¹   31.38m ± ∞ ¹         ~ (p=0.310 n=5)
RangeQuery/sum_rate-11                                  10.44m ± ∞ ¹   10.45m ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/sum_by_rate-11                               13.37m ± ∞ ¹   13.36m ± ∞ ¹         ~ (p=1.000 n=5)
RangeQuery/quantile_with_variable_parameter-11          27.61m ± ∞ ¹   27.98m ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/binary_operation_with_one_to_one-11          9.103m ± ∞ ¹   9.244m ± ∞ ¹    +1.55% (p=0.008 n=5)
RangeQuery/binary_operation_with_many_to_one-11         23.55m ± ∞ ¹   23.95m ± ∞ ¹    +1.69% (p=0.008 n=5)
RangeQuery/binary_operation_with_vector_and_scalar-11   16.17m ± ∞ ¹   16.17m ± ∞ ¹         ~ (p=0.841 n=5)
RangeQuery/unary_negation-11                            12.06m ± ∞ ¹   12.32m ± ∞ ¹    +2.15% (p=0.016 n=5)
RangeQuery/vector_and_scalar_comparison-11              16.43m ± ∞ ¹   16.64m ± ∞ ¹         ~ (p=0.222 n=5)
RangeQuery/positive_offset_vector-11                    11.06m ± ∞ ¹   11.31m ± ∞ ¹         ~ (p=0.421 n=5)
RangeQuery/at_modifier_-11                              8.102m ± ∞ ¹   8.197m ± ∞ ¹         ~ (p=0.841 n=5)
RangeQuery/at_modifier_with_positive_offset_vector-11   8.064m ± ∞ ¹   8.298m ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/clamp-11                                     15.22m ± ∞ ¹   15.58m ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/clamp_min-11                                 13.61m ± ∞ ¹   15.73m ± ∞ ¹   +15.61% (p=0.016 n=5)
RangeQuery/complex_func_query-11                        19.33m ± ∞ ¹   20.76m ± ∞ ¹    +7.42% (p=0.008 n=5)
RangeQuery/func_within_func_query-11                    17.33m ± ∞ ¹   17.72m ± ∞ ¹         ~ (p=0.690 n=5)
RangeQuery/aggr_within_func_query-11                    17.75m ± ∞ ¹   17.68m ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/histogram_quantile-11                        79.68m ± ∞ ¹   73.22m ± ∞ ¹    -8.11% (p=0.016 n=5)
RangeQuery/sort-11                                      12.75m ± ∞ ¹   13.54m ± ∞ ¹         ~ (p=0.095 n=5)
RangeQuery/sort_desc-11                                 12.90m ± ∞ ¹   13.61m ± ∞ ¹         ~ (p=0.690 n=5)
RangeQuery/absent_and_exists-11                         7.879m ± ∞ ¹   6.909m ± ∞ ¹         ~ (p=0.095 n=5)
RangeQuery/absent_and_doesnt_exist-11                   280.8µ ± ∞ ¹   273.9µ ± ∞ ¹         ~ (p=0.222 n=5)
NativeHistograms/selector-11                            91.13m ± ∞ ¹   86.82m ± ∞ ¹         ~ (p=0.222 n=5)
NativeHistograms/sum-11                                 132.0m ± ∞ ¹   130.4m ± ∞ ¹         ~ (p=0.056 n=5)
NativeHistograms/rate-11                                119.1m ± ∞ ¹   113.0m ± ∞ ¹    -5.08% (p=0.008 n=5)
NativeHistograms/sum_rate-11                            158.0m ± ∞ ¹   151.8m ± ∞ ¹    -3.91% (p=0.016 n=5)
NativeHistograms/histogram_sum-11                       301.9m ± ∞ ¹   303.7m ± ∞ ¹         ~ (p=1.000 n=5)
NativeHistograms/histogram_count-11                     303.5m ± ∞ ¹   321.3m ± ∞ ¹    +5.87% (p=0.008 n=5)
NativeHistograms/histogram_quantile-11                  141.3m ± ∞ ¹   151.9m ± ∞ ¹         ~ (p=0.151 n=5)
NativeHistograms/histogram_scalar_binop-11              219.0m ± ∞ ¹   228.5m ± ∞ ¹    +4.37% (p=0.008 n=5)
geomean                                                 21.81m         21.99m          +0.80%
¹ need >= 6 samples for confidence interval at level 0.95

                                                        │ benchmarks/new.out │     benchmarks/new_samples.out      │
                                                        │        B/op        │     B/op       vs base              │
RangeQuery/vector_selector-11                           25.56Mi ± ∞ ¹   25.58Mi ± ∞ ¹         ~ (p=0.151 n=5)
RangeQuery/sum-11                                       6.249Mi ± ∞ ¹   6.253Mi ± ∞ ¹         ~ (p=0.421 n=5)
RangeQuery/sum_by_pod-11                                13.23Mi ± ∞ ¹   13.24Mi ± ∞ ¹         ~ (p=0.841 n=5)
RangeQuery/topk-11                                      8.972Mi ± ∞ ¹   8.998Mi ± ∞ ¹    +0.29% (p=0.008 n=5)
RangeQuery/bottomk-11                                   8.961Mi ± ∞ ¹   8.983Mi ± ∞ ¹    +0.24% (p=0.008 n=5)
RangeQuery/rate-11                                      26.79Mi ± ∞ ¹   26.79Mi ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/subquery-11                                  29.52Mi ± ∞ ¹   29.51Mi ± ∞ ¹         ~ (p=0.310 n=5)
RangeQuery/sum_rate-11                                  9.358Mi ± ∞ ¹   9.262Mi ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/sum_by_rate-11                               17.53Mi ± ∞ ¹   17.56Mi ± ∞ ¹         ~ (p=0.056 n=5)
RangeQuery/quantile_with_variable_parameter-11          30.12Mi ± ∞ ¹   30.12Mi ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/binary_operation_with_one_to_one-11          14.20Mi ± ∞ ¹   14.21Mi ± ∞ ¹         ~ (p=1.000 n=5)
RangeQuery/binary_operation_with_many_to_one-11         34.66Mi ± ∞ ¹   34.72Mi ± ∞ ¹         ~ (p=1.000 n=5)
RangeQuery/binary_operation_with_vector_and_scalar-11   30.69Mi ± ∞ ¹   30.70Mi ± ∞ ¹         ~ (p=0.310 n=5)
RangeQuery/unary_negation-11                            28.17Mi ± ∞ ¹   28.16Mi ± ∞ ¹         ~ (p=1.000 n=5)
RangeQuery/vector_and_scalar_comparison-11              30.16Mi ± ∞ ¹   30.15Mi ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/positive_offset_vector-11                    26.27Mi ± ∞ ¹   26.28Mi ± ∞ ¹         ~ (p=0.151 n=5)
RangeQuery/at_modifier_-11                              22.67Mi ± ∞ ¹   22.67Mi ± ∞ ¹         ~ (p=0.151 n=5)
RangeQuery/at_modifier_with_positive_offset_vector-11   22.48Mi ± ∞ ¹   22.48Mi ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/clamp-11                                     27.76Mi ± ∞ ¹   27.78Mi ± ∞ ¹         ~ (p=0.095 n=5)
RangeQuery/clamp_min-11                                 27.73Mi ± ∞ ¹   27.75Mi ± ∞ ¹         ~ (p=0.421 n=5)
RangeQuery/complex_func_query-11                        31.34Mi ± ∞ ¹   31.42Mi ± ∞ ¹         ~ (p=0.095 n=5)
RangeQuery/func_within_func_query-11                    29.13Mi ± ∞ ¹   29.14Mi ± ∞ ¹    +0.03% (p=0.008 n=5)
RangeQuery/aggr_within_func_query-11                    29.14Mi ± ∞ ¹   29.14Mi ± ∞ ¹         ~ (p=0.841 n=5)
RangeQuery/histogram_quantile-11                        48.97Mi ± ∞ ¹   48.95Mi ± ∞ ¹         ~ (p=1.000 n=5)
RangeQuery/sort-11                                      27.11Mi ± ∞ ¹   27.12Mi ± ∞ ¹         ~ (p=0.548 n=5)
RangeQuery/sort_desc-11                                 27.10Mi ± ∞ ¹   27.11Mi ± ∞ ¹         ~ (p=0.095 n=5)
RangeQuery/absent_and_exists-11                         8.581Mi ± ∞ ¹   8.151Mi ± ∞ ¹    -5.02% (p=0.016 n=5)
RangeQuery/absent_and_doesnt_exist-11                   570.9Ki ± ∞ ¹   570.8Ki ± ∞ ¹         ~ (p=0.841 n=5)
NativeHistograms/selector-11                            415.0Mi ± ∞ ¹   415.0Mi ± ∞ ¹         ~ (p=0.690 n=5)
NativeHistograms/sum-11                                 397.4Mi ± ∞ ¹   397.4Mi ± ∞ ¹         ~ (p=0.310 n=5)
NativeHistograms/rate-11                                370.7Mi ± ∞ ¹   370.7Mi ± ∞ ¹         ~ (p=0.421 n=5)
NativeHistograms/sum_rate-11                            353.2Mi ± ∞ ¹   353.2Mi ± ∞ ¹         ~ (p=0.548 n=5)
NativeHistograms/histogram_sum-11                       416.3Mi ± ∞ ¹   416.4Mi ± ∞ ¹         ~ (p=0.421 n=5)
NativeHistograms/histogram_count-11                     416.3Mi ± ∞ ¹   416.3Mi ± ∞ ¹         ~ (p=1.000 n=5)
NativeHistograms/histogram_quantile-11                  411.7Mi ± ∞ ¹   397.5Mi ± ∞ ¹         ~ (p=0.222 n=5)
NativeHistograms/histogram_scalar_binop-11              581.8Mi ± ∞ ¹   582.1Mi ± ∞ ¹         ~ (p=0.056 n=5)
geomean                                                 37.24Mi         37.16Mi          -0.22%
¹ need >= 6 samples for confidence interval at level 0.95

                                                        │ benchmarks/new.out │     benchmarks/new_samples.out      │
                                                        │     allocs/op      │   allocs/op    vs base              │
RangeQuery/vector_selector-11                           49.09k ± ∞ ¹   49.10k ± ∞ ¹         ~ (p=0.056 n=5)
RangeQuery/sum-11                                       47.61k ± ∞ ¹   47.61k ± ∞ ¹         ~ (p=0.730 n=5)
RangeQuery/sum_by_pod-11                                66.48k ± ∞ ¹   66.48k ± ∞ ¹         ~ (p=0.460 n=5)
RangeQuery/topk-11                                      44.76k ± ∞ ¹   44.77k ± ∞ ¹    +0.03% (p=0.016 n=5)
RangeQuery/bottomk-11                                   44.75k ± ∞ ¹   44.76k ± ∞ ¹         ~ (p=0.508 n=5)
RangeQuery/rate-11                                      64.11k ± ∞ ¹   64.09k ± ∞ ¹    -0.03% (p=0.032 n=5)
RangeQuery/subquery-11                                  84.40k ± ∞ ¹   84.38k ± ∞ ¹    -0.03% (p=0.008 n=5)
RangeQuery/sum_rate-11                                  92.35k ± ∞ ¹   92.33k ± ∞ ¹         ~ (p=0.841 n=5)
RangeQuery/sum_by_rate-11                               112.1k ± ∞ ¹   112.2k ± ∞ ¹    +0.06% (p=0.008 n=5)
RangeQuery/quantile_with_variable_parameter-11          450.5k ± ∞ ¹   450.6k ± ∞ ¹    +0.02% (p=0.032 n=5)
RangeQuery/binary_operation_with_one_to_one-11          63.85k ± ∞ ¹   64.00k ± ∞ ¹    +0.23% (p=0.008 n=5)
RangeQuery/binary_operation_with_many_to_one-11         121.0k ± ∞ ¹   121.3k ± ∞ ¹    +0.21% (p=0.008 n=5)
RangeQuery/binary_operation_with_vector_and_scalar-11   93.83k ± ∞ ¹   93.90k ± ∞ ¹    +0.07% (p=0.008 n=5)
RangeQuery/unary_negation-11                            92.54k ± ∞ ¹   92.60k ± ∞ ¹    +0.06% (p=0.008 n=5)
RangeQuery/vector_and_scalar_comparison-11              84.81k ± ∞ ¹   84.86k ± ∞ ¹    +0.05% (p=0.008 n=5)
RangeQuery/positive_offset_vector-11                    69.03k ± ∞ ¹   69.08k ± ∞ ¹    +0.07% (p=0.008 n=5)
RangeQuery/at_modifier_-11                              54.39k ± ∞ ¹   54.41k ± ∞ ¹    +0.03% (p=0.008 n=5)
RangeQuery/at_modifier_with_positive_offset_vector-11   48.39k ± ∞ ¹   48.41k ± ∞ ¹    +0.04% (p=0.008 n=5)
RangeQuery/clamp-11                                     93.37k ± ∞ ¹   93.45k ± ∞ ¹    +0.09% (p=0.008 n=5)
RangeQuery/clamp_min-11                                 92.95k ± ∞ ¹   93.02k ± ∞ ¹    +0.08% (p=0.008 n=5)
RangeQuery/complex_func_query-11                        103.8k ± ∞ ¹   103.9k ± ∞ ¹    +0.08% (p=0.016 n=5)
RangeQuery/func_within_func_query-11                    108.6k ± ∞ ¹   108.6k ± ∞ ¹    +0.04% (p=0.008 n=5)
RangeQuery/aggr_within_func_query-11                    108.6k ± ∞ ¹   108.6k ± ∞ ¹    +0.02% (p=0.008 n=5)
RangeQuery/histogram_quantile-11                        587.8k ± ∞ ¹   587.7k ± ∞ ¹    -0.02% (p=0.008 n=5)
RangeQuery/sort-11                                      83.51k ± ∞ ¹   83.58k ± ∞ ¹    +0.08% (p=0.008 n=5)
RangeQuery/sort_desc-11                                 83.50k ± ∞ ¹   83.57k ± ∞ ¹    +0.09% (p=0.008 n=5)
RangeQuery/absent_and_exists-11                         77.45k ± ∞ ¹   77.25k ± ∞ ¹         ~ (p=0.056 n=5)
RangeQuery/absent_and_doesnt_exist-11                   2.872k ± ∞ ¹   2.868k ± ∞ ¹    -0.14% (p=0.008 n=5)
NativeHistograms/selector-11                            5.197M ± ∞ ¹   5.197M ± ∞ ¹         ~ (p=0.690 n=5)
NativeHistograms/sum-11                                 5.193M ± ∞ ¹   5.193M ± ∞ ¹         ~ (p=1.000 n=5)
NativeHistograms/rate-11                                7.558M ± ∞ ¹   7.558M ± ∞ ¹         ~ (p=1.000 n=5)
NativeHistograms/sum_rate-11                            7.554M ± ∞ ¹   7.554M ± ∞ ¹    -0.00% (p=0.008 n=5)
NativeHistograms/histogram_sum-11                       5.207M ± ∞ ¹   5.207M ± ∞ ¹         ~ (p=0.841 n=5)
NativeHistograms/histogram_count-11                     5.207M ± ∞ ¹   5.207M ± ∞ ¹         ~ (p=0.690 n=5)
NativeHistograms/histogram_quantile-11                  5.263M ± ∞ ¹   5.194M ± ∞ ¹         ~ (p=0.151 n=5)
NativeHistograms/histogram_scalar_binop-11              8.882M ± ∞ ¹   8.882M ± ∞ ¹         ~ (p=0.421 n=5)
geomean                                                 204.5k         204.5k          -0.01%
¹ need >= 6 samples for confidence interval at level 0.95
```

pedro-stanaka commented 6 months ago

Hm, do we want to track processed samples for each operator, or only for scanners?

I thought of adding it to the operators because in some cases you lose the context of usage, as in the Execution operator.

fpetkovski commented 6 months ago

Maybe we can run benchmarks to see if counting each sample individually is going to have a significant perf penalty.

pedro-stanaka commented 6 months ago

> Maybe we can run benchmarks to see if counting each sample individually is going to have a significant perf penalty.

The change is pretty minimal; you can check the benchmarks in the PR description.
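For readers who want to probe the per-sample counting cost in isolation, a toy micro-benchmark along the following lines could be a starting point. It is a sketch under stated assumptions, not the benchmark suite used in this PR: `makeBatches`, `sumCountingPerSample`, and `sumCountingPerBatch` are hypothetical helpers, and the batch sizes are arbitrary. Both variants do the same summing work, and only the placement of the counter update differs.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep the compiler from discarding the results.
var (
	sinkSum   float64
	sinkCount int64
)

// makeBatches builds numBatches slices of batchSize samples, all set to 1.
func makeBatches(numBatches, batchSize int) [][]float64 {
	batches := make([][]float64, numBatches)
	for i := range batches {
		batches[i] = make([]float64, batchSize)
		for j := range batches[i] {
			batches[i][j] = 1
		}
	}
	return batches
}

// sumCountingPerSample increments the counter once for every sample.
func sumCountingPerSample(batches [][]float64) (float64, int64) {
	var sum float64
	var count int64
	for _, batch := range batches {
		for _, v := range batch {
			sum += v
			count++
		}
	}
	return sum, count
}

// sumCountingPerBatch updates the counter once per batch.
func sumCountingPerBatch(batches [][]float64) (float64, int64) {
	var sum float64
	var count int64
	for _, batch := range batches {
		for _, v := range batch {
			sum += v
		}
		count += int64(len(batch))
	}
	return sum, count
}

func main() {
	batches := makeBatches(1000, 1000) // arbitrary sizes for illustration

	perSample := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sinkSum, sinkCount = sumCountingPerSample(batches)
		}
	})
	perBatch := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sinkSum, sinkCount = sumCountingPerBatch(batches)
		}
	})

	fmt.Println("per-sample:", perSample)
	fmt.Println("per-batch: ", perBatch)
}
```

Numbers from a loop this small depend heavily on the compiler and hardware, so they only hint at the overhead; the engine-level benchmarks in the PR description remain the meaningful comparison.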

GiedriusS commented 6 months ago

@fpetkovski do you have any extra comments/suggestions?

yeya24 commented 6 months ago

Nice work @fpetkovski @pedro-stanaka. Another question: can we also support the max samples limit on top of this feature?