thanos-io / promql-engine

Multi-threaded PromQL engine implementation based on the Volcano paper.
Apache License 2.0
141 stars 54 forks source link

engine,execution: refactor operator telemetry #371

Closed MichaHoffmann closed 9 months ago

MichaHoffmann commented 9 months ago

Instead of every operator needing to implement the OperatorTelemetry interface we just wrap and render analysis by piggybacking of Explain. This also has the benefit of reusing Explains node names.

=== RUN   TestQueryAnalyze/rate(http_requests_total[30s])_>_bool_0/range
    explain_test.go:160: Query: rate(http_requests_total[30s]) > bool 0
        Analysis:
        [*scalarOperator] > - 182.797µs:
        ├──[*coalesce] - 145.549µs:
        │  ├──[*concurrencyOperator(buff=2)] - 31.921µs:
        │  │  └──[*matrixSelector] rate({[__name__="http_requests_total"]}[30s] 0 mod 4) - 22.881µs
        │  ├──[*concurrencyOperator(buff=2)] - 46.706µs:
        │  │  └──[*matrixSelector] rate({[__name__="http_requests_total"]}[30s] 1 mod 4) - 38.499µs
        │  ├──[*concurrencyOperator(buff=2)] - 37.865µs:
        │  │  └──[*matrixSelector] rate({[__name__="http_requests_total"]}[30s] 2 mod 4) - 22.315µs
        │  └──[*concurrencyOperator(buff=2)] - 34.638µs:
        │     └──[*matrixSelector] rate({[__name__="http_requests_total"]}[30s] 3 mod 4) - 21.86µs
        └──[*numberLiteralSelector] 0 - 14.998µs

TODO:

MichaHoffmann commented 9 months ago

How will this facilitate capturing custom telemetry data that is not applicable to all operators i.e. how we will be able to add telemetry data from inside the operators themselves? My point is that the operators need to be aware of the telemetry implementation. Having a wrapper makes life easier but I think it would be better to only have one way of implementing telemetry.

Yeah this method only allows capturing telemetry that is applicable to all operators like this; is this a dealbreaker?

GiedriusS commented 9 months ago

Maybe we can revert the removal of telemetry embedding and keep this wrapper only for CPU wall time? I think realistically this wrapper can only capture this info

MichaHoffmann commented 9 months ago

Maybe we can revert the removal of telemetry embedding and keep this wrapper only for CPU wall time? I think realistically this wrapper can only capture this info

Lets close this PR! While working on it i found that subquery telemetry is broken i think though ( the nested options never get the "EnableAnalysis" flag; ill open followup to fix that )