Open windtalker opened 5 days ago
After some investigations, I don't find obviously bugs that is releated to the executor's execution time. One possible cause of this distortion seems to be in PipelineExec::finalizeProfileInfo
https://github.com/pingcap/tiflash/blob/69dd6134e1adbbfa9f1562ca5ad5b91387bfc8a5/dbms/src/Flash/Pipeline/Exec/PipelineExec.cpp#L237-L250
It evenly distributes the task queue time to all the operators.
In this specific case, the TiDB's executor tree is TableScan => Selection => Projection => Aggregation The task queue time is about 28.32 seconds.
In pipemode, the executor tree is converted to a pipeline task that looks like
SourceOp(TableScan)=>FilterTransformOp=>ExpressionTransformOp(Selection)=>ExpressionTransformOp=>ExpressionTransfromOp(Projection)=>SinkOp(Aggregation)
The concurrency is 8. Now assuming all the task has the same queue time, then each task the queue time is 28.32/8 = 3.54s, for each op, the queue time is 3.54/6 = 0.59s If we remove the task queue time, then each executor's execution time will be HashAgg: 1.64s Projection: 1.53s Selection: 0.57s TableScan: 0.42s
Although still not the same as pull mode, the results is much better than the original one.
Actually, the output of Explain analyze
is designed with the assumption that the execution model is similar to the volcano execution model, while pipeline execution model is completely different from the volcano execution model, it is kind of expected that the execution time displayed in Explain analyze
maybe distortion. I think we need a comprehensive design of how to adapt the output of explain analyze in the pipeline model. cc @yibin87
Bug Report
Please answer these questions before submitting your issue. Thanks! Running the same query in pipeline mode and pull mode, the overall query peformance is almost the same, but the execution time for each executor differs a lot between pipeline mode and pull mode
In pipeline mode
HashAgg: 2.23s Projection: 2.71s Selection: 1.75s
Cpu flame graph
And in pull mode
HashAgg: 4.4s Projection: 2.8s Selection: 0.26s
Cpu flame graph
And compared with cpu flame graph, the executor time in pull mode is much more reasonable.
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
3. What did you see instead (Required)
4. What is your TiFlash version? (Required)