pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

Improve the performance of `Sort` by parallel #14417

Open wshwsh12 opened 4 years ago

wshwsh12 commented 4 years ago

Feature Request

Is your feature request related to a problem? Please describe:

Now the Sort executor is executed serially. We can run it parallelly to speed up.

Describe the feature you'd like:

Considering using a parallel framework, to run each sort partition in parallel. And then using a merge sort to generate the final result. We can refer to this proposal https://github.com/pingcap/tidb/pull/14238#issuecomment-569880893 and reuse it (after this pr merge).

Describe alternatives you've considered:

Task:

  1. Implement parallel sort.
    • [ ] Implement merge sort algorithm above Shuffle operator.
    • [ ] Implement plan builder for parallel sort.
  2. Cost Model, explain analyze information.
    • [ ] Change the cost model of sort if it will run parallelly.
    • [ ] Change the explain analyze result for parallel sort.

Teachability, Documentation, Adoption, Migration Strategy:

SunRunAway commented 4 years ago

ref: Multi-Core, Main-Memory Joins: Sortvs.Hash Revisited

These guys described a method by sorting in registers.