pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics.
https://pingcap.com
Apache License 2.0

Refine Index Join #8470

Closed zz-jason closed 4 years ago

zz-jason commented 5 years ago

Feature Request

At present, the Index Join implementation is inefficient in some scenarios:

  1. It may cause TiDB to OOM, because it builds the hash table on the inner table.
  2. It cannot respond to its parent quickly, because it has to wait until all inner rows matching the outer join keys have been fetched from TiKV and the hash table has been built on them, and only then does it perform the join on the main thread.
  3. Execution is inefficient, because all the join work is performed on the main thread; the outer and inner workers are only responsible for fetching data from TiKV.

Describe the feature you'd like:

Split Index Join into two operators:

  1. One that keeps order. In this operator, the output of the Index Join is ordered by the outer join key. We can do a Merge Join within a task.
  2. One that does not need to keep order. In this operator, the output of the Index Join can be in arbitrary order. To limit memory consumption, we can build the hash table on the outer rows inside a task, do a hash join against the fetched inner rows, and return a Chunk as soon as possible.
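A minimal sketch of the unordered variant, to make the memory argument concrete: the hash table is built on the outer rows of one task (a bounded batch), and each inner row is probed and emitted as soon as it arrives. `Row`, `Joined`, `joinTask` and the `emit` callback are hypothetical names for illustration only; they are not TiDB's executor types, and a real implementation would work on Chunks rather than single rows.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for a row: a join key plus a payload.
type Row struct {
	Key int
	Val string
}

// Joined pairs one outer row with one matching inner row.
type Joined struct {
	Outer, Inner Row
}

// joinTask builds the hash table on the outer rows of a single task, so its
// size is bounded by the task's batch size rather than by the inner table.
// Inner rows are probed one by one and each match is emitted immediately.
func joinTask(outer []Row, inner <-chan Row, emit func(Joined)) {
	table := make(map[int][]Row, len(outer))
	for _, o := range outer {
		table[o.Key] = append(table[o.Key], o)
	}
	for in := range inner {
		for _, o := range table[in.Key] {
			emit(Joined{Outer: o, Inner: in})
		}
	}
}

func main() {
	outer := []Row{{1, "a"}, {2, "b"}, {2, "c"}}

	// The channel stands in for inner rows streaming back from storage.
	inner := make(chan Row, 3)
	for _, r := range []Row{{2, "x"}, {3, "y"}, {1, "z"}} {
		inner <- r
	}
	close(inner)

	joinTask(outer, inner, func(j Joined) {
		fmt.Println(j.Outer.Val, j.Inner.Val)
	})
}
```

Note that the output follows the arrival order of the inner rows, not the order of the outer rows, which is why this variant only fits queries that do not need ordered output.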

Describe alternatives you've considered:

No

Teachability, Documentation, Adoption, Migration Strategy:

After discussing offline, @yu34po will work on this issue.

yu34po commented 5 years ago

I will fix the index join feature in 3 PRs:

  1. Add indexhashjoin for queries without ORDER BY, keeping the old indexlookupjoin for ORDER BY queries: https://github.com/pingcap/tidb/pull/8661
  2. Add a new joiner to fix the column order problems with trytomatch/onmissmatch in indexhashjoin.
  3. Add indexmergejoin to handle the ORDER BY case.
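For the order-preserving case, the idea of a merge join within a task could look roughly like the sketch below. It assumes both the outer rows of the task and the fetched inner rows are already sorted ascending by the join key, and it performs a plain inner join. `Row`, `Joined` and `mergeJoin` are hypothetical names, not the types used in the PRs.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for a row: a join key plus a payload.
type Row struct {
	Key int
	Val string
}

// Joined pairs one outer row with one matching inner row.
type Joined struct {
	Outer, Inner Row
}

// mergeJoin assumes outer and inner are both sorted ascending by Key.
// The result keeps the order of the outer rows.
func mergeJoin(outer, inner []Row) []Joined {
	var out []Joined
	i, j := 0, 0
	for i < len(outer) && j < len(inner) {
		switch {
		case outer[i].Key < inner[j].Key:
			i++ // no inner match for this outer row
		case outer[i].Key > inner[j].Key:
			j++ // this inner row matches no remaining outer row
		default:
			key := outer[i].Key
			// Find the end of the run of inner rows sharing this key.
			end := j
			for end < len(inner) && inner[end].Key == key {
				end++
			}
			// Pair every outer row with this key with the whole inner run.
			for i < len(outer) && outer[i].Key == key {
				for _, in := range inner[j:end] {
					out = append(out, Joined{Outer: outer[i], Inner: in})
				}
				i++
			}
			j = end
		}
	}
	return out
}

func main() {
	outer := []Row{{1, "a"}, {2, "b"}, {2, "c"}, {4, "d"}}
	inner := []Row{{2, "x"}, {2, "y"}, {3, "z"}, {4, "w"}}
	for _, j := range mergeJoin(outer, inner) {
		fmt.Println(j.Outer.Val, j.Inner.Val)
	}
}
```

No hash table is needed here, and the output comes out in outer key order without an extra sort.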

XuHuaiyu commented 5 years ago

index hash join:

index merge join: