Open wshwsh12 opened 4 years ago
Development of this feature seems to be on hold. For analytical queries, it is critical for performance.
For example, in the plan snippet below from TPCDS query71.sql, most of the elapsed time of the hash join between date_dim and store_sales is spent in the probe-side TableReader and the HashJoin above it. If bloom filter pushdown (to TiFlash or TiKV) were implemented, the number of rows TiDB needs to read from TiFlash might drop from 2880404 to nearly 92752, which could improve hash join performance dramatically.
| │ │ └─HashJoin_114 | 0.00 | 92752 | root | | time:1.568344807s, loops:94, build_hash_table:{total:75.468632ms, fetch:75.451881ms, build:16.751µs}, probe:{concurrency:5, total:7.830360633s, max:1.585215382s, probe:1.08864604s, fetch:6.741714593s} | inner join, equal:[eq(tpcds.date_dim.d_date_sk, tpcds.store_sales.ss_sold_date_sk)] | 24.703125 KB | 0 Bytes |
| │ │ ├─TableReader_130(Build) | 0.00 | 31 | root | | time:75.418557ms, loops:2, cop_task: {num: 1, max:75.762799ms, proc_keys: 0, rpc_num: 1, rpc_time: 75.758431ms, copr_cache_hit_ratio: 0.00} | data:Selection_129 | 968 Bytes | N/A |
| │ │ │ └─Selection_129 | 0.00 | 31 | cop[tiflash] | | time:15.038987ms, loops:1 | eq(tpcds.date_dim.d_moy, 12), eq(tpcds.date_dim.d_year, 2000) | N/A | N/A |
| │ │ │ └─TableFullScan_128 | 73049.00 | 73049 | cop[tiflash] | table:date_dim | time:15.038987ms, loops:2 | keep order:false | N/A | N/A |
| │ │ └─TableReader_120(Probe) | 2880404.00 | 2880404 | root | | time:1.335566026s, loops:2818, cop_task: {num: 6, max: 1.534957884s, min: 1.024621609s, avg: 1.29958119s, p95: 1.534957884s, rpc_num: 6, rpc_time: 7.797425596s, copr_cache_hit_ratio: 0.00} | data:Selection_119 | 66.03593063354492 MB | N/A |
| │ │ └─Selection_119 | 2880404.00 | 2880404 | cop[tiflash] | | proc max:303.179484ms, min:104.655019ms, p80:262.17598ms, p95:303.179484ms, iters:46, tasks:6 | not(isnull(tpcds.store_sales.ss_item_sk)), not(isnull(tpcds.store_sales.ss_sold_date_sk)), not(isnull(tpcds.store_sales.ss_sold_time_sk)) | N/A | N/A |
| │ │ └─TableFullScan_118 | 2880404.00 | 2880404 | cop[tiflash] | table:store_sales | proc max:290.179245ms, min:62.254286ms, p80:164.174192ms, p95:290.179245ms, iters:46, tasks:6 | keep order:false | N/A | N/A |
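A rough back-of-the-envelope sketch of the reduction claimed for the plan above. It assumes the 92752 rows output by HashJoin_114 are exactly the probe rows that have a match (plausible because d_date_sk is the date_dim key), and it treats the bloom filter's false positive rate as a free parameter. The function name and the 1% rate are illustrative assumptions, not TiDB settings.

```python
def expected_rows_after_filter(probe_rows, matching_rows, false_positive_rate):
    """Estimate how many probe rows still cross the network after a bloom filter.

    Matching rows always pass (a bloom filter has no false negatives);
    each non-matching row passes with probability false_positive_rate.
    """
    non_matching = probe_rows - matching_rows
    return matching_rows + false_positive_rate * non_matching


probe_rows = 2880404      # actRows of TableReader_120 (Probe)
matching_rows = 92752     # actRows of HashJoin_114

# Assumed 1% false positive rate, for illustration only.
estimate = expected_rows_after_filter(probe_rows, matching_rows, 0.01)
print(f"rows read with filter: {estimate:.0f} "
      f"({estimate / probe_rows:.1%} of the unfiltered scan)")
```

Even with a nonzero false positive rate, the rows shipped from TiFlash stay close to the number of matching rows, which is where the saving in `fetch` time would come from.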
Feature Request
Is your feature request related to a problem? Please describe:
The join executor's performance can be optimized. We can use a bloom filter to filter data on the join's probe side, reducing both network overhead and computation overhead.
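A minimal sketch of the idea, not TiDB's implementation: build a bloom filter from the build-side join keys while building the hash table, apply it where the probe side is scanned, and join only the rows that survive. The class and function names, the SHA-256 based hashing, and the 1% target false positive rate are all assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Simple bloom filter sized with the standard formulas for n items."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k bit positions from two 64-bit hash values.
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def join_with_bloom_filter(build_rows, probe_rows, build_key, probe_key):
    """Inner hash join that pre-filters the probe side with a bloom filter.

    Returns the joined row pairs and the number of probe rows that passed
    the filter (the rows that would have to be sent over the network).
    """
    bloom = BloomFilter(len(build_rows))
    hash_table = {}
    for row in build_rows:
        bloom.add(row[build_key])
        hash_table.setdefault(row[build_key], []).append(row)

    # This step stands in for the storage-side scan: rows are dropped here,
    # before they would be shipped to the join.
    shipped = [r for r in probe_rows
               if r[probe_key] is not None and bloom.might_contain(r[probe_key])]

    # The join still checks the hash table, so false positives are harmless.
    joined = [(b, p) for p in shipped for b in hash_table.get(p[probe_key], [])]
    return joined, len(shipped)


if __name__ == "__main__":
    build = [{"d_date_sk": k} for k in range(100, 131)]            # 31 build rows
    probe = [{"ss_sold_date_sk": k % 1000} for k in range(10000)]  # 10000 probe rows
    joined, shipped = join_with_bloom_filter(build, probe, "d_date_sk", "ss_sold_date_sk")
    print(f"joined rows: {len(joined)}, probe rows shipped: {shipped} of {len(probe)}")
```

Because the filter can only produce false positives, the join result is identical to the unfiltered join. The filter changes only how many probe rows reach the join.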
Describe the feature you'd like:
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
Paper: Looking Ahead Makes Query Plans Robust, http://www.vldb.org/pvldb/vol10/p889-zhu.pdf