pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.49k stars 5.74k forks source link

improve the efficiency of intersect operator #27745

Open dbsid opened 2 years ago

dbsid commented 2 years ago

Enhancement

tpc-ds query 38 on 50G data set failed to execution on TiDB and cause OOM. Below are part of the query 38. the number of rows on two sides of the intersect is around two millions. currently the intersect need to go through CARTESIAN semi join, cause the data explosion and OOM. Suggest to handle intersect by a specific operator, which can check if all the columns are null values, to avoid using hash join CARTESIAN.

mysql> select count( distinct c_last_name, c_first_name, d_date)
    ->     from store_sales, date_dim, customer
    ->           where store_sales.ss_sold_date_sk = date_dim.d_date_sk
    ->       and store_sales.ss_customer_sk = customer.c_customer_sk
    ->       and d_month_seq between 1212 and 1212 + 11;
+----------------------------------------------------+
| count( distinct c_last_name, c_first_name, d_date) |
+----------------------------------------------------+
|                                            2203199 |
+----------------------------------------------------+
1 row in set (4.79 sec)

mysql>     select count(distinct c_last_name, c_first_name, d_date)
    ->     from catalog_sales, date_dim, customer
    ->           where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
    ->       and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
    ->       and d_month_seq between 1212 and 1212 + 11;
+---------------------------------------------------+
| count(distinct c_last_name, c_first_name, d_date) |
+---------------------------------------------------+
|                                           1480701 |
+---------------------------------------------------+

mysql> explain
    ->     select distinct c_last_name, c_first_name, d_date
    ->     from store_sales, date_dim, customer
    ->           where store_sales.ss_sold_date_sk = date_dim.d_date_sk
    ->       and store_sales.ss_customer_sk = customer.c_customer_sk
    ->       and d_month_seq between 1212 and 1212 + 11
    ->     intersect
    ->     select distinct c_last_name, c_first_name, d_date
    ->     from catalog_sales, date_dim, customer
    ->           where catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
    ->       and catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
    ->       and d_month_seq between 1212 and 1212 + 11;
+---------------------------------------------------------+--------------+-------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                                      | estRows      | task              | access object       | operator info                                                                                                                                                                                                                                                                                                             |
+---------------------------------------------------------+--------------+-------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_26                                             | 4121.60      | root              |                     | CARTESIAN semi join, other cond:nulleq(tpcds50.customer.c_first_name, tpcds50.customer.c_first_name), nulleq(tpcds50.customer.c_last_name, tpcds50.customer.c_last_name), nulleq(tpcds50.date_dim.d_date, tpcds50.date_dim.d_date)                                                                                        |
| ├─TableReader_186(Build)                                | 5152.00      | root              |                     | data:ExchangeSender_185                                                                                                                                                                                                                                                                                                   |
| │ └─ExchangeSender_185                                  | 5152.00      | batchCop[tiflash] |                     | ExchangeType: PassThrough                                                                                                                                                                                                                                                                                                 |
| │   └─Projection_181                                    | 5152.00      | batchCop[tiflash] |                     | tpcds50.customer.c_last_name, tpcds50.customer.c_first_name, tpcds50.date_dim.d_date                                                                                                                                                                                                                                      |
| │     └─HashAgg_182                                     | 5152.00      | batchCop[tiflash] |                     | group by:tpcds50.customer.c_first_name, tpcds50.customer.c_last_name, tpcds50.date_dim.d_date, funcs:firstrow(tpcds50.customer.c_last_name)->tpcds50.customer.c_last_name, funcs:firstrow(tpcds50.customer.c_first_name)->tpcds50.customer.c_first_name, funcs:firstrow(tpcds50.date_dim.d_date)->tpcds50.date_dim.d_date |
| │       └─ExchangeReceiver_184                          | 5152.00      | batchCop[tiflash] |                     |                                                                                                                                                                                                                                                                                                                           |
| │         └─ExchangeSender_183                          | 5152.00      | batchCop[tiflash] |                     | ExchangeType: HashPartition, Hash Cols: [name: tpcds50.customer.c_last_name, collate: utf8mb4_bin], [name: tpcds50.customer.c_first_name, collate: utf8mb4_bin], [name: tpcds50.date_dim.d_date, collate: binary]                                                                                                         |
| │           └─HashAgg_112                               | 5152.00      | batchCop[tiflash] |                     | group by:tpcds50.customer.c_first_name, tpcds50.customer.c_last_name, tpcds50.date_dim.d_date,                                                                                                                                                                                                                            |
| │             └─Projection_180                          | 14342362.95  | batchCop[tiflash] |                     | tpcds50.date_dim.d_date, tpcds50.customer.c_first_name, tpcds50.customer.c_last_name                                                                                                                                                                                                                                      |
| │               └─HashJoin_174                          | 14342362.95  | batchCop[tiflash] |                     | inner join, equal:[eq(tpcds50.catalog_sales.cs_bill_customer_sk, tpcds50.customer.c_customer_sk)]                                                                                                                                                                                                                         |
| │                 ├─ExchangeReceiver_136(Build)         | 766000.00    | batchCop[tiflash] |                     |                                                                                                                                                                                                                                                                                                                           |
| │                 │ └─ExchangeSender_135                | 766000.00    | batchCop[tiflash] |                     | ExchangeType: Broadcast                                                                                                                                                                                                                                                                                                   |
| │                 │   └─TableFullScan_134               | 766000.00    | batchCop[tiflash] | table:customer      | keep order:false                                                                                                                                                                                                                                                                                                          |
| │                 └─HashJoin_127(Probe)                 | 14279151.70  | batchCop[tiflash] |                     | inner join, equal:[eq(tpcds50.date_dim.d_date_sk, tpcds50.catalog_sales.cs_sold_date_sk)]                                                                                                                                                                                                                                 |
| │                   ├─ExchangeReceiver_131(Build)       | 364.13       | batchCop[tiflash] |                     |                                                                                                                                                                                                                                                                                                                           |
| │                   │ └─ExchangeSender_130              | 364.13       | batchCop[tiflash] |                     | ExchangeType: Broadcast                                                                                                                                                                                                                                                                                                   |
| │                   │   └─Selection_129                 | 364.13       | batchCop[tiflash] |                     | ge(tpcds50.date_dim.d_month_seq, 1212), le(tpcds50.date_dim.d_month_seq, 1223)                                                                                                                                                                                                                                            |
| │                   │     └─TableFullScan_128           | 73049.00     | batchCop[tiflash] | table:date_dim      | keep order:false                                                                                                                                                                                                                                                                                                          |
| │                   └─Selection_133(Probe)              | 71306600.78  | batchCop[tiflash] |                     | not(isnull(tpcds50.catalog_sales.cs_bill_customer_sk)), not(isnull(tpcds50.catalog_sales.cs_sold_date_sk))                                                                                                                                                                                                                |
| │                     └─TableFullScan_132               | 71997669.00  | batchCop[tiflash] | table:catalog_sales | keep order:false                                                                                                                                                                                                                                                                                                          |
| └─TableReader_105(Probe)                                | 5152.00      | root              |                     | data:ExchangeSender_104                                                                                                                                                                                                                                                                                                   |
|   └─ExchangeSender_104                                  | 5152.00      | batchCop[tiflash] |                     | ExchangeType: PassThrough                                                                                                                                                                                                                                                                                                 |
|     └─Projection_100                                    | 5152.00      | batchCop[tiflash] |                     | tpcds50.customer.c_last_name, tpcds50.customer.c_first_name, tpcds50.date_dim.d_date                                                                                                                                                                                                                                      |
|       └─HashAgg_101                                     | 5152.00      | batchCop[tiflash] |                     | group by:tpcds50.customer.c_first_name, tpcds50.customer.c_last_name, tpcds50.date_dim.d_date, funcs:firstrow(tpcds50.customer.c_last_name)->tpcds50.customer.c_last_name, funcs:firstrow(tpcds50.customer.c_first_name)->tpcds50.customer.c_first_name, funcs:firstrow(tpcds50.date_dim.d_date)->tpcds50.date_dim.d_date |
|         └─ExchangeReceiver_103                          | 5152.00      | batchCop[tiflash] |                     |                                                                                                                                                                                                                                                                                                                           |
|           └─ExchangeSender_102                          | 5152.00      | batchCop[tiflash] |                     | ExchangeType: HashPartition, Hash Cols: [name: tpcds50.customer.c_last_name, collate: utf8mb4_bin], [name: tpcds50.customer.c_first_name, collate: utf8mb4_bin], [name: tpcds50.date_dim.d_date, collate: binary]                                                                                                         |
|             └─HashAgg_31                                | 5152.00      | batchCop[tiflash] |                     | group by:tpcds50.customer.c_first_name, tpcds50.customer.c_last_name, tpcds50.date_dim.d_date,                                                                                                                                                                                                                            |
|               └─Projection_99                           | 28891170.73  | batchCop[tiflash] |                     | tpcds50.date_dim.d_date, tpcds50.customer.c_first_name, tpcds50.customer.c_last_name                                                                                                                                                                                                                                      |
|                 └─HashJoin_93                           | 28891170.73  | batchCop[tiflash] |                     | inner join, equal:[eq(tpcds50.store_sales.ss_customer_sk, tpcds50.customer.c_customer_sk)]                                                                                                                                                                                                                                |
|                   ├─ExchangeReceiver_55(Build)          | 766000.00    | batchCop[tiflash] |                     |                                                                                                                                                                                                                                                                                                                           |
|                   │ └─ExchangeSender_54                 | 766000.00    | batchCop[tiflash] |                     | ExchangeType: Broadcast                                                                                                                                                                                                                                                                                                   |
|                   │   └─TableFullScan_53                | 766000.00    | batchCop[tiflash] | table:customer      | keep order:false                                                                                                                                                                                                                                                                                                          |
|                   └─HashJoin_46(Probe)                  | 28763838.36  | batchCop[tiflash] |                     | inner join, equal:[eq(tpcds50.date_dim.d_date_sk, tpcds50.store_sales.ss_sold_date_sk)]                                                                                                                                                                                                                                   |
|                     ├─ExchangeReceiver_50(Build)        | 364.13       | batchCop[tiflash] |                     |                                                                                                                                                                                                                                                                                                                           |
|                     │ └─ExchangeSender_49               | 364.13       | batchCop[tiflash] |                     | ExchangeType: Broadcast                                                                                                                                                                                                                                                                                                   |
|                     │   └─Selection_48                  | 364.13       | batchCop[tiflash] |                     | ge(tpcds50.date_dim.d_month_seq, 1212), le(tpcds50.date_dim.d_month_seq, 1223)                                                                                                                                                                                                                                            |
|                     │     └─TableFullScan_47            | 73049.00     | batchCop[tiflash] | table:date_dim      | keep order:false                                                                                                                                                                                                                                                                                                          |
|                     └─Selection_52(Probe)               | 131395392.13 | batchCop[tiflash] |                     | not(isnull(tpcds50.store_sales.ss_customer_sk)), not(isnull(tpcds50.store_sales.ss_sold_date_sk))                                                                                                                                                                                                                         |
|                       └─TableFullScan_51                | 144004764.00 | batchCop[tiflash] | table:store_sales   | keep order:false                                                                                                                                                                                                                                                                                                          |
+---------------------------------------------------------+--------------+-------------------+---------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
39 rows in set, 3 warnings (0.01 sec)
longlongago7777 commented 2 years ago

Hello, guys, Any update on this issue? I am running the POC test for ChinaUnioncom and did not get the results for 6 hours, and it did not OOM, the plan is attached. query_38.sqlplanexplain.txt

yudongusa commented 2 years ago

This is an enhancement for AP and we would probably start to address optimizations like this later this year.