pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.28k stars 5.84k forks source link

partition pruning not effective on date scenarios #19941

Open ghost opened 4 years ago

ghost commented 4 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
 d date NOT NULL
) PARTITION BY RANGE (YEAR(d)) (
 PARTITION p2016 VALUES LESS THAN (2017),
 PARTITION p2017 VALUES LESS THAN (2018),
 PARTITION p2018 VALUES LESS THAN (2019),
 PARTITION p2019 VALUES LESS THAN (2020),
 PARTITION pmax VALUES LESS THAN MAXVALUE
);

INSERT INTO t1 VALUES 
 ('2016-01-01'),
 ('2016-06-01'),
 ('2016-09-01'),
 ('2017-01-01'),
 ('2017-06-01'),
 ('2017-09-01'),
 ('2018-01-01'),
 ('2018-06-01'),
 ('2018-09-01'),
 ('2019-01-01'),
 ('2019-06-01'),
 ('2019-09-01'),
 ('2020-01-01'),
 ('2020-06-01'),
 ('2020-09-01');

EXPLAIN SELECT * FROM t1 WHERE d BETWEEN '2018-01-01 00:00' AND '2018-12-31 23:59:59';
EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';

2. What did you expect to see? (Required)

In MySQL (see "partitions" column):

mysql [localhost:5731] {msandbox} (test) > EXPLAIN SELECT * FROM t1 WHERE d BETWEEN '2018-01-01 00:00' AND '2018-12-31 23:59:59';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | p2018      | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:5731] {msandbox} (test) > EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | p2018      | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

3. What did you see instead (Required)

In TiDB (see "access object"):

mysql> EXPLAIN SELECT * FROM t1 WHERE d BETWEEN '2018-01-01 00:00' AND '2018-12-31 23:59:59';
+-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| id                      | estRows  | task      | access object | operator info                                                                        |
+-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| TableReader_7           | 250.00   | root      | partition:all | data:Selection_6                                                                     |
| └─Selection_6           | 250.00   | cop[tikv] |               | ge(test.t1.d, 2018-01-01 00:00:00.000000), le(test.t1.d, 2018-12-31 23:59:59.000000) |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                       |
+-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                                                        |
+------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
| StreamAgg_20                 | 1.00     | root      |               | funcs:count(Column#5)->Column#3                                                      |
| └─TableReader_21             | 1.00     | root      | partition:all | data:StreamAgg_9                                                                     |
|   └─StreamAgg_9              | 1.00     | cop[tikv] |               | funcs:count(1)->Column#5                                                             |
|     └─Selection_19           | 250.00   | cop[tikv] |               | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|       └─TableFullScan_18     | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                       |
+------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

4. What is your TiDB version? (Required)

mysql> SELECT tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-1194-gc022fdcae
Edition: Community
Git Commit Hash: c022fdcaebd311f59f5c179b2a6a2f7e808a3340
Git Branch: master
UTC Build Time: 2020-09-09 12:13:26
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)
ghost commented 4 years ago

@imtbkcat @tiancaiamao PTAL

sre-bot commented 4 years ago

Integrity check: component severity RCA symptom trigger_condition affect_version fix_version fields are empty @imtbkcat Please comment /info to get template

ghost commented 4 years ago

This looks mostly fixed, with the exception of TiDB needing to access {p2018, p2019}. MySQL can prune this to just p2018 - is there an off-by-one error in pruning?

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+------------------------------------+----------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| id                                 | estRows  | task      | access object             | operator info                                                                        |
+------------------------------------+----------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| HashAgg_15                         | 1.00     | root      |                           | funcs:count(Column#4)->Column#3                                                      |
| └─PartitionUnion_16                | 2.00     | root      |                           |                                                                                      |
|   ├─StreamAgg_31                   | 1.00     | root      |                           | funcs:count(Column#6)->Column#4                                                      |
|   │ └─TableReader_32               | 1.00     | root      |                           | data:StreamAgg_20                                                                    |
|   │   └─StreamAgg_20               | 1.00     | cop[tikv] |                           | funcs:count(1)->Column#6                                                             |
|   │     └─Selection_30             | 250.00   | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|   │       └─TableFullScan_29       | 10000.00 | cop[tikv] | table:t1, partition:p2018 | keep order:false, stats:pseudo                                                       |
|   └─StreamAgg_50                   | 1.00     | root      |                           | funcs:count(Column#8)->Column#4                                                      |
|     └─TableReader_51               | 1.00     | root      |                           | data:StreamAgg_39                                                                    |
|       └─StreamAgg_39               | 1.00     | cop[tikv] |                           | funcs:count(1)->Column#8                                                             |
|         └─Selection_49             | 250.00   | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|           └─TableFullScan_48       | 10000.00 | cop[tikv] | table:t1, partition:p2019 | keep order:false, stats:pseudo                                                       |
+------------------------------------+----------+-----------+---------------------------+--------------------------------------------------------------------------------------+
12 rows in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2018-12-31';
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| id                         | estRows | task      | access object             | operator info                                                                        |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| StreamAgg_12               | 1.00    | root      |                           | funcs:count(1)->Column#3                                                             |
| └─TableReader_19           | 0.07    | root      |                           | data:Selection_18                                                                    |
|   └─Selection_18           | 0.07    | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2018-12-31 00:00:00.000000) |
|     └─TableFullScan_17     | 3.00    | cop[tikv] | table:t1, partition:p2018 | keep order:false, stats:pseudo                                                       |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d <= '2018-12-31';
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| id                         | estRows | task      | access object             | operator info                                                                        |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
| StreamAgg_12               | 1.00    | root      |                           | funcs:count(1)->Column#3                                                             |
| └─TableReader_19           | 0.07    | root      |                           | data:Selection_18                                                                    |
|   └─Selection_18           | 0.07    | cop[tikv] |                           | ge(test.t1.d, 2018-01-01 00:00:00.000000), le(test.t1.d, 2018-12-31 00:00:00.000000) |
|     └─TableFullScan_17     | 3.00    | cop[tikv] | table:t1, partition:p2018 | keep order:false, stats:pseudo                                                       |
+----------------------------+---------+-----------+---------------------------+--------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
seiya-annie commented 4 years ago

/info

ti-srebot commented 4 years ago

Please edit this comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make Sure that 'component', and 'severity' labels are added Example for how to fill out the template: https://github.com/pingcap/tidb/issues/20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions

mjonss commented 2 years ago

Same with dynamic prune mode:

tidb> set @@session.tidb_partition_prune_mode = dynamic;
Query OK, 0 rows affected (0,00 sec)

tidb> EXPLAIN SELECT COUNT(*) FROM t1 WHERE d >= '2018-01-01' AND d < '2019-01-01';
+------------------------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------+
| id                           | estRows  | task      | access object         | operator info                                                                        |
+------------------------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------+
| StreamAgg_20                 | 1.00     | root      |                       | funcs:count(Column#5)->Column#3                                                      |
| └─TableReader_21             | 1.00     | root      | partition:p2018,p2019 | data:StreamAgg_9                                                                     |
|   └─StreamAgg_9              | 1.00     | cop[tikv] |                       | funcs:count(1)->Column#5                                                             |
|     └─Selection_19           | 250.00   | cop[tikv] |                       | ge(test.t1.d, 2018-01-01 00:00:00.000000), lt(test.t1.d, 2019-01-01 00:00:00.000000) |
|       └─TableFullScan_18     | 10000.00 | cop[tikv] | table:t1              | keep order:false, stats:pseudo                                                       |
+------------------------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------+
5 rows in set (0,00 sec)
mjonss commented 2 years ago

The difference between MySQL and TiDB is that MySQL will check the 'transitions' for YEAR() function, so MySQL will prune correctly for < '2019-01-01' while TiDB will not. Notice that this is only for the "first transition value" and TiDB will prune correctly for < '2019-01-02 (which includes the p2019 partition).

mjonss commented 2 years ago

/component tablepartition