pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

enhance the filter propagation on `is null` #7973

Open zz-jason opened 5 years ago

zz-jason commented 5 years ago

Feature Request

Is your feature request related to a problem? Please describe:

Relates to https://github.com/pingcap/tidb/issues/7834

Describe the feature you'd like:

Let's say we have the following table:

drop table if exists t;
create table t(a bigint, b bigint);
insert into t values(1, 1), (null, null);

in the filters of a query, if there is a filter a is null, and column a is involved in another null rejective filter, e.g. a=b, a in (1, null), a > 10, then we can derive a false filter and further change the filters of the query to false, because there is no record can satisfy both the filters at the same time.

Examples in MySQL:

MySQL(localhost:3306) > select * from t where a is null and a=b;
Empty set (0.00 sec)

MySQL(localhost:3306) > select * from t where a is null and a in (1, null);
Empty set (0.00 sec)

MySQL(localhost:3306) > select * from t where a is null and a>10;
Empty set (0.00 sec)

Execution plans in TiDB:

TiDB(localhost:4000) > desc select * from t where a is null and a=b;
+---------------------+----------+------+------------------------------------------------------------+
| id                  | count    | task | operator info                                              |
+---------------------+----------+------+------------------------------------------------------------+
| TableReader_7       | 8.00     | root | data:Selection_6                                           |
| └─Selection_6       | 8.00     | cop  | eq(test.t.a, test.t.b), isnull(test.t.a)                   |
|   └─TableScan_5     | 10000.00 | cop  | table:t, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------+----------+------+------------------------------------------------------------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select * from t where a is null and a in (1, null);
+---------------------+----------+------+------------------------------------------------------------+
| id                  | count    | task | operator info                                              |
+---------------------+----------+------+------------------------------------------------------------+
| TableReader_7       | 0.00     | root | data:Selection_6                                           |
| └─Selection_6       | 0.00     | cop  | in(test.t.a, 1, NULL), isnull(test.t.a)                    |
|   └─TableScan_5     | 10000.00 | cop  | table:t, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------+----------+------+------------------------------------------------------------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select * from t where a is null and a>10;
+---------------------+----------+------+------------------------------------------------------------+
| id                  | count    | task | operator info                                              |
+---------------------+----------+------+------------------------------------------------------------+
| TableReader_7       | 0.00     | root | data:Selection_6                                           |
| └─Selection_6       | 0.00     | cop  | gt(test.t.a, 10), isnull(test.t.a)                         |
|   └─TableScan_5     | 10000.00 | cop  | table:t, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------+----------+------+------------------------------------------------------------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > desc select * from t where a is null and a=10;
+---------------------+----------+------+------------------------------------------------------------+
| id                  | count    | task | operator info                                              |
+---------------------+----------+------+------------------------------------------------------------+
| TableReader_7       | 0.00     | root | data:Selection_6                                           |
| └─Selection_6       | 0.00     | cop  | eq(test.t.a, 10), isnull(test.t.a)                         |
|   └─TableScan_5     | 10000.00 | cop  | table:t, range:[-inf,+inf], keep order:false, stats:pseudo |
+---------------------+----------+------+------------------------------------------------------------+
3 rows in set (0.00 sec)

TiDB(localhost:4000) > select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v2.1.0-rc.3-79-gedaec7bda
Git Commit Hash: edaec7bdaf7cdc7e1ba822e393112c2d592308ad
Git Branch: master
UTC Build Time: 2018-10-21 05:57:46
GoVersion: go version go1.11 darwin/amd64
Race Enabled: false
TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
Check Table Before Drop: false
1 row in set (0.00 sec)

Describe alternatives you've considered: No

Teachability, Documentation, Adoption, Migration Strategy: No

ghost commented 3 years ago

Verifying this against master (and confirming MySQL behavior - which would show impossible WHERE if it could optimize this. It looks like it does not either):

drop table if exists t;
create table t(a bigint, b bigint);
insert into t values(1, 1), (null, null);
explain select * from t where a is null and a=b;
explain select * from t where a is null and a in (1, null);
explain select * from t where a is null and a>10;

TiDB:

mysql> explain select * from t where a is null and a=b;
+-------------------------+----------+-----------+---------------+------------------------------------------+
| id                      | estRows  | task      | access object | operator info                            |
+-------------------------+----------+-----------+---------------+------------------------------------------+
| TableReader_7           | 8.00     | root      |               | data:Selection_6                         |
| └─Selection_6           | 8.00     | cop[tikv] |               | eq(test.t.a, test.t.b), isnull(test.t.a) |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo           |
+-------------------------+----------+-----------+---------------+------------------------------------------+
3 rows in set (0.00 sec)

mysql> explain select * from t where a is null and a in (1, null);
+-------------------------+----------+-----------+---------------+-----------------------------------------+
| id                      | estRows  | task      | access object | operator info                           |
+-------------------------+----------+-----------+---------------+-----------------------------------------+
| TableReader_7           | 0.00     | root      |               | data:Selection_6                        |
| └─Selection_6           | 0.00     | cop[tikv] |               | in(test.t.a, 1, NULL), isnull(test.t.a) |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo          |
+-------------------------+----------+-----------+---------------+-----------------------------------------+
3 rows in set (0.00 sec)

mysql> explain select * from t where a is null and a>10;
+-------------------------+----------+-----------+---------------+------------------------------------+
| id                      | estRows  | task      | access object | operator info                      |
+-------------------------+----------+-----------+---------------+------------------------------------+
| TableReader_7           | 0.00     | root      |               | data:Selection_6                   |
| └─Selection_6           | 0.00     | cop[tikv] |               | gt(test.t.a, 10), isnull(test.t.a) |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t       | keep order:false, stats:pseudo     |
+-------------------------+----------+-----------+---------------+------------------------------------+
3 rows in set (0.00 sec)

mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-1054-g945f7cc4a
Edition: Community
Git Commit Hash: 945f7cc4a04dd09367674b4e5d76f1602812da15
Git Branch: master
UTC Build Time: 2020-08-26 13:33:24
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)

MySQL:

mysql [localhost:8021] {msandbox} (test) > explain select * from t where a is null and a=b;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8021] {msandbox} (test) > explain select * from t where a is null and a in (1, null);
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8021] {msandbox} (test) > explain select * from t where a is null and a>10;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

See https://github.com/pingcap/tidb/issues/17786 , https://github.com/pingcap/tidb/issues/19067 and https://github.com/pingcap/tidb/issues/10497 for other cases where an impossible where is not discovered (but could be!)