pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

support KV Scan with ranges if possible to speedup queries #14907

Open zz-jason opened 4 years ago

zz-jason commented 4 years ago

Feature Request

Is your feature request related to a problem? Please describe:

consider this case:

create table t(a bigint primary key, b bigint);

The following query can use PointGet to speed up:

TiDB(root@127.0.0.1:test) > desc select * from t where a = 1;
+-------------+-------+------+-------------------+
| id          | count | task | operator info     |
+-------------+-------+------+-------------------+
| Point_Get_1 | 1.00  | root | table:t, handle:1 |
+-------------+-------+------+-------------------+
1 row in set (0.00 sec)

while the following query can not:

TiDB(root@127.0.0.1:test) > desc select * from t where a >= 1 and a <= 100;
+-------------------+--------+-----------+--------------------------------------------------------+
| id                | count  | task      | operator info                                          |
+-------------------+--------+-----------+--------------------------------------------------------+
| TableReader_6     | 123.75 | root      | data:TableScan_5                                       |
| └─TableScan_5     | 123.75 | cop[tikv] | table:t, range:[1,100], keep order:false, stats:pseudo |
+-------------------+--------+-----------+--------------------------------------------------------+
2 rows in set (0.00 sec)

Describe the feature you'd like:

Directly calling KV Scan with ranges may help to speed the above query, the execution plan may look like:

TiDB(root@127.0.0.1:test) > desc select * from t where a = 1;
+---------------+---------+------+------------------------+
| id            | count   | task | operator info          |
+---------------+---------+------+------------------------+
| KVRangeScan_1 | 123.75  | root | table:t, range:[1,100] |
+---------------+---------+------+------------------------+
1 row in set (0.00 sec)

Describe alternatives you've considered:

For discrete values like int, we can convert range queries to in(). take the above query as an example, the equivalent query is:

select * from t where a in(1, 2, ..., 100);

Then this query can benefit from BatchPoint in this case.

Teachability, Documentation, Adoption, Migration Strategy:

No

zz-jason commented 4 years ago

@eurekaka PTAL

siddontang commented 4 years ago

have we compared the performance of KV scan and Cop Table scan?

maybe we can use KV scan directly?

zz-jason commented 4 years ago

@siddontang KVRangeScan in the example is actually implemented by calling KV scan directly. It's just clarification to users by naming it to KVRangeScan, since we are going to rename TableScan to TableFullScan, TableRangeScan and TableScanByRowID.

eurekaka commented 4 years ago

We can reuse the BatchPointGet plan / executor node, and the logic of converting small range to point sets introduced in https://github.com/pingcap/tidb/pull/11524 ?

eurekaka commented 4 years ago

Oh, KVRangeScan is supposed to have better performance than BatchPointGet.