`Limit` operator can not be pushed down to tikv when using `IN` clause

Yriuns commented 2 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create a table

CREATE TABLE `t` (
  `id` bigint(22) NOT NULL AUTO_INCREMENT,
  `a` bigint(20) NOT NULL DEFAULT '0',
  `b` bigint(20) NOT NULL DEFAULT '0',
  `c` bigint(20) NOT NULL DEFAULT '0',
  `d` bigint(20) NOT NULL DEFAULT '0',
  UNIQUE KEY `uk_id` (`id`),
  KEY `a_b_c_id` (`a`,`b`,`c`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=1;

perform a query

explain
SELECT id, a, b, c, d
FROM t
WHERE a = 0 AND b = 1 AND c IN (2, 3)
ORDER BY id ASC
LIMIT 10;

2. What did you expect to see? (Required)

The field in WHERE and ORDER BY clause all hit the index a_b_c_id, so the Limit operator should be pushed down to tikv's each range, then tidb perform a final TopN, something like

mysql> explain SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c IN (2, 3) ORDER BY id ASC LIMIT 10;
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| id                               | estRows | task      | access object                        | operator info                                                      |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| TopN_9                           | 0.00    | root      |                                      | test.t.id, offset:0, count:10                                      |
| └─IndexLookUp_20                 | 0.00    | root      |                                      |                                                                    |
|   ├─Limit_19(Build)              | 0.00    | cop[tikv] |                                      | test.t.id, offset:0, count:10                                      |
|   │ └─IndexRangeScan_17          | 0.00    | cop[tikv] | table:t, index:a_b_c_id(a, b, c, id) | range:[0 1 2,0 1 2], [0 1 3,0 1 3], keep order:true, stats:pseudo |
|   └─TableRowIDScan_18(Probe)     | 0.00    | cop[tikv] | table:t                              | keep order:false, stats:pseudo                                     |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
5 rows in set (0.00 sec)

3. What did you see instead (Required)

The execution plan choose push down TopN rather than Limit, which results in an unnecessary Scan + Sort.

mysql> explain SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c IN (2, 3) ORDER BY id ASC LIMIT 10;
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| id                               | estRows | task      | access object                        | operator info                                                      |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
| TopN_9                           | 0.00    | root      |                                      | test.t.id, offset:0, count:10                                      |
| └─IndexLookUp_20                 | 0.00    | root      |                                      |                                                                    |
|   ├─TopN_19(Build)               | 0.00    | cop[tikv] |                                      | test.t.id, offset:0, count:10                                      |
|   │ └─IndexRangeScan_17          | 0.00    | cop[tikv] | table:t, index:a_b_c_id(a, b, c, id) | range:[0 1 2,0 1 2], [0 1 3,0 1 3], keep order:false, stats:pseudo |
|   └─TableRowIDScan_18(Probe)     | 0.00    | cop[tikv] | table:t                              | keep order:false, stats:pseudo                                     |
+----------------------------------+---------+-----------+--------------------------------------+--------------------------------------------------------------------+
5 rows in set (0.00 sec)

4. What is your TiDB version? (Required)

Release Version: v6.0.0
Edition: Community
Git Commit Hash: 36a9810441ca0e496cbd22064af274b3be771081
Git Branch: heads/refs/tags/v6.0.0
UTC Build Time: 2022-03-31 10:27:25
GoVersion: go1.18
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

winoros commented 2 years ago

Hi, this is not a bug. The index is (a, b, c, d). And there's IN condition on c, equal condition on a and b. You want to return data with the order of d.

Since there's multiple values the column c returned, if you don't add top-n, the data get is with the order of (c, d), not d.

winoros commented 2 years ago

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Yriuns commented 2 years ago

Hi, this is not a bug. The index is (a, b, c, d). And there's IN condition on c, equal condition on a and b. You want to return data with the order of d.

Since there's multiple values the column c returned, if you don't add top-n, the data get is with the order of (c, d), not d.

I'm afraid you misunderstood my question.

The optimal execution plan is:

tidb ask tikv to scan the first 10 records of (a=0, b=1, c=2), with their id fields. (Limit@cop)
tidb ask tikv to scan the first 10 records of (a=0, b=1, c=3), with their id fields. (Limit@cop)
tidb perform a Top-10 on these 20 records, order by id. (TopN@root)

There is no need to let tikv perform Top-10, right?

Yriuns commented 2 years ago

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.

I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

Yriuns commented 2 years ago

@winoros Or is it possible for planner to generate such execution plan (by itself)?

SELECT id, a, b, c, d
FROM (
    (SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c = 2 ORDER BY id ASC LIMIT 10)
    UNION
    (SELECT id, a, b, c, d FROM t WHERE a = 0 AND b = 1 AND c = 3 ORDER BY id ASC LIMIT 10)
) r
ORDER BY id ASC
LIMIT 10;

winoros commented 2 years ago

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently.

I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

Yriuns commented 2 years ago

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently. I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

I know that, I mean the cost-based optimizer should be able to find the minimum one between the cost of more RPC and the cost of more Scan.

In our senario, the number of records that match a = 0 AND b = 1 AND c IN (2, 3) may reach several millions, which means a range scan of several millions in tikv. However, if the planner can rewrite the SQL to the UNION format, it only needs to scan 20 records.

Yriuns commented 2 years ago

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently. I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

Hi there, I make a pull request to enable optimizer to generate an execution plan with Limit push down. Will you take a look? If you agree with the idea of this PR, I will continue to improve it. Otherwise, I will just closed it 😢

winoros commented 2 years ago

And when the request is pushed down to tikv. The base request unit is region, not range. If the two ranges are in the same region. There's only one request. So we cannot just push limit down to the TiKV

Oh... It seems that this is the root cause. Will it be supported in the future? That is, the executors can handle multiple ranges independently. I think such query is quite common. If the Limit can be push down, we can save a lot of unnecessary scan.

We cannot guarantee such behavior. Since that would increase the number of requests we send, increasing the pressure of the TiKV side.

Hi there, I make a pull request to enable optimizer to generate an execution plan with Limit push down. Will you take a look? If you agree with the idea of this PR, I will continue to improve it. Otherwise, I will just closed it 😢

Let me contact our PM first.

winoros commented 2 years ago

Oh, I just ask, does the RDBMS you currently use support this behavior?

Yriuns commented 2 years ago

Oh, I just ask, does the RDBMS you currently use support this behavior?

I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra.

winoros commented 2 years ago

Oh, I just ask, does the RDBMS you currently use support this behavior?

I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra.

@Yriuns Yes, it's not elegant enough to put into TiDB as a release behavior. A switch usually means that if we don't enable it, most of the users won't use it.

Yriuns commented 2 years ago

Oh, I just ask, does the RDBMS you currently use support this behavior?

I provide an optimizer hint to enable this, so I think your concern about "behavior" is uncessary. This is just an identity transformation of relational algebra.

@Yriuns Yes, it's not elegant enough to put into TiDB as a release behavior. A switch usually means that if we don't enable it, most of the users won't use it.

Agreed, this feature will increase the burden of physical plan optimization. It should be used only if the gain exceeds the CPU overhead. Under current optimizer framework, it seems impossible for the optimizer itself to decide whether to use or not according to the cost.

But some is better than none, right? : ) And what's the official attitude about optimizer hint? I just notice that v6.1 add a LEADING hint.

pingcap / tidb