The cost of merge sort between different partitions is ignored

Defined2014 commented 6 months ago

Enhancement

set tidb_enable_global_index = true;

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  UNIQUE KEY `idx1` (`b`),
  KEY `idx` (`b`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (`a`) PARTITIONS 5;

mysql> explain format='verbose' select * from t use index(idx) where b > 0 order by b limit 10;
+----------------------------------+---------+----------+-----------+-----------------------+-----------------------------------------------+
| id                               | estRows | estCost  | task      | access object         | operator info                                 |
+----------------------------------+---------+----------+-----------+-----------------------+-----------------------------------------------+
| Projection_21                    | 10.00   | 11736.63 | root      |                       | test.t.a, test.t.b                            |
| └─IndexLookUp_20                 | 10.00   | 11734.64 | root      | partition:all         | limit embedded(offset:0, count:10)            |
|   ├─Limit_19(Build)              | 10.00   | 2273.08  | cop[tikv] |                       | offset:0, count:10                            |
|   │ └─IndexRangeScan_17          | 10.00   | 2273.08  | cop[tikv] | table:t, index:idx(b) | range:(0,+inf], keep order:true, stats:pseudo |
|   └─TableRowIDScan_18(Probe)     | 10.00   | 1866.08  | cop[tikv] | table:t               | keep order:false, stats:pseudo                |
+----------------------------------+---------+----------+-----------+-----------------------+-----------------------------------------------+
5 rows in set (0.00 sec)

mysql> explain format='verbose' select * from t use index(idx1) where b > 0 order by b limit 10;
+----------------------------------+---------+----------+-----------+------------------------+-----------------------------------------------+
| id                               | estRows | estCost  | task      | access object          | operator info                                 |
+----------------------------------+---------+----------+-----------+------------------------+-----------------------------------------------+
| Projection_21                    | 10.00   | 11736.63 | root      |                        | test.t.a, test.t.b                            |
| └─IndexLookUp_20                 | 10.00   | 11734.64 | root      | partition:all          | limit embedded(offset:0, count:10)            |
|   ├─Limit_19(Build)              | 10.00   | 2273.08  | cop[tikv] |                        | offset:0, count:10                            |
|   │ └─IndexRangeScan_17          | 10.00   | 2273.08  | cop[tikv] | table:t, index:idx1(b) | range:(0,+inf], keep order:true, stats:pseudo |
|   └─TableRowIDScan_18(Probe)     | 10.00   | 1866.08  | cop[tikv] | table:t                | keep order:false, stats:pseudo                |
+----------------------------------+---------+----------+-----------+------------------------+-----------------------------------------------+
5 rows in set (0.00 sec)

From the above results, we can see that the cost of using idx and idx1(No merge sort, because it's a global index) is the same because we ignore the cost of merge sort between different partitions in the indexLookUp executor.

Defined2014 commented 4 months ago

Also for IndexReader, global index has two more rows(_tidb_rowid, _tidb_pid) in IndexScan. If we don't consider mergeSort, the cost of using global index will greatter than normall index.

mysql> explain format='verbose' select b from t use index(idx1) where b > 0 order by b limit 3;
+-----------------------------+---------+---------+-----------+------------------------+---------------------------------+
| id                          | estRows | estCost | task      | access object          | operator info                   |
+-----------------------------+---------+---------+-----------+------------------------+---------------------------------+
| Limit_11                    | 3.00    | 51.80   | root      |                        | offset:0, count:3               |
| └─IndexReader_20            | 3.00    | 51.80   | root      | partition:all          | index:Limit_19                  |
|   └─Limit_19                | 3.00    | 681.92  | cop[tikv] |                        | offset:0, count:3               |
|     └─IndexRangeScan_17     | 3.00    | 681.92  | cop[tikv] | table:t, index:idx1(b) | range:(0,+inf], keep order:true |
+-----------------------------+---------+---------+-----------+------------------------+---------------------------------+
4 rows in set (10.61 sec)

mysql> explain format='verbose' select b from t use index(idx) where b > 0 order by b limit 3;
+-----------------------------+---------+---------+-----------+-----------------------+---------------------------------+
| id                          | estRows | estCost | task      | access object         | operator info                   |
+-----------------------------+---------+---------+-----------+-----------------------+---------------------------------+
| Limit_11                    | 3.00    | 38.90   | root      |                       | offset:0, count:3               |
| └─IndexReader_18            | 3.00    | 38.90   | root      | partition:all         | index:Limit_17                  |
|   └─Limit_17                | 3.00    | 488.40  | cop[tikv] |                       | offset:0, count:3               |
|     └─IndexRangeScan_16     | 3.00    | 488.40  | cop[tikv] | table:t, index:idx(b) | range:(0,+inf], keep order:true |
+-----------------------------+---------+---------+-----------+-----------------------+---------------------------------+
4 rows in set (7.83 sec)

Defined2014 commented 4 months ago

After discussing with @qw4990 and @AilinKid , I found that this issue is not easy to fix. The reason is that the indexReader has not yet been generated when compares each indexes finished, so the merge sort overhead in the indexReader will not be calculated. There are some ways to do this.

Adding merge sort overhead to indexReader is a bit dirty. And the cost will not be very accurate, the concurrency will influence it a lot.
Like https://github.com/pingcap/tidb/issues/40748, add a Merge operation. This is also difficult. Now the optimizer does not allow a root operator to be a child of DataSource, which will destroy some assumptions.
Add a new PhysicalProperty to deal global index. Also very complicated.

Above all, just keep it right now. Maybe we could do it with cascades.

Defined2014 commented 2 weeks ago

If we add MergeCost in indexReader, the global index path also not be selected. Because in findBestTask4DS function, it will compare indexScan cost between different pathes, which means the cost of global index is always greater than normal index, so it will never be selected.

pingcap / tidb

The cost of merge sort between different partitions is ignored #52061

Enhancement