pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

memory consumption improvement for handling partitioned table #53177

Closed mzhang77 closed 1 week ago

mzhang77 commented 3 months ago

Feature Request

Is your feature request related to a problem? Please describe: Memory consumption increase with number of partitions in table:

# 16 partitions --> 5GB memory 
MySQL> explain analyze select * from tb1 order by pk1,pk2,pk3 asc limit 125268248, 8947732;
+----------------------------+--------------+-----------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
| id                         | estRows      | actRows   | task      | access object            | execution info                                                                                                                                                                                                                                                                                                                                                            | operator info                   | memory  | disk |
+----------------------------+--------------+-----------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
| Limit_17                   | 134215980.00 | 8947732   | root      |                          | time:1m51.7s, loops:8740, RU:770310.452994                                                                                                                                                                                                                                                                                                                                | offset:125268248, count:8947732 | N/A     | N/A  |
| └─TableReader_16           | 134215980.00 | 134215980 | root      | partition:all            | time:1m51.6s, loops:131071, cop_task: {num: 6552, max: 259.7ms, min: 885.5µs, avg: 53.7ms, p95: 139.8ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4m10.1s, tot_wait: 991ms, rpc_num: 6555, rpc_time: 5m51.4s, copr_cache_hit_ratio: 0.00, build_task_duration: 1.25ms, max_distsql_concurrency: 2}, backoff{regionMiss: 6ms}                                 | data:Limit_15                   | 5.00 GB | N/A  |
|   └─Limit_15               | 134215980.00 | 134449519 | cop[tikv] |                          | tikv_task:{proc max:136ms, min:0s, avg: 29.2ms, p80:64ms, p95:76ms, iters:157102, tasks:6552}, scan_detail: {total_process_keys: 134449519, total_process_keys_size: 39759319473, total_keys: 134456071, get_snapshot_time: 538.2ms, rocksdb: {key_skipped_count: 158332965, block: {cache_hit_count: 25882, read_count: 1285158, read_byte: 8.16 GB, read_time: 3.43s}}} | offset:0, count:134215980       | N/A     | N/A  |
|     └─TableFullScan_14     | 134215980.00 | 134449519 | cop[tikv] | table:tb1                | tikv_task:{proc max:136ms, min:0s, avg: 29.1ms, p80:64ms, p95:76ms, iters:157102, tasks:6552}                                                                                                                                                                                                                                                                             | keep order:true                 | N/A     | N/A  |
+----------------------------+--------------+-----------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
4 rows in set (1 min 51.76 sec)
# 32 partitions --> 10GB memory
MySQL [ep_foundation_core_entities]> explain analyze select * from tb1 order by pk1,pk2,pk3 asc limit 177730960, 8886548;
+----------------------------+--------------+-----------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
| id                         | estRows      | actRows   | task      | access object                 | execution info                                                                                                                                                                                                                                                                                                                                                            | operator info                   | memory  | disk |
+----------------------------+--------------+-----------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
| Limit_17                   | 186617508.00 | 8886538   | root      |                               | time:2m39s, loops:8680, RU:983811.667225                                                                                                                                                                                                                                                                                                                                  | offset:177730960, count:8886548 | N/A     | N/A  |
| └─TableReader_16           | 186617508.00 | 186617498 | root      | partition:all                 | time:2m39s, loops:182245, cop_task: {num: 8947, max: 689.9ms, min: 871µs, avg: 68.9ms, p95: 234.8ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 5m49.8s, tot_wait: 1.55s, rpc_num: 8947, rpc_time: 10m16.4s, copr_cache_hit_ratio: 0.00, build_task_duration: 1.23ms, max_distsql_concurrency: 2}                                                              | data:Limit_15                   | 10.4 GB | N/A  |
|   └─Limit_15               | 186617508.00 | 186617498 | cop[tikv] |                               | tikv_task:{proc max:176ms, min:0s, avg: 29.7ms, p80:64ms, p95:80ms, iters:217469, tasks:8947}, scan_detail: {total_process_keys: 186617498, total_process_keys_size: 56687627028, total_keys: 186626445, get_snapshot_time: 917.4ms, rocksdb: {key_skipped_count: 226889259, block: {cache_hit_count: 39229, read_count: 1844413, read_byte: 9.31 GB, read_time: 4.53s}}} | offset:0, count:186617508       | N/A     | N/A  |
|     └─TableFullScan_14     | 186617508.00 | 186617498 | cop[tikv] | table:tb1                     | tikv_task:{proc max:176ms, min:0s, avg: 29.7ms, p80:64ms, p95:80ms, iters:217469, tasks:8947}                                                                                                                                                                                                                                                                             | keep order:true                 | N/A     | N/A  |
+----------------------------+--------------+-----------+-----------+-------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
4 rows in set (2 min 39.06 sec)
# 64 partitions --> 20GB memory
MySQL [ep_foundation_core_entities]> explain analyze select * from tb1 order by pk1,pk2,pk3 asc limit 125268248, 8947732;
+----------------------------+--------------+-----------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
| id                         | estRows      | actRows   | task      | access object            | execution info                                                                                                                                                                                                                                                                                                                                                            | operator info                   | memory  | disk |
+----------------------------+--------------+-----------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
| Limit_17                   | 134215980.00 | 8947732   | root      |                          | time:2m19.6s, loops:8740, RU:974951.967167                                                                                                                                                                                                                                                                                                                                | offset:125268248, count:8947732 | N/A     | N/A  |
| └─TableReader_16           | 134215980.00 | 134215980 | root      | partition:all            | time:2m19.5s, loops:131071, cop_task: {num: 6272, max: 2.99s, min: 649µs, avg: 201.9ms, p95: 644ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 3m43.5s, tot_wait: 847.3ms, rpc_num: 6273, rpc_time: 21m6.3s, copr_cache_hit_ratio: 0.02, build_task_duration: 1.43ms, max_distsql_concurrency: 2}, backoff{regionMiss: 2ms}                                    | data:Limit_15                   | 21.6 GB | N/A  |
|   └─Limit_15               | 134215980.00 | 135363725 | cop[tikv] |                          | tikv_task:{proc max:116ms, min:0s, avg: 25.2ms, p80:56ms, p95:68ms, iters:156854, tasks:6272}, scan_detail: {total_process_keys: 134438250, total_process_keys_size: 39771543878, total_keys: 134444406, get_snapshot_time: 486.8ms, rocksdb: {key_skipped_count: 158470825, block: {cache_hit_count: 1309865, read_count: 2880, read_byte: 13.4 MB, read_time: 19.4ms}}} | offset:0, count:134215980       | N/A     | N/A  |
|     └─TableFullScan_14     | 134215980.00 | 135363725 | cop[tikv] | table:tb1                | tikv_task:{proc max:116ms, min:0s, avg: 25.2ms, p80:56ms, p95:68ms, iters:156854, tasks:6272}                                                                                                                                                                                                                                                                             | keep order:true                 | N/A     | N/A  |
+----------------------------+--------------+-----------+-----------+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+---------+------+
4 rows in set (2 min 19.62 sec)

# table defintion
CREATE TABLE `tb1` (
  `pk1` bigint(20) NOT NULL,
  `pk2` bigint(20) NOT NULL,
  `pk3` varchar(101) NOT NULL,
  `column1` bigint(20) NOT NULL,
  `column2` varchar(100) DEFAULT 'string',
  `column3` bigint(20) DEFAULT NULL,
  `column4` bigint(20) NOT NULL,
  `column5` bigint(20) NOT NULL,
  `column6` bigint(20) NOT NULL,
  `column7` varchar(100) NOT NULL,
  PRIMARY KEY (`pk1`,`pk2`,`pk3`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (`pk1`) PARTITIONS 16

Describe the feature you'd like: tidb fetches limit+offset number of rows from every partition into memory, then do sort merge, that's why memory consumption is proportional to number of partitions.

tidb could do sort merge on the fly while fetching data from all partitions, so that total memory consumption will not be proportional to number of partitions.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy: The above SQLs are from PingCAP paid customer.

hawkingrei commented 3 months ago

What version of TiDB are you running?

mzhang77 commented 1 week ago

I believe there is no optimization possible here.