TTL (Time to Live): as deletes leave more and more MVCC versions in RocksDB, scan speed slows down. #52643

Closed: 15767637775 closed this issue 5 months ago

15767637775 commented 5 months ago

Enhancement

Versions: 7.1.4 & 8.0.0. The TTL scan SQL uses a `LIMIT` clause; as deletes leave more and more MVCC versions in RocksDB, the scan speed decreases. TTL scan SQL:

SELECT
  LOW_PRIORITY `id`
FROM
  `test`.`sbtest1`
WHERE
  `id` > 6686771
  AND `id` < 6872610
  AND `created_at` < FROM_UNIXTIME(1712138091)
ORDER BY
  `id` ASC
LIMIT
  500;
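
For context, TiDB issues scans of this shape for tables that carry a TTL attribute. A minimal, hypothetical definition that would produce them (the real sysbench table has more columns, such as `k`, `c`, and `pad`, and the 90-day interval is an assumption):

-- Hypothetical, simplified schema; TTL table options are available in TiDB >= 6.5.
-- Rows whose `created_at` is older than 90 days become eligible for background deletion.
CREATE TABLE `test`.`sbtest1` (
  `id` BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `created_at` DATETIME NOT NULL
) TTL = `created_at` + INTERVAL 90 DAY, TTL_ENABLE = 'ON';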

By default, TTL splits all regions of this table (the row data) into 64 parts as uniformly as possible. Assuming a customer needs to archive a 6.4-billion-row table every day, each part still holds 100 million rows. I therefore tested the scan performance of a single part; a 100-million-row table should reproduce the same effect.

sysbench \
  --config-file=sysbench \
  oltp_read_write \
  --tables=1 \
  --table-size=100000000 \
  prepare

-- Keep MVCC versions of deleted rows around for 10 days so GC does not purge them during the test.
set global tidb_gc_life_time = '240h';
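
To check how the table's row data is actually distributed across regions (and hence how the TTL scan ranges line up), TiDB's `SHOW TABLE ... REGIONS` statement can be used; a minimal sketch:

-- Lists each region's start/end key, leader store, and approximate size for the table.
SHOW TABLE `test`.`sbtest1` REGIONS;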

-- right after loading the data, before any deletes
explain analyze 
SELECT LOW_PRIORITY `id` 
FROM `test`.`sbtest1` 
ORDER BY `id` ASC 
LIMIT 10;
+----------------------------+---------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+-----------+------+
| id                         | estRows | actRows | task      | access object | execution info                                                                                                                                                                                                                        | operator info      | memory    | disk |
+----------------------------+---------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+-----------+------+
| Limit_10                   | 10.00   | 10      | root      |               | time:841.9µs, loops:2, RU:0.536366                                                                                                                                                                                                    | offset:0, count:10 | N/A       | N/A  |
| └─TableReader_18           | 10.00   | 10      | root      |               | time:837.5µs, loops:1, cop_task: {num: 1, max: 739.4µs, proc_keys: 10, tot_proc: 96.7µs, tot_wait: 56.8µs, rpc_num: 1, rpc_time: 706.2µs, copr_cache_hit_ratio: 0.00, build_task_duration: 353.4µs, max_distsql_concurrency: 1}       | data:Limit_17      | 330 Bytes | N/A  |
|   └─Limit_17               | 10.00   | 10      | cop[tikv] |               | tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 10, total_process_keys_size: 270, total_keys: 11, get_snapshot_time: 26.5µs, rocksdb: {key_skipped_count: 10, block: {cache_hit_count: 6}}}                           | offset:0, count:10 | N/A       | N/A  |
|     └─TableFullScan_16     | 10.00   | 10      | cop[tikv] | table:sbtest1 | tikv_task:{time:0s, loops:1}                                                                                                                                                                                                          | keep order:true    | N/A       | N/A  |
+----------------------------+---------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+-----------+------+
4 rows in set (0 sec)

-- after deleting approximately 100 million rows
explain analyze 
SELECT LOW_PRIORITY `id` 
FROM `test`.`sbtest1` 
ORDER BY `id` ASC 
LIMIT 10;
+----------------------------+---------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+-----------+------+
| id                         | estRows | actRows | task      | access object | execution info                                                                                                                                                                                                                                                                                    | operator info                 | memory    | disk |
+----------------------------+---------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+-----------+------+
| Limit_10                   | 10.00   | 0       | root      |               | time:1m15.1s, loops:1, RU:25023.617409                                                                                                                                                                                                                                                            | offset:0, count:10            | N/A       | N/A  |
| └─TableReader_18           | 10.00   | 0       | root      |               | time:1m15.1s, loops:1, cop_task: {num: 245, max: 390.5ms, min: 16ms, avg: 306.6ms, p95: 342ms, tot_proc: 1m14.9s, tot_wait: 61.1ms, rpc_num: 245, rpc_time: 1m15.1s, copr_cache_hit_ratio: 0.00, build_task_duration: 31.4ms, max_distsql_concurrency: 1}                                         | data:Limit_17                 | 333 Bytes | N/A  |
|   └─Limit_17               | 10.00   | 0       | cop[tikv] |               | tikv_task:{proc max:389ms, min:14ms, avg: 305.7ms, p80:326ms, p95:341ms, iters:245, tasks:245}, scan_detail: {total_keys: 200030245, get_snapshot_time: 43.1ms, rocksdb: {key_skipped_count: 200030000, block: {cache_hit_count: 670, read_count: 826355, read_byte: 10.1 GB, read_time: 2.57s}}} | offset:0, count:10            | N/A       | N/A  |
|     └─TableFullScan_16     | 10.00   | 0       | cop[tikv] | table:sbtest1 | tikv_task:{proc max:389ms, min:14ms, avg: 305.6ms, p80:326ms, p95:341ms, iters:245, tasks:245}                                                                                                                                                                                                    | keep order:true, stats:pseudo | N/A       | N/A  |
+----------------------------+---------+---------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------+-----------+------+
4 rows in set (1 min 15.18 sec)

In the slow plan above, `scan_detail` shows `total_keys: 200030245` with `key_skipped_count: 200030000`: the coprocessor iterated past roughly 200 million MVCC versions left by the deleted rows just to return zero live rows. A more efficient way:

-- Subdivide each part further into fixed-size pages.
-- Each page has both an upper and a lower bound, which limits how many historical versions RocksDB has to scan.
-- Reference link: https://docs.pingcap.com/zh/tidb/stable/dev-guide-paginate-results#%E5%8D%95%E5%AD%97%E6%AE%B5%E4%B8%BB%E9%94%AE%E8%A1%A8%E7%9A%84%E5%88%86%E9%A1%B5%E6%89%B9%E5%A4%84%E7%90%86
SELECT
    MIN(t.id) AS start_key,
    MAX(t.id) AS end_key,
    count(1)
FROM
    (
        SELECT
            id,
            row_number() over (
                ORDER BY id
            ) AS row_num
        FROM
            sbtest1
    ) t
GROUP BY
    FLOOR((t.row_num - 1) / 500 )
ORDER BY start_key;
+-----------+---------+----------+
| start_key | end_key | count(1) |
+-----------+---------+----------+
|        1  |    500  |      500 |
|      501  |   1000  |      500 |
|     1001  |   1500  |      500 |
|     1501  |   2000  |      500 |
|     2001  |   2500  |      500 |
|     2501  |   3000  |      500 |
|     3001  |   3500  |      500 |
|     3501  |   4000  |      500 |
|     4001  |   4500  |      500 |
... ...

delete from sbtest1 
    where id >= $start_key and id <= $end_key ... ...

-- or 

select id from sbtest1 
    where id >= $start_key and id <= $end_key ... ...

DELETE LOW_PRIORITY
FROM
  `test`.`sbtest1`
WHERE
  `id` IN (
    ...
  )
LIMIT
  100;
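
Combining the two, each delete batch can carry the page bounds plus the expiry predicate directly, so neither the scan nor the delete walks past the current page. A minimal sketch; the bounds are taken from the first page above and the FROM_UNIXTIME cutoff is the one from the TTL scan SQL, both purely illustrative:

-- Illustrative values: start_key = 1, end_key = 500 (the first page computed above).
DELETE LOW_PRIORITY FROM `test`.`sbtest1`
WHERE `id` >= 1 AND `id` <= 500
  AND `created_at` < FROM_UNIXTIME(1712138091)
LIMIT 500;
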
15767637775 commented 5 months ago

The next scan carries the lower boundary over from the previous one, so the number of rows scanned per batch is similar to the "More efficient way":

SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` >= 5426426 AND `id` < 6251583 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` >= 4602563 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` > 4603062 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` > 4603562 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` > 4604062 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` > 4604562 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` > 4605062 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;
SELECT LOW_PRIORITY `id` FROM `test`.`sbtest2` WHERE `id` > 4605562 AND `id` < 5426426 AND `created_at` < FROM_UNIXTIME(1713338305) ORDER BY `id` ASC LIMIT 500;