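MySQL pagination over large data sets: a discussion

Assumptions

Single-node MySQL, InnoDB
Auto-increment primary key with a step of 1
No table or database sharding
Test environment: 2016 MBP (2 GHz Intel Core i5, 8 GB, SSD), MySQL 8.0.18
Composite indexes are not discussed
Fast, exact, and simple to implement: you can only ever have two of the three, so one has to be given up

Table structure

Create the table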

CREATE TABLE `history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`txt` varchar(100) DEFAULT NULL,
`create_time` datetime NOT NULL,
`is_delete` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
KEY `create_time_index` (`create_time`),
KEY `is_delete_index` (`is_delete`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert 10,000,000 rows (the first 10 are shown)

+----+-------------------------+---------------------+-----------+
| id | txt                     | create_time         | is_delete |
+----+-------------------------+---------------------+-----------+
|  1 | 2021-11-14 00:00:00,0,0 | 2019-12-17 00:00:00 |         0 |
|  2 | 2019-01-23 00:00:00,0,1 | 2019-12-17 00:00:00 |         0 |
|  3 | 2019-09-06 00:00:00,0,2 | 2019-12-17 00:00:00 |         0 |
|  4 | 2020-01-06 00:00:00,0,3 | 2019-12-17 00:00:00 |         0 |
|  5 | 2020-05-18 00:00:00,0,4 | 2019-12-17 00:00:00 |         0 |
|  6 | 2019-05-24 00:00:00,0,5 | 2019-12-17 00:00:00 |         0 |
|  7 | 2021-07-26 00:00:00,0,6 | 2019-12-17 00:00:00 |         0 |
|  8 | 2019-02-20 00:00:00,0,7 | 2019-12-17 00:00:00 |         0 |
|  9 | 2020-11-13 00:00:00,0,8 | 2019-12-17 00:00:00 |         0 |
| 10 | 2019-03-16 00:00:00,0,9 | 2019-12-17 00:00:00 |         0 |
+----+-------------------------+---------------------+-----------+
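
The post does not show how the rows were generated. One possible way on MySQL 8.0 is a recursive CTE. This is only a sketch: the txt values and the share of rows with is_delete=1 are placeholders chosen for illustration, not the data behind the measurements in this post.

```sql
-- Assumption: the history table above exists and is empty.
-- Recursive CTEs stop after 1000 levels by default, so raise the limit for this session.
SET SESSION cte_max_recursion_depth = 20000000;

INSERT INTO history (txt, create_time, is_delete)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 10000000
)
SELECT
    CONCAT('row ', n),          -- placeholder text, not the original txt format
    '2019-12-17 00:00:00',      -- same create_time as in the sample rows
    IF(n % 10 = 0, 1, 0)        -- hypothetical: every tenth row is soft-deleted
FROM seq;
```

This runs as one large transaction. On a small machine it may be gentler to lower the upper bound and run the insert several times.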

Pagination SQL

select * from history where is_delete=0 order by id limit 5000000, 20;

explain

+----+-------------+---------+------------+------+-----------------+-----------------+---------+-------+---------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys   | key             | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+---------+------------+------+-----------------+-----------------+---------+-------+---------+----------+-------------+
|  1 | SIMPLE      | history | NULL       | ref  | is_delete_index | is_delete_index | 1       | const | 4878971 |   100.00 | Using index |
+----+-------------+---------+------------+------+-----------------+-----------------+---------+-------+---------+----------+-------------+

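Why it is slow

MySQL cannot jump straight to the offset. To serve limit 5000000, 20 it has to read 5,000,020 matching rows in order and throw away the first 5,000,000, so the time spent scanning rows that are then discarded grows with the offset.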
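
One way to observe this cost is EXPLAIN ANALYZE, which is available from MySQL 8.0.18 and reports how many rows each step actually read. The sketch below compares a small and a large offset on the same table. No output is shown because it depends on the data and the machine. Note that EXPLAIN ANALYZE really executes the query, so the second statement takes as long as the slow query itself.

```sql
-- Small offset: only a few dozen rows are read before the page is complete.
EXPLAIN ANALYZE
SELECT * FROM history WHERE is_delete = 0 ORDER BY id LIMIT 20, 20;

-- Large offset: offset + limit rows are read, then the offset part is discarded.
EXPLAIN ANALYZE
SELECT * FROM history WHERE is_delete = 0 ORDER BY id LIMIT 5000000, 20;
```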

Deferred join

High Performance MySQL suggests rewriting the query like this:

select * from history h 
    inner join (select id from history where is_delete=0 order by id limit 5000000, 20) h2 
    on h.id=h2.id;

explain

+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+---------+----------+-------------+
| id | select_type | table      | partitions | type   | possible_keys   | key             | key_len | ref   | rows    | filtered | Extra       |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+---------+----------+-------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL            | NULL            | NULL    | NULL  | 4878971 |   100.00 | NULL        |
|  1 | PRIMARY     | h          | NULL       | eq_ref | PRIMARY         | PRIMARY         | 4       | h2.id |       1 |   100.00 | NULL        |
|  2 | DERIVED     | history    | NULL       | ref    | is_delete_index | is_delete_index | 1       | const | 4878971 |   100.00 | Using index |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+---------+----------+-------------+

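The subquery uses a covering index to find just the primary keys of the requested page, and the outer query then joins back to the table on the primary key to fetch the full rows. The offset still has to be walked, but over narrow index entries rather than full rows, so far less data is scanned.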

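Allow only small offsets in the product

Cap the number of pages a user can turn through; in most cases this has little effect on the user experience.
If a user wants to reach data far down the list, have them add an extra filter such as a time range. The goal is to keep the offset, and therefore the number of rows MySQL scans, as small as possible.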
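
As an illustration of such an extra filter, the query below narrows the result by create_time before paging. The date range and the offset are made-up values. Whether MySQL serves this efficiently depends on the index it picks, which is outside the scope of this post.

```sql
-- Hypothetical example: restrict to one month, then page with a small offset.
SELECT *
FROM history
WHERE is_delete = 0
  AND create_time >= '2019-12-01 00:00:00'
  AND create_time <  '2020-01-01 00:00:00'
ORDER BY id
LIMIT 40, 20;
```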

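Back-office queries over historical records, where response time does not matter

If whoever runs the query is willing to wait, the slow query can be left as it is.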

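Other

When the primary key is auto-incrementing and rows are never deleted, you can consider paging on the primary key itself instead of an offset. The requirements are strict, though, so this applies in only a narrow range of cases.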
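
A minimal sketch of what paging on the primary key could look like, assuming ids start at 1, increase by 1, and have no gaps. Under those assumptions the row at offset N has id N + 1, so the offset turns into a range condition on id:

```sql
-- Assumption: ids are 1, 2, 3, ... with no gaps (no deletes, no rolled-back inserts).
-- Matches "order by id limit 5000000, 20" on the unfiltered table, but the primary key
-- index can seek straight to id 5000001 instead of scanning 5,000,000 rows first.
SELECT *
FROM history
WHERE id > 5000000
ORDER BY id
LIMIT 20;
```

Once a filter such as is_delete=0 is added, an offset no longer maps to an id and this arithmetic breaks. A possible workaround is for the client to pass the last id of the previous page as the lower bound, which supports only next-page navigation.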

ref: High Performance MySQL (3rd Edition), section 5.4.3; explanation of table:derived2
