pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.05k stars 5.83k forks source link

SELECT .. LIMIT x,y is slow when x is large #9570

Closed ericisme closed 4 years ago

ericisme commented 5 years ago

2M rows, query 7 or 8 seconds for this sql.

SELECT * FROM order_order ORDER BY id DESC LIMIT 2103100, 100

what's the reason it show? and is there any methord to speed up? thank you

ericisme commented 5 years ago

i thank every range mark down it's total count in pd/meta data, and use rang count(meta data) to calc the being index in which range, it will be fast?

tiancaiamao commented 5 years ago
ORDER BY id DESC LIMIT 2103100, 100

This operation scans 2103100 + 100 records, that why it is slow. Paging using offset+limit count would be more and more slower with the offset increasing.

Using index and query an index range would be fast. If id is primary:

select * from order_order order by id desc where id > 2103100 and id < 2103200

@ericisme

ericisme commented 5 years ago

@tiancaiamao id has index, but some row may deleted, making id not continuous

morgo commented 5 years ago

One easy way you can speed this up, is to remember the last value of id. This is why some websites only show a 'next' button and not all the pages in advance. Then the query just becomes:

SELECT * FROM order_order WHERE id > $last_value ORDER BY id;

Writing a query with a large LIMIT will be similarly slow in MySQL. As you note, because there could be deleted rows, the database needs to scan through all the rows.

XuHuaiyu commented 5 years ago

@ericisme Hi, what's the result of select tidb_version()?

ericisme commented 5 years ago

@morgo thank you @XuHuaiyu that is the result of select tidb_version():

Release Version: v2.1.4 Git Commit Hash: a1d10c17039ec289ef490481b25c7091dba7ac31 Git Branch: release-2.1 UTC Build Time: 2019-02-15 06:24:33 GoVersion: go version go1.11.2 linux/amd64 Race Enabled: false TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e Check Table Before Drop: false

XuHuaiyu commented 5 years ago

May you provide us:

  1. The schema of order_order.
  2. The result of:
    explain analyze SELECT * FROM order_order ORDER BY id DESC LIMIT 2103100, 100;
ericisme commented 5 years ago

@XuHuaiyu

CREATE TABLE order_order (
id bigint(20) NOT NULL AUTO_INCREMENT,
createTime datetime DEFAULT NULL,
updateTime datetime DEFAULT NULL,
version int(11) DEFAULT NULL,
amazonOrderId varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
batchNo varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
cbaDisplayAbleshippinglabel varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
earliestDeliveryDate datetime DEFAULT NULL,
earliestShipDate datetime DEFAULT NULL,
fulfillmentChannel varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
isBusinessOrder bit(1) DEFAULT NULL,
isPremiumOrder bit(1) DEFAULT NULL,
isPrime bit(1) DEFAULT NULL,
isReplacementOrder bit(1) DEFAULT NULL,
lastUpdateDate datetime DEFAULT NULL,
latestDeliveryDate datetime DEFAULT NULL,
latestShipDate datetime DEFAULT NULL,
marketPlaceId varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
numberOfItemsShipped int(11) DEFAULT NULL,
numberOfItemsUnshipped int(11) DEFAULT NULL,
orderChannel varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
orderClasses int(11) DEFAULT NULL,
orderStatus varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
orderTotal double DEFAULT NULL,
orderTotalCurrency varchar(20) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
orderType varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
paymentExecutionDetail varchar(200) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
paymentMethod varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
productProblem text CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
purchaseDate datetime DEFAULT NULL,
purchaseOrderNumber varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
replacedOrderId varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
salesChannel varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
sellerOrderId varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
shipmentServiceLevelCategory varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
shippedByAmazonTFM varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
shipsServiceLevel varchar(100) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
status int(11) DEFAULT NULL,
syncShipmentFlag int(11) DEFAULT NULL,
syncShipmentMsg varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
tfmShipmentStatus varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
channelAccount_id bigint(20) DEFAULT NULL,
customerInfo_id bigint(20) DEFAULT NULL,
shippingAddress_id bigint(20) DEFAULT NULL,
sourceLine_id bigint(20) DEFAULT NULL,
buyerId varchar(255) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
purchaseDateStr varchar(50) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
lastUpdateDateStr varchar(50) CHARSET utf8 COLLATE utf8_bin DEFAULT NULL,
buyerEmail varchar(100) CHARSET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
orderDataType varchar(50) CHARSET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY index_order_amazon_order_id_channel_account_id (amazonOrderId,channelAccount_id),
KEY FKjcnnrk2ion4jis4iujumflnhb (channelAccount_id),
KEY FK9ky5encccri0adv15yfh4ifq1 (customerInfo_id),
KEY FK441d5oykb0fjb59xor74vft5i (shippingAddress_id),
KEY FK72t8fsk0fs8okx54tgipdtfhi (sourceLine_id),
KEY purchaseDate (purchaseDate),
KEY orderStatus (orderStatus),
KEY index_buyerid (buyerId),
KEY order_order_latestShipDate (latestShipDate),
KEY index_amazonOrderAndChaId (amazonOrderId,channelAccount_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=2252543;
TopN_9  100.00  root    erp.order_order.id:desc, offset:2103100, count:100  time:14.038471983s, loops:5, rows:100
└─TableReader_17    2163136.00  root    data:TableScan_16   time:9.288636485s, loops:65730, rows:2103297
└─TableScan_16  2163136.00  cop table:order_order, range:[-inf,+inf], keep order:false

i guest that may cache every range total count in to tivk/pd, then just use the count information to calcalte the targe range id instead of scan all the id, Do you think this is feasible?

ericisme commented 5 years ago

is that an atom counter will slow down observerly the TPS(insert,delete) withing 64MB range?

XuHuaiyu commented 5 years ago

i guest that may cache every range total count in to tivk/pd, then just use the count information to calcalte the targe range id instead of scan all the id, Do you think this is feasible?

This is not that reasonable, we do not really know how many key-values we need to scan because of MVCC. We only know the start-key-value and end-key-value range which need to be scanned. https://github.com/pingcap/tidb/issues/9570#issuecomment-469938472 is a good solution since we'll build the scan range through the WHERE condition.

ghost commented 4 years ago

I am going to close this issue as stale. Please feel free to re-open if you have additional questions. Thanks!