pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.54k stars 5.75k forks source link

paging copr cannot hit copr cache #49232

Open guo-shaoge opened 7 months ago

guo-shaoge commented 7 months ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. load tpcc 1000 warehouse(tiup br:v7.1.0 restore db --db=tpcc --pd 127.0.0.1:13425 --storage s3://benchmark/ch-1k-v5 --s3.endpoint http://minio.pingcap.net:9000 --send-credentials-to-tikv=true --check-requirements=false)
  2. enable paging and run sql multiple times:
    set @@tidb_enable_paging=on;
    explain analyze select /*+ use_index(orders,idx_order) */ * from orders where o_w_id > 1 and o_d_id > 1 and o_c_id > 10 limit 10000;
  3. disable paging and run sql multiple times:
    set @@tidb_enable_paging=on;
    explain analyze select /*+ use_index(orders,idx_order) */ * from orders where o_w_id > 1 and o_d_id > 1 and o_c_id > 10 limit 10000;

2. What did you expect to see? (Required)

copr cache always hit

3. What did you see instead (Required)

copr cache not hit when paging is enabled: img_v3_025s_895b58b4-df9e-4afe-89f5-129513ee21fg

and copr cache is hit when paging is disabled: img_v3_025s_c9102385-1a41-485a-8d21-bc6bed70cb3g

4. What is your TiDB version? (Required)

MySQL [tpcc]> select tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v7.6.0-alpha-194-g0f978e9
Edition: Community
Git Commit Hash: 0f978e9849fb122077f9f7a57ea25142bd4f4107
Git Branch: debug_coprcache
UTC Build Time: 2023-12-06 13:06:44
GoVersion: go1.21.3
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.038 sec)
guo-shaoge commented 7 months ago
  1. copr resp will not insert into copr cache when the execution time of this copr req is less than 5ms. Check here to see more conditions
  2. The coprocessor logic compare the execution time of index scan with 5ms to see if the copr resp can insert into copr cache.(which is wrong, because we should check the execution time of Limit_13 with 5ms). Check here
  3. When paging is enabled, the execution time of index scan is very small, basically 200us+. So when paging is enabled, the copr cache cannot hit(because the result never insert into cache).
  4. When paging is disabled, the execution time of index scan is large, so it can insert into cache. So copr cache hits.
tiancaiamao commented 2 months ago

What you see from the explain analyze ... differs from what you can get from the coprocessor paging layer. @guo-shaoge

Basically you can view our network stack as follows:

distsql layer                            // request = select DAG / result = chunk result
coprocessor layer                        // abstract away from requested kv range =>  specific tikv node  (router using region cache)
grpc layer                               // remote RPC
tcp layer                                // streaming protocol
...

Each layer has their specific purpose, in coprocessor layer we hide the details of key range request to specific tikv node mapping. And distsql request is one of the application on this layer, where the request type = DAG.

In coprocessor layer, the data is opaque, it's a []byte. The upper layer, i.e. distsql layer is responsible for decoding it. Explan analyze works on the upper layer, it decode the data to get the time cost of each operators.

So we cannot get the time cost of each operator to decide whether or not cache one coprocessor result. That information is unavailable:

The coprocessor logic compare the execution time of index scan with 5ms to see if the copr resp can insert into copr cache.(which is wrong, because we should check the execution time of Limit_13 with 5ms). Check here