pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.88k stars 5.8k forks source link

Resource units (RUs) used should be computable from the output of `EXPLAIN` and `EXPLAIN ANALYZE` #47269

Closed adamf-db closed 11 months ago

adamf-db commented 11 months ago

Feature Request

Is your feature request related to a problem? Please describe: I'd like to be know how the RUs reported by EXPLAIN and EXPLAIN ANALYZE where calculated. Right now, it's not possible to use the output of an EXPLAIN ANALYZE to determine why the RUs are what they are.

Describe the feature you'd like: The output of EXPLAIN ANALYZE... should include the counts for reads, batch reads, and CPU time. Ideally, this would be done for all operators including sub operators. This should also report the base cost values used.

Basically, I want to be able to do the calculations in tikv/pd/client/resource_manager/model.go & config.go but using the output of EXPLAIN ANALYZE (or estimated in EXPLAIN).

Describe alternatives you've considered: There's not really any other way.

Teachability, Documentation, Adoption, Migration Strategy: I think it's reasonable to want to understand the cost of a query you issue, and especially when you are trying to map something like QPS-> RUs -> resource groups. It's very hard to do that right now.

Examples of current output from EXPLAIN ANALYZE format = tidb_json with the executeInfo blocks also JSON:

EXPLAIN ANALYZE FORMAT = "tidb_json" select * from trips where trip_id = 5105991
{'accessObject': 'table:trips',
 'actRows': '1',
 'diskInfo': 'N/A',
 'estRows': '1.00',
 'executeInfo': 'time:2.55ms, loops:2, RU:0.329460, Get:{num_rpc:1, '
                'total_time:2.13ms}, total_process_time: 231.5µs, '
                'total_wait_time: 732.9µs, tikv_wall_time: 995.8µs, '
                'scan_detail: {total_process_keys: 1, total_process_keys_size: '
                '150, total_keys: 1, get_snapshot_time: 691.7µs, rocksdb: '
                '{block: {cache_hit_count: 2, read_count: 1, read_byte: 8.19 '
                'KB, read_time: 10.3µs}}}',
 'id': 'Point_Get_1',
 'memoryInfo': 'N/A',
 'operatorInfo': 'handle:11158051',
 'taskType': 'root'}
EXPLAIN ANALYZE FORMAT = "tidb_json" INSERT INTO ten_col (i1 , i2 , i3 , i4 , i5 , i6 , i7 {'actRows': '0',
 'diskInfo': 'N/A',
 'estRows': 'N/A',
 'executeInfo': 'time:71.8µs, loops:1, prepare: 33.3µs, insert:38.4µs, '
                'RU:1.416016',
 'id': 'Insert_1',
 'memoryInfo': '781 Bytes',
 'operatorInfo': 'N/A',
 'taskType': 'root'}

The below is a select that partially hits a composite index:

{'actRows': '1433',
 'diskInfo': 'N/A',
 'estRows': '9647.77',
 'executeInfo': 'time:5.09ms, loops:3, RU:4.476161, Concurrency:5',
 'id': 'Projection_4',
 'memoryInfo': '111.4 KB',
 'operatorInfo': 'bikeshare.adam_trips.bike_number',
 'subOperators': [{'actRows': '1433',
                   'diskInfo': 'N/A',
                   'estRows': '9647.77',
                   'executeInfo': {'cop_task': {'avg': '1.64ms',
                                                'build_task_duration': '7.08µs',
                                                'copr_cache_hit_ratio': '0.00',
                                                'max': '1.76ms',
                                                'max_distsql_concurrency': '1',
                                                'max_proc_keys': '729',
                                                'min': '1.56ms',
                                                'num': '3',
                                                'p95': '1.76ms',
                                                'p95_proc_keys': '729',
                                                'rpc_num': '3',
                                                'rpc_time': '4.87ms',
                                                'tot_proc': '1.07ms',
                                                'tot_wait': '113.2µs'},
                                   'loops': '3',
                                   'time': '5.03ms'},
                   'id': 'IndexReader_6',
                   'memoryInfo': '75.2 KB',
                   'operatorInfo': 'index:IndexRangeScan_5',
                   'subOperators': [{'accessObject': 'table:adam_trips, '
                                                     'index:start_end_bike_number(start_station, '
                                                     'end_station, '
                                                     'bike_number)',
                                     'actRows': '1433',
                                     'diskInfo': 'N/A',
                                     'estRows': '9647.77',
                                     'executeInfo': 'tikv_task:{proc max:0s, '
                                                    'min:0s, avg: 0s, p80:0s, '
                                                    'p95:0s, iters:12, '
                                                    'tasks:3}, scan_detail: '
                                                    '{total_process_keys: '
                                                    '1433, '
                                                    'total_process_keys_size: '
                                                    '220781, total_keys: 1436, '
                                                    'get_snapshot_time: '
                                                    '42.6µs, rocksdb: '
                                                    '{key_skipped_count: 1433, '
                                                    'block: {cache_hit_count: '
                                                    '10}}}',
                                     'id': 'IndexRangeScan_5',
                                     'memoryInfo': 'N/A',
                                     'operatorInfo': 'range:["11th & Kenyon St '
                                                     'NW" "18th & M St '
                                                     'NW","11th & Kenyon St '
                                                     'NW" "18th & M St NW"], '
                                                     'keep order:false',
                                     'taskType': 'cop[tikv]'}],
                   'taskType': 'root'}],
 'taskType': 'root'}
kevin-xianliu commented 11 months ago

@nolouch PTAL.

kevin-xianliu commented 11 months ago

We support using EXPLAIN ANALYZE to display RU statistical information: https://github.com/pingcap/tidb/pull/42080

adamf-db commented 11 months ago

We support using EXPLAIN ANALYZE to display RU statistical information: #42080

Yep - but you can't figure out how the RUs were computed from the explain plan. It's a black box - this issue is about adding the information needed to a) verify the RU computation, and b) see what factors (reads, batch reads, bytes read, writes, bytes written, CPU ms) make up the RU.

nolouch commented 11 months ago

@adamf-db Actually, the main factors are already displayed in executeInfo, two main factors are the total_process_keys_size and rpc_nums. What's your purpose in knowing the factors? only want to know if is it calculated as expected or want to optimize the query. We can optimize the content of this display. but does the user really need to know the details of the calculation to optimize SQL? We should already have detailed execution information and per query RU.

adamf-db commented 11 months ago

@nolouch I want to have a good mental model for future queries, and I want to be able to predict the impact of RGs/RU usage on my queries. I also want to just prove that the my QPS -> RU usage -> RG settings make sense.

How do total_process_keys_size and rpc_nums map to the RU calculations in model.go?

For example, given:

 'executeInfo': 'time:2.55ms, loops:2, RU:0.329460, Get:{num_rpc:1, '
                'total_time:2.13ms}, total_process_time: 231.5µs, '
                'total_wait_time: 732.9µs, tikv_wall_time: 995.8µs, '
                'scan_detail: {total_process_keys: 1, total_process_keys_size: '
                '150, total_keys: 1, get_snapshot_time: 691.7µs, rocksdb: '
                '{block: {cache_hit_count: 2, read_count: 1, read_byte: 8.19 '
                'KB, read_time: 10.3µs}}}',
 'id': 'Point_Get_1',

How do we end up with 0.329460 RUs?

nolouch commented 11 months ago

@adamf-db If you are using 7.1. x, the calculation is sum BeforeKVRequest() + AfterKVRequest() in model.go, that is :

before : 
     consumption.RRU += float64(kc.ReadBaseCost)  ->   kv.ReadBaseCost * rpc_nums  
after:  
      consumption.RRU += float64(kc.ReadBytesCost) * readBytes  -> kc.ReadBytesCost * total_process_keys_size 
      consumption.RRU += float64(kc.CPUMsCost) * kvCPUMs   -> kc.CPUMsCost * total_process_time 

and according to https://github.com/tikv/pd/blob/aeb259335644d65a97285d7e62b38e7e43c6ddca/client/resource_group/controller/config.go#L58C19-L67, it's a read request, and

    CPUMsCost  = (1/3) 
    ReadBytesCost =  1/ (64*1024) 
    ReadBaseCost  = 0.25

so RU is:

kv.ReadBaseCost * rpc_nums  +  kc.ReadBytesCost * total_process_keys_size  +  kc.CPUMsCost * total_process_time 
= 0.25 *1 + 1/ (64*1024) * 150 + (1/3) *  0.2315
≈ 0.329
adamf-db commented 11 months ago

Thanks!

How does the calculation work for the below plan?

{'actRows': '1433',
 'diskInfo': 'N/A',
 'estRows': '9647.77',
 'executeInfo': 'time:5.09ms, loops:3, RU:4.476161, Concurrency:5',
 'id': 'Projection_4',
 'memoryInfo': '111.4 KB',
 'operatorInfo': 'bikeshare.adam_trips.bike_number',
 'subOperators': [{'actRows': '1433',
                   'diskInfo': 'N/A',
                   'estRows': '9647.77',
                   'executeInfo': {'cop_task': {'avg': '1.64ms',
                                                'build_task_duration': '7.08µs',
                                                'copr_cache_hit_ratio': '0.00',
                                                'max': '1.76ms',
                                                'max_distsql_concurrency': '1',
                                                'max_proc_keys': '729',
                                                'min': '1.56ms',
                                                'num': '3',
                                                'p95': '1.76ms',
                                                'p95_proc_keys': '729',
                                                'rpc_num': '3',
                                                'rpc_time': '4.87ms',
                                                'tot_proc': '1.07ms',
                                                'tot_wait': '113.2µs'},
                                   'loops': '3',
                                   'time': '5.03ms'},
                   'id': 'IndexReader_6',
                   'memoryInfo': '75.2 KB',
                   'operatorInfo': 'index:IndexRangeScan_5',
                   'subOperators': [{'accessObject': 'table:adam_trips, '
                                                     'index:start_end_bike_number(start_station, '
                                                     'end_station, '
                                                     'bike_number)',
                                     'actRows': '1433',
                                     'diskInfo': 'N/A',
                                     'estRows': '9647.77',
                                     'executeInfo': 'tikv_task:{proc max:0s, '
                                                    'min:0s, avg: 0s, p80:0s, '
                                                    'p95:0s, iters:12, '
                                                    'tasks:3}, scan_detail: '
                                                    '{total_process_keys: '
                                                    '1433, '
                                                    'total_process_keys_size: '
                                                    '220781, total_keys: 1436, '
                                                    'get_snapshot_time: '
                                                    '42.6µs, rocksdb: '
                                                    '{key_skipped_count: 1433, '
                                                    'block: {cache_hit_count: '
                                                    '10}}}',
                                     'id': 'IndexRangeScan_5',
                                     'memoryInfo': 'N/A',
                                     'operatorInfo': 'range:["11th & Kenyon St '
                                                     'NW" "18th & M St '
                                                     'NW","11th & Kenyon St '
                                                     'NW" "18th & M St NW"], '
                                                     'keep order:false',
                                     'taskType': 'cop[tikv]'}],
                   'taskType': 'root'}],
 'taskType': 'root'}
nolouch commented 11 months ago

@adamf-db Same as the formula(7.1) above:

kv.ReadBaseCost * rpc_nums  +  kc.ReadBytesCost * total_process_keys_size  +  kc.CPUMsCost * total_process_time 
=  0.25 *3 +  1/ (64*1024) * 220781+ (1/3) * 1.07
≈ 4.476
adamf-db commented 11 months ago

Thanks! Opened https://github.com/pingcap/docs/pull/14956 to update the docs with this information.

adamf-db commented 11 months ago

Closing now that there's a PR for the docs to have this information.