pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
36.8k stars 5.8k forks source link

" tidb explain" Do not understand, do not understand #9017

Closed shenlijungg closed 5 years ago

shenlijungg commented 5 years ago

Question

select version() 5.7.10-TiDB-v2.1.0-5-g4dad722

sql:

explain SELECT c.f_number AS cNumber

FROM xxxxx c left join

yyyyy entry on c.f_number = entry.f_number left join

zzzz cus on c.f_number = cus.f_number where c.f_channel_id = '30010004'

tidb explain image

mysql explain image

shenlijungg commented 5 years ago

How to make efficient use of tidb multi-table index?

Do you have detailed documentation?

morgo commented 5 years ago

The EXPLAIN output format is described in this manual page. In addition, I can recommend taking a look at TiDB Academy, where examples walk through EXPLAIN on a sample data set.

I hope this helps. Please let me know if you have any further questions!

shenli commented 5 years ago

@shenlijungg Thanks for your feedback! I can not get your point.

shenlijungg commented 5 years ago

@shenli use tidb TABLE join too slow,And I don't understand tidb explain.

Mysql explain looks simple. I don't know how to optimize TIDB SQL!

morgo commented 5 years ago

@shenlijungg can you please paste the output of:

SHOW CREATE TABLE dzqd_order_main;
SHOW CREATE TABLE dzqd_order_entry;
SHOW CREATE TABLE dzqd_order_cus;

Thank you!

shenlijungg commented 5 years ago

@morgo

XuHuaiyu commented 5 years ago

hi, @shenlijungg what do you mean by multi-table index

shenlijungg commented 5 years ago

@XuHuaiyu 多个表的join关联 ,如何更好的走索引


How to optimize the sql for choosing a better index in TiDB?

morgo commented 5 years ago

Without any data loaded, TiDB will produce a similar execution plan to MySQL:

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.7.10-TiDB-v2.1.1 |
+--------------------+
1 row in set (0.00 sec)

+------------------------------+-------+------+----------------------------------------------------------------------------------------------------------------------------------+
| id                           | count | task | operator info                                                                                                                    |
+------------------------------+-------+------+----------------------------------------------------------------------------------------------------------------------------------+
| Projection_8                 | 15.62 | root | c.f_number                                                                                                                       |
| └─IndexJoin_14               | 15.62 | root | left outer join, inner:IndexReader_13, outer key:c.f_number, inner key:cus.f_number                                              |
|   ├─IndexJoin_74             | 12.50 | root | left outer join, inner:IndexReader_73, outer key:c.f_number, inner key:entry.f_number                                            |
|   │ ├─IndexLookUp_81         | 10.00 | root |                                                                                                                                  |
|   │ │ ├─IndexScan_79         | 10.00 | cop  | table:c, index:f_channel_id, range:["30010004","30010004"], keep order:false, stats:pseudo                                       |
|   │ │ └─TableScan_80         | 10.00 | cop  | table:dzqd_order_main, keep order:false, stats:pseudo                                                                            |
|   │ └─IndexReader_73         | 10.00 | root | index:IndexScan_72                                                                                                               |
|   │   └─IndexScan_72         | 10.00 | cop  | table:entry, index:f_number, range: decided by [c.f_number], keep order:false, stats:pseudo                                      |
|   └─IndexReader_13           | 10.00 | root | index:IndexScan_12                                                                                                               |
|     └─IndexScan_12           | 10.00 | cop  | table:cus, index:f_number, f_cus_Name, f_cus_ZJ_Num, f_cus_CRMID, range: decided by [c.f_number], keep order:false, stats:pseudo |
+------------------------------+-------+------+----------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

An explanation:

Your query seems to be producing a MergeJoin, which suggests the statistics might be different. I am going to get you to try two things:

  1. Force the join to be a nested loop and see if performance is restored:

    explain analyze SELECT /*+ TIDB_SMJ(c, entry, cus) */ c.f_number AS cNumber  FROM dzqd_order_main c left join  dzqd_order_entry entry on c.f_number = entry.f_number left join  dzqd_order_cus cus on c.f_number = cus.f_number where c.f_channel_id = '30010004';
    explain analyze SELECT /*+ TIDB_INLJ(c, entry, cus) */ c.f_number AS cNumber  FROM dzqd_order_main c left join  dzqd_order_entry entry on c.f_number = entry.f_number left join  dzqd_order_cus cus on c.f_number = cus.f_number where c.f_channel_id = '30010004';

    (I've changed it to EXPLAIN ANALYZE to also capture execution time).

  2. Get you to upload the statistics, and we can take a look and see if this is expected:

http://${tidb-server-ip}:${tidb-server-status-port}/stats/dump/${db_name}/${table_name}

XuHuaiyu commented 5 years ago

It looks like TiDB chooses the same access path as MySQL for table c, cus and entry.

shenlijungg commented 5 years ago

hi @morgo

Tidb dump log to send your gmail!!!

XuHuaiyu commented 5 years ago

Will it be possible to upload the dump file for statistics in this issue?

morgo commented 5 years ago

@XuHuaiyu I will upload them internally for you.

@shenlijungg I took a look at your stats, and with TiDB 2.1 and our current master branch it will merge join as it did in your example. I just need one more thing from you. Can you run EXPLAIN ANALYZE so we can compare the performance numbers of merge join versus the mysql-like behavior of nested loop join:

explain analyze SELECT /*+ TIDB_SMJ(c, entry, cus) */ c.f_number AS cNumber  FROM dzqd_order_main c left join  dzqd_order_entry entry on c.f_number = entry.f_number left join  dzqd_order_cus cus on c.f_number = cus.f_number where c.f_channel_id = '30010004';

explain analyze SELECT /*+ TIDB_INLJ(c, entry, cus) */ c.f_number AS cNumber  FROM dzqd_order_main c left join  dzqd_order_entry entry on c.f_number = entry.f_number left join  dzqd_order_cus cus on c.f_number = cus.f_number where c.f_channel_id = '30010004';

Thanks in advance!

shenlijungg commented 5 years ago

@morgo @XuHuaiyu TIDB_INLJ: image TIDB_SMJ: image

XuHuaiyu commented 5 years ago

Hi, @shenlijungg What's the results of :

explain analyze SELECT /*+  TIDB_HJ(cus) */ c.f_number AS cNumber  FROM dzqd_order_main c left join  dzqd_order_entry entry on c.f_number = entry.f_number left join  dzqd_order_cus cus on c.f_number = cus.f_number where c.f_channel_id = '30010004';

explain analyze SELECT c.f_number AS cNumber  FROM dzqd_order_main c left join  dzqd_order_entry entry on c.f_number = entry.f_number left join  dzqd_order_cus cus on c.f_number = cus.f_number where c.f_channel_id = '30010004';

select count(*) from dzqd_order_main where f_channel_id = '30010004';
select count(*) from dzqd_order_entry;
select count(*) from dzqd_order_cus;
shenlijungg commented 5 years ago

@XuHuaiyu EXPLAIN ANALYZE TIDB_HJ(CUS): image

EXPLAIN ANALYZE: image select count(*) from xxxx where f_channel_id = '30010004'; '176907'

yyy; '1381775' zzzz; '455050'

shenlijungg commented 5 years ago

@XuHuaiyu @morgo Can you look at issue for me?

XuHuaiyu commented 5 years ago

@shenlijungg

The HashJoin execution plan seems to be the best plan in TiDB. How long does the sql take in TiDB and MySQL respectively.

shenlijungg commented 5 years ago

@XuHuaiyu tidb image mysql: image

XuHuaiyu commented 5 years ago

Most of the time was consumed during fetching data in TiDB, but the consumption is still longer than expected. What's the machine configuration and the network configuration of your TiDB cluster?

shenlijungg commented 5 years ago

network configuration is 10MB ,Bandwidth increased to 100MB NOW! Is it enough? machine configuration:

8C 31G 500G tidb1 ,tikv1 no ssd! 8C 31G 531G tidb2 ,tikv2 no ssd! 8C 16G 331G tidb3 ,tikv3 no ssd! 16C 31G 531G ,tikv4 no ssd! 16C 31G 531G ,tikv5 no ssd! 16C 31G 531G ,tikv6 no ssd! 8C 15G 331G pd1 no ssd! 8C 15G 231G pd2 no ssd! 8C 15G 231G pd3 no ssd!

shenlijungg commented 5 years ago

@XuHuaiyu

XuHuaiyu commented 5 years ago

Hi, @shenlijungg The dataset and cluster configuration is enough to test the function of TiDB, but if you want to test the performance of TiDB. The size of your test dataset is too small to exploit the advantage of TiDB. MySQL fetches data from the local but TiDB fetches data through the network, the difference of the comparison results are expected. We suggest you construct a dataset with at least 50,000,000 rows for every table, and use the recommended configuration as we mentioned here.

shenlijungg commented 5 years ago

@XuHuaiyu thank you very mach!!!!