pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/
https://pingcap.com
Apache License 2.0
37.06k stars 5.83k forks source link

stats_histograms.tot_col_size is negative after analyzing table with index containing virtual column #29126

Open xuyifangreeneyes opened 2 years ago

xuyifangreeneyes commented 2 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

use test;
drop table if exists t;
set @@tidb_analyze_version = 2;
create table t (a int, b int, c int as (a+1), index idx(c));
insert into t (a,b) values (1,1), (2,2), (3,3), (4,4), (4,5), (5,6), (5,7), (5,8), (null,null);
analyze table t with 2 topn, 2 buckets;
select * from mysql.stats_histograms

2. What did you expect to see? (Required)

In mysql.stats_histograms, tot_col_size of idx is positive.

3. What did you see instead (Required)

tot_col_size of idx is negative.

mysql> select * from mysql.stats_histograms;
+----------+----------+---------+----------------+------------+--------------+--------------+--------------------+----------------------+-----------+------+-------------+------------------------------------+
| table_id | is_index | hist_id | distinct_count | null_count | tot_col_size | modify_count | version            | cm_sketch            | stats_ver | flag | correlation | last_analyze_pos                   |
+----------+----------+---------+----------------+------------+--------------+--------------+--------------------+----------------------+-----------+------+-------------+------------------------------------+
|       57 |        0 |       3 |              0 |          0 |            0 |            0 | 428669461304967168 | NULL                 |         0 |    0 |           0 | NULL                               |
|       57 |        0 |       1 |              5 |          1 |            8 |            0 | 428669466211254272 | NULL                 |         2 |    1 |           1 | 0x33                               |
|       57 |        0 |       2 |              8 |          1 |            8 |            0 | 428669466211254272 | NULL                 |         2 |    1 |           1 | 0x38                               |
|       57 |        1 |       1 |              5 |          1 |           -9 |            0 | 428669466211254272 | NULL                 |         2 |    1 |           0 | 0x038000000000000004               |
+----------+----------+---------+----------------+------------+--------------+--------------+--------------------+----------------------+-----------+------+-------------+------------------------------------+
4 rows in set (0.00 sec)

4. What is your TiDB version? (Required)

mysql> SELECT tidb_version();
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version()                                                                                                                                                                                                                                                                                                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: v5.3.0-alpha-1204-g21b008864
Edition: Community
Git Commit Hash: 21b008864a5440191c682f5154b14d97b1070e67
Git Branch: master
UTC Build Time: 2021-10-26 10:29:02
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
xuyifangreeneyes commented 2 years ago

We use the size of the virtual column value minus one to calculate tot_col_size of virtual column index (see (*baseCollector).collectColumnGroups). Since the virtual column value is empty, tot_col_size of virtual column index is calculated as -count.

I tried to get tot_col_size of virtual column index in (*AnalyzeColumnsExec).handleNDVForSpecialIndexes but we don't maintain tot_col_size for index for analyze version 1.

If projection can be pushed down to TiKV, we can calculate virtual column on TiKV and calculating tot_col_size of virtual column index becomes trivial.

Notice that we don't use tot_col_size of index currently. Besides, the standard way to check histograms is show stats_histograms instead of select * from mysql.stats_histograms. show stats_histograms set avg_col_size to 0 for each index, which is expected. Therefore, maybe we can change the severity from moderate to minor.