pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.26k stars 5.84k forks source link

merging partition statistics to global statistics fails when some partition statistics are missing #38999

Open xuyifangreeneyes opened 2 years ago

xuyifangreeneyes commented 2 years ago

Enhancement

mysql> select @@tidb_partition_prune_mode;
+-----------------------------+
| @@tidb_partition_prune_mode |
+-----------------------------+
| dynamic                     |
+-----------------------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE t (a int) PARTITION BY RANGE( a ) (
    ->     PARTITION p0 VALUES LESS THAN (10),
    ->     PARTITION p1 VALUES LESS THAN (MAXVALUE)
    -> );
Query OK, 0 rows affected (0.04 sec)

mysql> insert into t values (1), (100);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into t select a + 1 from t where a > 10;
Query OK, 1 row affected, 1 warning (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 1

-- repeat insert ... select ... to make the row count of p1 exceed 1000

-- wait 1 min to dump stats delta  
mysql> show stats_meta;
+---------+------------+----------------+---------------------+--------------+-----------+
| Db_name | Table_name | Partition_name | Update_time         | Modify_count | Row_count |
+---------+------------+----------------+---------------------+--------------+-----------+
| test    | t          | global         | 2022-11-09 11:08:01 |         8193 |      8193 |
| test    | t          | p0             | 2022-11-09 11:07:01 |            1 |         1 |
| test    | t          | p1             | 2022-11-09 11:08:01 |         8192 |      8192 |
+---------+------------+----------------+---------------------+--------------+-----------+
3 rows in set (0.00 sec)

Notice that we don't trigger auto analyze when row count is less then 1000. So p0 would not be auto analyzed and When auto analyze table t partition p1 is triggered, we see the following logs:

[2022/11/09 11:09:01.853 +08:00] [INFO] [update.go:1191] ["[stats] start to auto analyze"] [table=t] [partitions="[\"p1\"]"] ["analyze partition batch size"=1]
[2022/11/09 11:09:01.853 +08:00] [INFO] [update.go:1206] ["[stats] auto analyze triggered"] [table=t] [partitions="[\"p1\"]"]
[2022/11/09 11:09:01.895 +08:00] [INFO] [handle.go:1417] ["[stats] incrementally update modifyCount"] [tableID=72] [curModifyCnt=8192] [results.BaseModifyCnt=8192] [modifyCount=0]
[2022/11/09 11:09:01.895 +08:00] [INFO] [handle.go:1439] ["[stats] directly update count"] [tableID=72] [results.Count=8192] [count=8192]
[2022/11/09 11:09:01.901 +08:00] [INFO] [analyze.go:517] ["analyze table `test`.`t` has finished"] [partition=p1] ["job info"="auto analyze table all columns with 256 buckets, 500 topn, 1 samplerate"] ["start time"=2022/11/09 11:09:01.855 +08:00] ["end time"=2022/11/09 11:09:01.900 +08:00] [cost=44.783301ms]
[2022/11/09 11:09:01.902 +08:00] [INFO] [analyze_global_stats.go:72] ["MergePartitionStats2GlobalStatsByTableID start"]
[2022/11/09 11:09:01.906 +08:00] [INFO] [analyze_global_stats.go:76] ["MergePartitionStats2GlobalStatsByTableID end"] [error="[types:8244]Build table: `t` column: `a` global-level stats failed due to missing partition-level column stats, please run analyze table to refresh columns of all partitions"]

PS: the last two logs are added by me for debugging

logutil.BgLogger().Info("MergePartitionStats2GlobalStatsByTableID start")
globalStats, err := statsHandle.MergePartitionStats2GlobalStatsByTableID(e.ctx, globalOpts, e.ctx.GetInfoSchema().(infoschema.InfoSchema),
    globalStatsID.tableID, info.isIndex, info.histIDs,
    tableAllPartitionStats)
logutil.BgLogger().Info("MergePartitionStats2GlobalStatsByTableID end", zap.Error(err))

Since p0 doesn't have statistics, merging partition statistics to global statistics fails.

For a partition table, if the row count of some paritition is less than 1000, the partition would not be auto analyzed. When some other partitions are auto analyzed, global statistics cannot be merged successfully due to missing partition statistics. Besides, the fail reason of global statistics is added into warnings and we cannot see the warnings of auto analyze. In other word, we cannot see any error in logs when the case happens.

There are several things we need to do:

  1. When statistics of some partition have problems(for example, some partition has no statistics) and cannot participate in the procedure of merging partition statistics to global statistics, currently we just give up the procedure and global statistics cannot be updated. A more reasonable behavior is to exclude the partition whose statistics have problems and merge statistics of other partitions to global statistics.
  2. Add more logs about merging partition statistics to global statistics.
  3. Enhance auto analyze mechanism. Maybe we should trigger auto analyze for the tables whose row count is less than 1000.
seiya-annie commented 3 months ago

/report customer