Open Rustin170506 opened 4 months ago
How do we deal with the outdated stats?
tiup playground nightly --db.binpath /Volumes/t7/code/tidb/bin/tidb-server
CREATE TABLE example_table
(
id INT PRIMARY KEY,
name VARCHAR(255),
age INT
);
INSERT INTO example_table (id, name, age) VALUES (1, 'Alice', 30), (2, 'Bob', 25), (3, 'Charlie', 35), (4, 'Diana', 28), (5, 'Evan', 40), (6, 'Fiona', 22), (7, 'George', 33), (8, 'Hannah', 29), (9, 'Ian', 31), (10, 'Jenny', 27);
3. Wait for the stats meta:
```sql
SELECT * FROM mysql.stats_meta;
+------------------+--------+------------+-----+--------+
|version |table_id|modify_count|count|snapshot|
+------------------+--------+------------+-----+--------+
|451225860582932481|104 |10 |10 |0 |
+------------------+--------+------------+-----+--------+
set global tidb_persist_analyze_options='OFF'
ANALYZE TABLE example_table ALL COLUMNS
SELECT * FROM mysql.stats_meta;
+------------------+--------+------------+-----+------------------+
|version |table_id|modify_count|count|snapshot |
+------------------+--------+------------+-----+------------------+
|451225906812289034|104 |0 |10 |451225906799181830|
+------------------+--------+------------+-----+------------------+
+--------+--------+-------+--------------+----------+------------+------------+------------------+---------+---------+----+--------------------+----------------+
|table_id|is_index|hist_id|distinct_count|null_count|tot_col_size|modify_count|version |cm_sketch|stats_ver|flag|correlation |last_analyze_pos|
SELECT * FROM mysql.stats_histograms; +--------+--------+-------+--------------+----------+------------+------------+------------------+---------+---------+----+--------------------+----------------+ |104 |0 |1 |10 |0 |80 |0 |451225906812289034|null |2 |1 |1 |null | |104 |0 |2 |10 |0 |59 |0 |451225906812289034|null |2 |1 |1 |null | |104 |0 |3 |10 |0 |80 |0 |451225906812289034|null |2 |1 |-0.05454545454545454|null | +--------+--------+-------+--------------+----------+------------+------------+------------------+---------+---------+----+--------------------+----------------+
7. Insert more data:
```sql
INSERT INTO example_table (id, name, age)
VALUES (1, 'Alice', 30),
(2, 'Bob', 25),
(3, 'Charlie', 35),
(4, 'Diana', 28),
(5, 'Evan', 40),
(6, 'Fiona', 22),
(7, 'George', 33),
(8, 'Hannah', 29),
(9, 'Ian', 31),
(10, 'Jenny', 27);
SELECT * FROM example_table WHERE name = '';
SELECT * FROM mysql.column_stats_usage;
+--------+---------+-------------------+-------------------+
|table_id|column_id|last_used_at |last_analyzed_at |
+--------+---------+-------------------+-------------------+
|104 |1 |null |2024-07-18 16:10:32|
|104 |2 |2024-07-18 16:14:19|2024-07-18 16:10:32|
|104 |3 |null |2024-07-18 16:10:32|
+--------+---------+-------------------+-------------------+
ANALYZE TABLE example_table;
SELECT * FROM mysql.stats_histograms;
+--------+--------+-------+--------------+----------+------------+------------+------------------+---------+---------+----+--------------------+----------------+
|table_id|is_index|hist_id|distinct_count|null_count|tot_col_size|modify_count|version |cm_sketch|stats_ver|flag|correlation |last_analyze_pos|
+--------+--------+-------+--------------+----------+------------+------------+------------------+---------+---------+----+--------------------+----------------+
|104 |0 |3 |10 |0 |80 |0 |451225906812289034|null |2 |1 |-0.05454545454545454|null |
|104 |0 |1 |10 |0 |80 |0 |451226051173679118|null |2 |1 |1 |null |
|104 |0 |2 |10 |0 |59 |0 |451226051173679118|null |2 |1 |1 |null |
+--------+--------+-------+--------------+----------+------------+------------+------------------+---------+---------+----+--------------------+----------------+
As you can see column 3
still retains the old stats.
SELECT * FROM mysql.stats_top_n WHERE hist_id =3;
+--------+--------+-------+--------------------+-----+
|table_id|is_index|hist_id|value |count|
+--------+--------+-------+--------------------+-----+
|104 |0 |3 |0x038000000000000028|1 |
|104 |0 |3 |0x038000000000000023|1 |
|104 |0 |3 |0x038000000000000021|1 |
|104 |0 |3 |0x03800000000000001F|1 |
|104 |0 |3 |0x03800000000000001E|1 |
|104 |0 |3 |0x03800000000000001D|1 |
|104 |0 |3 |0x03800000000000001C|1 |
|104 |0 |3 |0x03800000000000001B|1 |
|104 |0 |3 |0x038000000000000019|1 |
|104 |0 |3 |0x038000000000000016|1 |
+--------+--------+-------+--------------------+-----+
+--------+--------+-------+--------------------+-----+
|table_id|is_index|hist_id|value |count|
+--------+--------+-------+--------------------+-----+
|104 |0 |3 |0x038000000000000028|1 |
|104 |0 |3 |0x038000000000000023|1 |
|104 |0 |3 |0x038000000000000021|1 |
|104 |0 |3 |0x03800000000000001F|1 |
|104 |0 |3 |0x03800000000000001E|1 |
|104 |0 |3 |0x03800000000000001D|1 |
|104 |0 |3 |0x03800000000000001C|1 |
|104 |0 |3 |0x03800000000000001B|1 |
|104 |0 |3 |0x038000000000000019|1 |
|104 |0 |3 |0x038000000000000016|1 |
+--------+--------+-------+--------------------+-----+
Feature Request
Is your feature request related to a problem? Please describe: The ANALYZE statement would collect the statistics of all columns currently. If the table is big and wide, executing
ANALYZE
would consume lots of time, memory, and CPU. See https://github.com/pingcap/tidb/issues/27358 for details. However, only the statistics of some columns are used in creating query plans, while the statistics of others are not. Predicate columns are those columns whose statistics are used in query plans, usually in where conditions, join conditions, and so on. If ANALYZE only collects statistics for predicate columns and indexed columns (statistics of indexed columns are important for index selection), the cost of ANALYZE can be reduced.Describe the feature you'd like: See: https://github.com/pingcap/tidb/pull/53511 Describe alternatives you've considered: None Teachability, Documentation, Adoption, Migration Strategy: None