Open darkelf21cn opened 4 years ago
Thanks for your feedback @darkelf21cn , I have reproduced this case on my computer. I will investigate this problem soon.
The reason is tricky and it is caused by a restriction on TiDB's optimizer.
Background: the optimizer requires the inner child of IndexJoin
must be a DataSource(TableScan/IndexLookUp/IndexScan)
, you can see this restriction here.
1. If we use job_result_json
here, TiDB can't push the expression job_result_json IS NOT
down to TiKV
since it doesn't support to this JSON function now.
2. Then the inner child of this JOIN
must be a Selection
which is used to handle this JSON function and it is not a DataSource
.
3. Then we cannot use IndexJoin
here since its inner child must be a DataSouce
but it is a Selection
now.
4. Since we use HashJoin
instead of IndexJoin
here, its inner side cannot know task_id
s in its outer side(this is determined by the different implementation of HashJoin
and IndexJoin
).
5. Since the inner side cannot get task_id
s, it cannot use the index, so the table scan is used here. @darkelf21cn
Thank you @qw4990 for your quick response. Do we have workaround for this? Actually I rewrote the query in different forms to intend to "enforcing" the SQL engine to execute the subquery first (the data set is small) to get the ids from tasks table, then use those id to lookup task_results. I failed because the query optimizer persists to use the same execution plan to execute it in an "optimized" way. So my question is that can we enforce execution order in some way? I know splitting the SQL into 2 can work that's not my first consideration.
I found one. The workaround is to create a generated stored column. The column will be used to identify the json column is null or not. E.g job_result_not_null TINYINT AS (CASE WHEN job_result_json IS NULL THEN 0 ELSE 1 END) STORED
. However, this requires the table to be recreated. So I'm still looking for better solutions.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Table tasks and task_results form one to many relations. Our query joins 2 tables to fetch the task_results by given reference_id and job_result can not be null. TiDB generates bad query plans when the data type of job_result is json. It decides to do a full table scan instead of getting the task_id first then seeks task_results by that id. VARCHAR type column does not have this problem.
Create tables and generate some data
CREATE TABLE tasks ( task_id int unsigned primary key auto_increment, task_name varchar(10), reference_id int unsigned, key(reference_id) ); CREATE TABLE task_results ( id int unsigned primary key auto_increment, task_id int unsigned, result_data json, job_result_json json, job_result_varchar VARCHAR(1024), creation_dt timestamp default current_timestamp, key (task_id) ); SET @@SESSION.tidb_batch_insert = 1; INSERT INTO tasks (task_name, reference_id) SELECT LEFT(UUID(), 10), FLOOR(RAND() 100000) from information_schema.columns a, information_schema.columns b LIMIT 100000; INSERT INTO task_results (task_id, result_data, job_result_json, job_result_varchar) SELECT task_id, CONCAT('{"key1": "', UUID(), '"}'), CASE WHEN FLOOR(RAND() 10) % 10 = 0 THEN NULL ELSE CONCAT('{"key2": "', UUID(), '"}') END, CASE WHEN FLOOR(RAND() 10) % 10 = 0 THEN NULL ELSE CONCAT('{"key2": "', UUID(), '"}') END FROM tasks, (SELECT FROM information_schema.columns LIMIT 10) a;
Bad query plan
EXPLAIN SELECT task_id, result_data FROM task_results WHERE task_id IN (select task_id from tasks where reference_id = 1 LIMIT 1) AND job_result_json IS NOT NULL ORDER BY creation_dt DESC LIMIT 0, 1000;
+--------------------------------+-----------+------+---------------------------------------------------------------------------------------+ | id | count | task | operator info | +--------------------------------+-----------+------+---------------------------------------------------------------------------------------+ | Projection_19 | 1.25 | root | test.task_results.task_id, test.task_results.result_data | | └─TopN_22 | 1.25 | root | test.task_results.creation_dt:desc, offset:0, count:1000 | | └─HashRightJoin_27 | 1.25 | root | inner join, inner:Limit_28, equal:[eq(test.tasks.task_id, test.task_results.task_id)] | | ├─Limit_28 | 1.00 | root | offset:0, count:1 | | │ └─IndexReader_33 | 1.00 | root | index:Limit_32 | | │ └─Limit_32 | 1.00 | cop | offset:0, count:1 | | │ └─IndexScan_31 | 1.00 | cop | table:tasks, index:reference_id, range:[1,1], keep order:false, stats:pseudo | | └─Selection_36 | 367632.00 | root | not(isnull(cast(test.task_results.job_result_json))) | | └─TableReader_39 | 459540.00 | root | data:Selection_38 | | └─Selection_38 | 459540.00 | cop | not(isnull(test.task_results.task_id)) | | └─TableScan_37 | 460000.00 | cop | table:task_results, range:[0,+inf], keep order:false, stats:pseudo | +--------------------------------+-----------+------+---------------------------------------------------------------------------------------+
Good query plan
EXPLAIN SELECT task_id, result_data FROM task_results WHERE task_id IN (select task_id from tasks where reference_id = 1 LIMIT 1) AND job_result_varchar IS NOT NULL ORDER BY creation_dt DESC LIMIT 0, 1000;
+--------------------------------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+ | id | count | task | operator info | +--------------------------------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+ | Projection_18 | 1.25 | root | test.task_results.task_id, test.task_results.result_data | | └─TopN_21 | 1.25 | root | test.task_results.creation_dt:desc, offset:0, count:1000 | | └─IndexJoin_30 | 1.25 | root | inner join, inner:IndexLookUp_29, outer key:test.tasks.task_id, inner key:test.task_results.task_id | | ├─Limit_33 | 1.00 | root | offset:0, count:1 | | │ └─IndexReader_38 | 1.00 | root | index:Limit_37 | | │ └─Limit_37 | 1.00 | cop | offset:0, count:1 | | │ └─IndexScan_36 | 1.00 | cop | table:tasks, index:reference_id, range:[1,1], keep order:false, stats:pseudo | | └─IndexLookUp_29 | 459.08 | root | | | ├─Selection_27 | 459.54 | cop | not(isnull(test.task_results.task_id)) | | │ └─IndexScan_25 | 460.00 | cop | table:task_results, index:task_id, range: decided by [eq(test.task_results.task_id, test.tasks.task_id)], keep order:false, stats:pseudo | | └─Selection_28 | 459.08 | cop | not(isnull(test.task_results.job_result_varchar)) | | └─TableScan_26 | 459.54 | cop | table:task_results, keep order:false, stats:pseudo | +--------------------------------+--------+------+------------------------------------------------------------------------------------------------------------------------------------------+
2. What did you expect to see? (Required)
IndexLookup on task_results
3. What did you see instead (Required)
TableScan on task_results
4. Affected version (Required)
v3.0.12
5. Root Cause Analysis