pingcap / tidb

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
https://pingcap.com
Apache License 2.0
37.34k stars 5.85k forks source link

planner: Prevent OR Expressions in IN Subquery Conversion to INNER JOIN to Avoid Data Duplication #57584

Open dash12653 opened 1 day ago

dash12653 commented 1 day ago

What problem does this PR solve?

Issue Number: close #57390

Problem Summary:

What changed and how does it work?

This is the plan:

tidb> explain 
SELECT c0, c1 
FROM t1 
WHERE (c1 OR c1) IN (SELECT c1 FROM t1 WHERE (c1 <= (0x991D3FA2F9C))) AND ((c0 AND 8.98447659672538e+29));
+--------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                             | estRows | task      | access object | operator info                                                                                                                                                                       |
+--------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_11                    | 2.66    | root      |               | semi join, equal:[eq(Column#7, Column#8)]                                                                                                                                           |
| ├─Projection_16(Build)         | 1.66    | root      |               | cast(fix_57390.t1.c1, double BINARY)->Column#8                                                                                                                                      |
| │ └─TableReader_19             | 1.66    | root      |               | data:Selection_18                                                                                                                                                                   |
| │   └─Selection_18             | 1.66    | cop[tikv] |               | le(fix_57390.t1.c1, "0x0991d3fa2f9c")                                                                                                                                               |
| │     └─TableFullScan_17       | 5.00    | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                                                                                                                      |
| └─Projection_12(Probe)         | 3.33    | root      |               | fix_57390.t1.c0, fix_57390.t1.c1, cast(or(istrue_with_null(cast(fix_57390.t1.c1, double BINARY)), istrue_with_null(cast(fix_57390.t1.c1, double BINARY))), double BINARY)->Column#7 |
|   └─TableReader_15             | 3.33    | root      |               | data:Selection_14                                                                                                                                                                   |
|     └─Selection_14             | 3.33    | cop[tikv] |               | fix_57390.t1.c0                                                                                                                                                                     |
|       └─TableFullScan_13       | 5.00    | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                                                                                                                      |
+--------------------------------+---------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

When tidb encounters an IN subquery, it tries to rewrite the outer table and subquery as INNER JOIN.

https://github.com/pingcap/tidb/blob/c091dba89718599dc1b3d3e45f9b0308a8d49ef0/pkg/planner/core/expression_rewriter.go#L1226-L1242

When rewriting OR expressions (e.g., (c1 OR c1)), the kernel will convert the type of c1 to DOUBLE.

https://github.com/pingcap/tidb/blob/c091dba89718599dc1b3d3e45f9b0308a8d49ef0/pkg/expression/builtin_cast.go#L2556-L2568

The join's condition is (c1 OR c1) = c1.

We can check the output of the join condition for the left and right tables after converting INNER JOIN.

left output:

tidb> select (c1 or c1) from t1;
+------------+
| (c1 or c1) |
+------------+
|          0 |
|          0 |
|          0 |
|          0 |
|          0 |
+------------+
5 rows in set, 10 warnings (0.00 sec)

right output:

tidb> SELECT cast(c1 as double) FROM t1 WHERE (c1 <= (0x991D3FA2F9C)) group by c1;
+--------------------+
| cast(c1 as double) |
+--------------------+
|                  0 |
|                  0 |
|                  0 |
+--------------------+
3 rows in set, 3 warnings (0.01 sec)

tidb> 

After the DISTINCT operation, CAST(c1 AS DOUBLE)actually causes data duplication, which will lead to unexpected results for in operation.

To address this, I add a restriction who will check if the left expression is or function in the conditions for converting to an INNER JOIN.

Check List

Tests

Side effects

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None
ti-chi-bot[bot] commented 1 day ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign winoros for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[pkg/planner/OWNERS](https://github.com/pingcap/tidb/blob/master/pkg/planner/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
ti-chi-bot[bot] commented 1 day ago

Hi @dash12653. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
tiprow[bot] commented 1 day ago

Hi @dash12653. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
hawkingrei commented 1 day ago

/ok-to-test

codecov[bot] commented 1 day ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 73.3545%. Comparing base (91c14a4) to head (35e8289). Report is 3 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #57584 +/- ## ================================================ + Coverage 72.8590% 73.3545% +0.4955% ================================================ Files 1676 1676 Lines 463653 463865 +212 ================================================ + Hits 337813 340266 +2453 + Misses 104979 102861 -2118 + Partials 20861 20738 -123 ``` | [Flag](https://app.codecov.io/gh/pingcap/tidb/pull/57584/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | Coverage Δ | | |---|---|---| | [integration](https://app.codecov.io/gh/pingcap/tidb/pull/57584/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `43.4425% <50.0000%> (?)` | | | [unit](https://app.codecov.io/gh/pingcap/tidb/pull/57584/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `72.1824% <100.0000%> (-0.0740%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap#carryforward-flags-in-the-pull-request-comment) to find out more. | [Components](https://app.codecov.io/gh/pingcap/tidb/pull/57584/components?src=pr&el=components&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | Coverage Δ | | |---|---|---| | [dumpling](https://app.codecov.io/gh/pingcap/tidb/pull/57584/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `52.7673% <ø> (ø)` | | | [parser](https://app.codecov.io/gh/pingcap/tidb/pull/57584/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `∅ <ø> (∅)` | | | [br](https://app.codecov.io/gh/pingcap/tidb/pull/57584/components?src=pr&el=component&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pingcap) | `45.4413% <ø> (-0.0091%)` | :arrow_down: |

🚨 Try these New Features: