sundy-li / databend

FuseQuery is a Distributed SQL Query Engine at scale
Other
2 stars 0 forks source link

Bug: subquery in partition by clause return wrong result #83

Open sundy-li opened 2 weeks ago

sundy-li commented 2 weeks ago

Summary

mysql> insert into t1 values(1), (2), (2), (3);
Query OK, 4 rows affected (0.38 sec)

mysql> insert into t2 values(1), (2), (3);
Query OK, 3 rows affected (0.25 sec)

mysql> select count(distinct a) over (partition by (select b from t2)) from t1;
+-----------+
| count(_1) |
+-----------+
|         3 |
+-----------+
1 row in set (0.09 sec)
Read 4 rows, 17.00 B in 0.048 sec., 83.91 rows/sec., 356.61 B/sec.

We should throw errors for this query like PostgreSQL.

pg: error: more than one row returned by a subquery used as an expression

dosubot[bot] commented 2 weeks ago

The logic for handling subqueries in the PARTITION BY clause is implemented in the rewrite function within the src/query/sql/src/planner/optimizer/decorrelate/subquery_rewriter.rs file. This function processes the Window operator and handles the partition_by clause, ensuring that any subqueries are correctly rewritten [1].

To address the issue where a subquery in the PARTITION BY clause returns more than one row, you can add a check to ensure that the subquery returns only one row. This can be done by modifying the try_rewrite_subquery function to include a validation step.

Here is a relevant snippet from the resolve_subquery function in the TypeChecker struct, which checks for subqueries returning multiple rows in other parts of the query:

if (typ == SubqueryType::Scalar || typ == SubqueryType::Any)
    && output_context.columns.len() > 1
{
    return Err(ErrorCode::SemanticError(format!(
        "Subquery must return only one column, but got {} columns",
        output_context.columns.len()
    )));
}

You can implement a similar check in the try_rewrite_subquery function to ensure that subqueries in the PARTITION BY clause return only one row. This will help in throwing an error for queries like the one mentioned, similar to PostgreSQL's behavior [2].

Additionally, there are existing test cases for subqueries in the PARTITION BY clause located in the files tests/sqllogictests/suites/duckdb/subquery/window_in_subquery.test and tests/sqllogictests/suites/query/window_function/expr_in_window.test [3][4]. These can be used to validate the changes and ensure the correct behavior.

To continue talking to Dosu, mention @dosu.