Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Explain the query-9 of chbenchmark.
select
n_name,
extract(
year
from
o_entry_d
) as l_year,
sum(ol_amount) as sum_profit
from
item,
stock,
supplier,
order_line,
orders,
nation
where
ol_i_id = s_i_id
and ol_supply_w_id = s_w_id
and mod(
(s_w_id * s_i_id),
10000
) = s_suppkey
and ol_w_id = o_w_id
and ol_d_id = o_d_id
and ol_o_id = o_id
and ol_i_id = i_id
and s_nationkey = n_nationkey
and i_data like '%BB'
group by
n_name,
extract(
year
from
o_entry_d
)
order by
n_name,
l_year desc;
2. What did you expect to see? (Required)
As you can see, there's ol_i_id = s_i_id, ol_i_id = i_id, and mod((s_w_id * s_i_id), 10000) = s_suppkey.
We can get s_i_id = i_id from the equivalence class propagation.
Then TiDB would use that to generate mod((s_w_id * i_id), 10000) = s_suppkey.
It's an equal function though one side is a scalar function. TiDB only accepts col_x = col_y as the join key. So TiDB would add a projection to project mod((s_w_id * i_id), 10000) = s_suppkey as Col#x. Then let mod((s_w_id * i_id), 10000) = s_suppkey become Col#x = s_suppkey.
So far the optimization sounds good. However, this projection cut the join tree into two parts, and unfortunately, the cut part is a cartesian product of table item and table stock. Finally, the SQL got a bad performance than we expected.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Explain the query-9 of chbenchmark.
2. What did you expect to see? (Required)
As you can see, there's
ol_i_id = s_i_id
,ol_i_id = i_id
, andmod((s_w_id * s_i_id), 10000) = s_suppkey
. We can gets_i_id = i_id
from the equivalence class propagation. Then TiDB would use that to generatemod((s_w_id * i_id), 10000) = s_suppkey
.It's an equal function though one side is a scalar function. TiDB only accepts
col_x = col_y
as the join key. So TiDB would add a projection to projectmod((s_w_id * i_id), 10000) = s_suppkey
asCol#x
. Then letmod((s_w_id * i_id), 10000) = s_suppkey
becomeCol#x = s_suppkey
.So far the optimization sounds good. However, this projection cut the join tree into two parts, and unfortunately, the cut part is a cartesian product of table
item
and tablestock
. Finally, the SQL got a bad performance than we expected.3. What did you see instead (Required)
No cartesian product in the final plan.
4. What is your TiDB version? (Required)
current master.