Open windtalker opened 2 months ago
On derived columns, e.g. a function call or sum(l_quantity) > 314, the optimizer currently just uses a hard-coded value. The best we can do at the moment is to switch to a smaller default value for derived columns.
Stats feedback for the optimizer is a more feasible way to resolve this kind of issue.
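To make the trade-off in this comment concrete, here is a minimal Python sketch of a row-count estimator that falls back to a hard-coded selectivity when no statistics apply to a predicate. It is not TiDB's code: the function name, the 0.8 and 0.1 factors, and the input row count are all hypothetical values chosen for illustration.

```python
# Hypothetical fallback factor; NOT taken from TiDB's source.
DEFAULT_DERIVED_SELECTIVITY = 0.8


def estimate_rows(input_rows, selectivity=None, default=DEFAULT_DERIVED_SELECTIVITY):
    """Estimate the output rows of a filter.

    selectivity: a factor derived from statistics or feedback, or None when
                 the predicate is on a derived column and nothing is known.
    default:     the hard-coded factor used when selectivity is None.
    """
    factor = default if selectivity is None else selectivity
    return input_rows * factor


input_rows = 1_000_000  # hypothetical input cardinality

# 1. Current behaviour as described above: a hard-coded factor is applied.
fallback_estimate = estimate_rows(input_rows)

# 2. Short-term mitigation: a smaller hard-coded default for derived columns.
smaller_default_estimate = estimate_rows(input_rows, default=0.1)

# 3. Stats feedback: reuse the selectivity observed in an earlier execution
#    (here a hypothetical run that returned 5 rows).
feedback_estimate = estimate_rows(input_rows, selectivity=5 / input_rows)
```

A smaller default only shrinks the error by a constant factor, whereas a feedback-derived selectivity tracks what the predicate actually returned, which is why the comment treats feedback as the more promising direction.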
Enhancement
It looks like TiDB does not support estimating filter selectivity on derived columns, which may lead to sub-optimal plans. Take TPC-H query 18 as an example:
There is a very large deviation in the estimate for Selection_76, which is sum(l_quantity) > 314. The actual result is 732 and the estimated result is 120704204.80. This misestimate makes the subsequent plan choices sub-optimal. For example, HashJoin_72 could be a broadcast join that broadcasts the result of Selection_76, and HashJoin_68 could also be a broadcast join that uses Projection_96 as the build side. HashJoin_210 could be a broadcast join as well, so there would be no need to exchange the entire lineitem table.
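The effect of the misestimate on join selection can be sketched with a simple threshold rule. This is an illustrative Python model, not TiDB's planner logic: the function name and the 10,000-row threshold are assumptions, while the estimated and actual row counts are the ones reported above for Selection_76.

```python
def choose_join_strategy(build_side_rows, broadcast_threshold_rows):
    """Pick a broadcast join when the build side is small enough to ship to
    every node; otherwise fall back to a shuffle (exchange) join."""
    if build_side_rows <= broadcast_threshold_rows:
        return "broadcast"
    return "shuffle"


ESTIMATED_ROWS = 120704204.80  # estimate reported for Selection_76
ACTUAL_ROWS = 732              # actual row count reported for Selection_76
THRESHOLD_ROWS = 10_000        # hypothetical broadcast threshold

# With the optimizer's estimate, the build side looks far too large to broadcast.
plan_from_estimate = choose_join_strategy(ESTIMATED_ROWS, THRESHOLD_ROWS)

# With the true cardinality, the same rule picks a broadcast join.
plan_from_actual = choose_join_strategy(ACTUAL_ROWS, THRESHOLD_ROWS)

# How far off the estimate is, as a multiplicative factor (on the order of 1e5).
overestimate_factor = ESTIMATED_ROWS / ACTUAL_ROWS
```

Under these assumptions the estimate leads to a shuffle join while the actual cardinality would allow a broadcast join, which matches the behaviour described for HashJoin_72.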