Open aaneja opened 1 year ago
Full call stack for easier debugging (from a related example):
visitCall:143, ScalarStatsCalculator$RowExpressionStatsVisitor (com.facebook.presto.cost)
visitCall:104, ScalarStatsCalculator$RowExpressionStatsVisitor (com.facebook.presto.cost)
accept:131, CallExpression (com.facebook.presto.spi.relation)
calculate:96, ScalarStatsCalculator (com.facebook.presto.cost)
doCalculate:59, ProjectStatsRule (com.facebook.presto.cost)
doCalculate:30, ProjectStatsRule (com.facebook.presto.cost)
calculate:39, SimpleStatsRule (com.facebook.presto.cost)
calculateStats:80, ComposableStatsCalculator (com.facebook.presto.cost)
calculateStats:70, ComposableStatsCalculator (com.facebook.presto.cost)
calculateStats:79, HistoryBasedPlanStatisticsCalculator (com.facebook.presto.cost)
getStats:80, CachingStatsProvider (com.facebook.presto.cost)
getStats:372, CostCalculatorUsingExchanges$CostEstimator (com.facebook.presto.cost)
visitProject:162, CostCalculatorUsingExchanges$CostEstimator (com.facebook.presto.cost)
visitProject:85, CostCalculatorUsingExchanges$CostEstimator (com.facebook.presto.cost)
accept:108, ProjectNode (com.facebook.presto.spi.plan)
calculateCost:82, CostCalculatorUsingExchanges (com.facebook.presto.cost)
calculateCost:65, CostCalculatorWithEstimatedExchanges (com.facebook.presto.cost)
calculateCost:106, CachingCostProvider (com.facebook.presto.cost)
getGroupCost:98, CachingCostProvider (com.facebook.presto.cost)
getCost:67, CachingCostProvider (com.facebook.presto.cost)
apply:-1, 2131136600 (com.facebook.presto.cost.CostCalculatorUsingExchanges$CostEstimator$$Lambda$2871)
accept:193, ReferencePipeline$3$1 (java.util.stream)
tryAdvance:4719, Collections$2 (java.util)
forEachRemaining:4727, Collections$2 (java.util)
copyInto:482, AbstractPipeline (java.util.stream)
wrapAndCopyInto:472, AbstractPipeline (java.util.stream)
evaluateSequential:708, ReduceOps$ReduceOp (java.util.stream)
evaluate:234, AbstractPipeline (java.util.stream)
reduce:541, ReferencePipeline (java.util.stream)
costForAccumulation:330, CostCalculatorUsingExchanges$CostEstimator (com.facebook.presto.cost)
visitEnforceSingleRow:282, CostCalculatorUsingExchanges$CostEstimator (com.facebook.presto.cost)
visitEnforceSingleRow:85, CostCalculatorUsingExchanges$CostEstimator (com.facebook.presto.cost)
accept:79, EnforceSingleRowNode (com.facebook.presto.sql.planner.plan)
accept:36, InternalPlanNode (com.facebook.presto.sql.planner.plan)
calculateCost:82, CostCalculatorUsingExchanges (com.facebook.presto.cost)
calculateCost:65, CostCalculatorWithEstimatedExchanges (com.facebook.presto.cost)
calculateCost:106, CachingCostProvider (com.facebook.presto.cost)
getGroupCost:98, CachingCostProvider (com.facebook.presto.cost)
getCost:67, CachingCostProvider (com.facebook.presto.cost)
createJoinEnumerationResult:586, ReorderJoins$JoinEnumerator (com.facebook.presto.sql.planner.iterative.rule)
getJoinSource:427, ReorderJoins$JoinEnumerator (com.facebook.presto.sql.planner.iterative.rule)
createJoin:344, ReorderJoins$JoinEnumerator (com.facebook.presto.sql.planner.iterative.rule)
createJoinAccordingToPartitioning:291, ReorderJoins$JoinEnumerator (com.facebook.presto.sql.planner.iterative.rule)
chooseJoinOrder:235, ReorderJoins$JoinEnumerator (com.facebook.presto.sql.planner.iterative.rule)
access$000:190, ReorderJoins$JoinEnumerator (com.facebook.presto.sql.planner.iterative.rule)
apply:168, ReorderJoins (com.facebook.presto.sql.planner.iterative.rule)
apply:103, ReorderJoins (com.facebook.presto.sql.planner.iterative.rule)
transform:226, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
exploreNode:187, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
exploreGroup:149, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
exploreChildren:270, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
exploreGroup:151, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
exploreChildren:270, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
exploreGroup:151, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
optimize:136, IterativeOptimizer (com.facebook.presto.sql.planner.iterative)
validateAndOptimizePlan:117, Optimizer (com.facebook.presto.sql)
lambda$getLogicalPlan$1:232, QueryExplainer (com.facebook.presto.sql.analyzer)
get:-1, 382733738 (com.facebook.presto.sql.analyzer.QueryExplainer$$Lambda$2601)
profileNanos:136, RuntimeStats (com.facebook.presto.common)
getLogicalPlan:230, QueryExplainer (com.facebook.presto.sql.analyzer)
getLogicalPlan:197, QueryExplainer (com.facebook.presto.sql.analyzer)
getPlan:136, QueryExplainer (com.facebook.presto.sql.analyzer)
getQueryPlan:140, ExplainRewrite$Visitor (com.facebook.presto.sql.rewrite)
visitExplain:123, ExplainRewrite$Visitor (com.facebook.presto.sql.rewrite)
visitExplain:68, ExplainRewrite$Visitor (com.facebook.presto.sql.rewrite)
accept:80, Explain (com.facebook.presto.sql.tree)
process:27, AstVisitor (com.facebook.presto.sql.tree)
rewrite:65, ExplainRewrite (com.facebook.presto.sql.rewrite)
rewrite:58, StatementRewrite (com.facebook.presto.sql.rewrite)
analyzeSemantic:112, Analyzer (com.facebook.presto.sql.analyzer)
analyze:93, BuiltInQueryAnalyzer (com.facebook.presto.sql.analyzer)
<init>:205, SqlQueryExecution (com.facebook.presto.execution)
<init>:109, SqlQueryExecution (com.facebook.presto.execution)
createQueryExecution:961, SqlQueryExecution$SqlQueryExecutionFactory (com.facebook.presto.execution)
lambda$createDispatchQuery$0:167, LocalDispatchQueryFactory (com.facebook.presto.dispatcher)
call:-1, 1091021176 (com.facebook.presto.dispatcher.LocalDispatchQueryFactory$$Lambda$2429)
runInterruptibly:125, TrustedListenableFutureTask$TrustedFutureInterruptibleTask (com.google.common.util.concurrent)
run:57, InterruptibleTask (com.google.common.util.concurrent)
run:78, TrustedListenableFutureTask (com.google.common.util.concurrent)
runWorker:1149, ThreadPoolExecutor (java.util.concurrent)
run:624, ThreadPoolExecutor$Worker (java.util.concurrent)
run:750, Thread (java.lang)
While debugging missing join orders for TPCDS Q24, I observed that Presto has a hard time estimating stats for variables computed through a function call. Consider the query and plan below:
The reason we see the `?` for the row count estimate on the join node is that, during stats calculation in `ScalarStatsCalculator`, Presto fails to estimate the stats for the `upper(ca_country)` node. Because of this, queries like TPCDS Q24 that use these functions in join predicates miss out on join orders, since the cost returned for these join nodes ends up as `UNKNOWN_COST_RESULT`.
IMO, we need a way to figure out better estimates for deterministic scalar functions, especially for string functions (TRIM, SUBSTR, LOWER, UPPER) that are likely to be used in data lake scenarios.
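
To make the idea concrete, here is a rough sketch of what propagating stats through a string function could look like. It is not Presto code: `ScalarFunctionStatsSketch`, `ColumnStats`, and `estimate` are hypothetical names made up for illustration, and the model is deliberately simplified to a nulls fraction and a distinct-value count. The assumption is that `upper`, `lower`, and `trim` return null only for null input and can merge distinct values but never create new ones, so the input's distinct-value count can serve as an upper-bound estimate instead of unknown.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not Presto's API: shows how stats could be carried
 * through a single-argument deterministic string function.
 */
class ScalarFunctionStatsSketch {

    /** Simplified stand-in for a per-variable stats estimate; NaN means unknown. */
    static final class ColumnStats {
        final double nullsFraction;
        final double distinctValuesCount;

        ColumnStats(double nullsFraction, double distinctValuesCount) {
            this.nullsFraction = nullsFraction;
            this.distinctValuesCount = distinctValuesCount;
        }

        static ColumnStats unknown() {
            return new ColumnStats(Double.NaN, Double.NaN);
        }

        boolean isUnknown() {
            return Double.isNaN(nullsFraction) && Double.isNaN(distinctValuesCount);
        }
    }

    /**
     * Estimates output stats for functionName applied to a column with the
     * given input stats. Functions not listed here fall back to unknown.
     */
    static ColumnStats estimate(String functionName, ColumnStats input) {
        switch (functionName.toLowerCase(Locale.ROOT)) {
            case "upper":
            case "lower":
            case "trim":
                // Assumption: null in, null out, so the nulls fraction carries over.
                // Distinct inputs may collapse (e.g. "us" and "US"), so the input
                // distinct count is only an upper bound, used here as the estimate.
                return new ColumnStats(input.nullsFraction, input.distinctValuesCount);
            default:
                return ColumnStats.unknown();
        }
    }
}
```

With something along these lines, `upper(ca_country)` would inherit the stats of `ca_country` rather than coming back unknown. SUBSTR is left out of the sketch because its extra arguments would also have to be taken into account.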