Closed maciejgrzybek closed 7 years ago
I believe we could rewrite the plan to include CAST
nodes whenever we decide to coerce. This would be consistent with the visitors we have in the code.
I was not convinced by Analyzer
rewriting the plan but it turns out we do it anyway, e.g. StatementAnalyzer.visitJoin
does it as an optimization:
0 = 0
or 0 = 1
.@fiedukow I think you stumbled at this issue in the past.
@maciejgrzybek We had a discussion with @martint about whether rewrites in analyzer are fine, see: https://github.com/prestodb/presto/issues/6883. Analyzer should provide proper feedback to the user. If you rewrite the nodes during analysis, then for instance error messages might not be valid anymore (e.g: point to non-exiting AST nodes). Additionally, Analysis
is based on AST nodes identity and rewritten expressions won't map in Analysis
. For complex ORDER BY expressions rewrites in analyzer didn't work (because of coercions and invalid error messages referencing to wrong nodes). This is tricky subject at least. @martint might provide more input.
Maybe optimization code should be removed from statementAnalyzer#visitJoin
(there is the only call of this signature of com.facebook.presto.sql.planner.ExpressionInterpreter#expressionOptimizer
) and moved to a separate optimizer after initial planning? The coercions should be present by then.
The argument for user-feedback is not quite correct with existing optimization (rewrite expressions into 0 = 0 or 0 = 1).
I will move the optimization to later phase (planner or optimizer) and post a PR to discuss more around the code, if needed.
The argument for user-feedback is not quite correct with existing optimization (rewrite expressions into 0 = 0 or 0 = 1).
Yeah, that rewrite probably shouldn't be in StatementAnalyzer
too
The analyzer's job is to validate the query and annotate the AST with any information that the planner is going to need to go from AST->IR (AST is untyped. IR should, ideally, be fully typed, with all coercions resolved). We still do some rewrites in the analyzer due to legacy reasons, but those should move to proper optimizers.
returns
true
despite the factreturns
false
.This happens because:
ExpressionAnalyzer
(to be$operator$EQUAL(varchar(2),varchar(2)):boolean
), registering necessary coercions (cast ('a' as char(1))
->VARCHAR(2)
,cast ('a' as char(2))
->VARCHAR(2)
)expressionOptimizer(canonicalized, metadata, session, analyzer.getExpressionTypes()).optimize(NoOpSymbolResolver.INSTANCE)
fires aforementioned equal operator to compare two char fields (stored asSlice
Java type).Slices
contain the exact same value - one char 'a' only.expressionOptimizer
notes that, thereforebecomes rewritten into:
That is because since
expressionOptimize
works on plan, without taking into consideration already existing coercions.Similar situation would occur for other coercible types which share common Java type e.g. short decimal and real:
returns
true
but:returns
false
. Mechanism is the same: equal operator is picked to be$operator$EQUAL(real,real):boolean
but rewrites further on (inAnalyzer
) look only at the plan, not considering coercions, thereforeReal.equal(long, long)
is called with one long staying forReal
type while another representingShort Decimal
(also represented aslong
Java type underneath).