Closed alec-heif closed 2 years ago
notably, removing the WHERE
at the end (or even changing it to something more trivial like WHERE lt. col_left_1 IS NOT NULL
) does not reproduce the problem)
fyi @sopel39 @skrzypo987 @martint
@raunaqmorarka do you have suspect what could cause it?
@alec-heif please share full explain
Suspect is https://github.com/trinodb/trino/pull/12236. If rule works correctly, then it might be problem with ineq join implementation:
Additionally, optimized execution of inequality join is performed when equi conditions are also present.
which would mean pre-existing issue
EXPLAIN v379:
Fragment 0 [SINGLE]
Output layout: [field, field_0, field_1, field_2]
Output partitioning: SINGLE []
Stage Execution Strategy: UNGROUPED_EXECUTION
Output[col_left_1, col_left_2, col_right_1, col_right_2]
│ Layout: [field:integer, field_0:varchar(3), field_1:integer, field_2:varchar(3)]
│ Estimates: {rows: 1 (111B), cpu: 11.48k, memory: 5.39kB, network: 0B}
│ col_left_1 := field
│ col_left_2 := field_0
│ col_right_1 := field_1
│ col_right_2 := field_2
└─ Filter[filterPredicate = (""field"" >= COALESCE(""field_1"", 0))]
│ Layout: [field:integer, field_0:varchar(3), field_1:integer, field_2:varchar(3)]
│ Estimates: {rows: 1 (111B), cpu: 11.48k, memory: 5.39kB, network: 0B}
└─ LeftJoin[(""field_0"" = ""field_2"")][$hashvalue, $hashvalue_4]
│ Layout: [field:integer, field_0:varchar(3), field_1:integer, field_2:varchar(3)]
│ Estimates: {rows: 2 (153B), cpu: 11.33k, memory: 5.39kB, network: 0B}
│ Distribution: REPLICATED
├─ LocalExchange[ROUND_ROBIN] ()
│ │ Layout: [field:integer, field_0:varchar(3), $hashvalue:bigint]
│ │ Estimates: {rows: 1 (69B), cpu: 138, memory: 0B, network: 0B}
│ └─ Project[]
│ │ Layout: [field:integer, field_0:varchar(3), $hashvalue_3:bigint]
│ │ Estimates: {rows: 1 (69B), cpu: 69, memory: 0B, network: 0B}
│ │ $hashvalue_3 := combine_hash(bigint '0', COALESCE(""$operator$hash_code""(""field_0""), 0))
│ └─ Values
│ Layout: [field:integer, field_0:varchar(3)]
│ Estimates: {rows: 1 (60B), cpu: 0, memory: 0B, network: 0B}
│ (1, 'foo')
└─ LocalExchange[HASH][$hashvalue_4] (""field_2"")
│ Layout: [field_1:integer, field_2:varchar(3), $hashvalue_4:bigint]
│ Estimates: {rows: 1 (69B), cpu: 276, memory: 0B, network: 0B}
└─ Filter[filterPredicate = (""field_2"" = 'foo')]
│ Layout: [field_1:integer, field_2:varchar(3), $hashvalue_5:bigint]
│ Estimates: {rows: 1 (69B), cpu: 207, memory: 0B, network: 0B}
└─ LocalExchange[ROUND_ROBIN] ()
│ Layout: [field_1:integer, field_2:varchar(3), $hashvalue_5:bigint]
│ Estimates: {rows: 1 (69B), cpu: 138, memory: 0B, network: 0B}
└─ Project[]
│ Layout: [field_1:integer, field_2:varchar(3), $hashvalue_6:bigint]
│ Estimates: {rows: 1 (69B), cpu: 69, memory: 0B, network: 0B}
│ $hashvalue_6 := combine_hash(bigint '0', COALESCE(""$operator$hash_code""(""field_2""), 0))
└─ Values
Layout: [field_1:integer, field_2:varchar(3)]
Estimates: {rows: 1 (60B), cpu: 0, memory: 0B, network: 0B}
(2, 'bar')
EXPLAIN v380:
Fragment 0 [SINGLE]
Output layout: [field, field_0, field_1, field_2]
Output partitioning: SINGLE []
Output[col_left_1, col_left_2, col_right_1, col_right_2]
│ Layout: [field:integer, field_0:varchar(3), field_1:integer, field_2:varchar(3)]
│ Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
│ col_left_1 := field
│ col_left_2 := field_0
│ col_right_1 := field_1
│ col_right_2 := field_2
└─ CrossJoin
│ Layout: [field:integer, field_0:varchar(3), field_1:integer, field_2:varchar(3)]
│ Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
│ Distribution: REPLICATED
├─ LocalExchange[ROUND_ROBIN] ()
│ │ Layout: [field:integer, field_0:varchar(3)]
│ │ Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
│ └─ Values
│ Layout: [field:integer, field_0:varchar(3)]
│ Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
└─ Values
Layout: [field_1:integer, field_2:varchar(3)]
Estimates: {rows: 0 (0B), cpu: 0, memory: 0B, network: 0B}
The bug is in the rule introduced in #12236.
It's incorrect to push the COALESCE(r.col_right_1, 0)
expression below the join, since r.col_right_1
might become null as a result of LEFT JOIN semantics.
thanks for the super speedy fix! i just tried out 389 and it seems to work, so think we're all good here.
We're seeing a regression beginning in version 380 with correctness of complex join predicates.
Here is a minimal repro. The following query should (and does on 379) return the row from the left side, but in 380+ it returns no results.
379 and before:
380 and after: