Closed pthatte1-bb closed 3 months ago
DuckDB may support this, but its Substrait implementation does not:
install substrait;
load substrait;
install tpch;
load tpch;
call dbgen(sf=1);
.width -1
.mode csv
FROM get_substrait_json('
WITH raw_data as (SELECT * from customer),
cte1 as (SELECT c_custkey as join_custkey, c_name from raw_data where c_custkey = 131074),
cte2 as (SELECT c_custkey as other_custkey, c_name from raw_data where c_custkey = 131075)
select * from cte1 full join cte2 on join_custkey = other_custkey;
');
Which results in the following:
INTERNAL Error: Unsupported join type FULL
This error signals an assertion failure within DuckDB. This usually occurs due to unexpected conditions or errors in the program's logic.
For more information, see https://duckdb.org/docs/dev/internal_errors
I'll file an issue.
Thank you. I did some digging, in case it helps:
Only INNER, LEFT and RIGHT are explicitly listed in the Substrait extension - https://github.com/duckdb/substrait/blob/main/src/from_substrait.cpp#L388
OUTER has an enum-name of FULL in DuckDB - https://github.com/duckdb/duckdb/blob/50bb607e2b2e5728664dd18da330eda354be3b96/src/common/enum_util.cpp#L3425
I am able to achieve an OUTER join using a combination of LEFT_join->RIGHT_join->UNION->DISTINCT. So hopefully the missing feature is achievable, and not an intentional exclusion.
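For anyone who needs the workaround in the meantime, here is a minimal sketch of the LEFT join, RIGHT join, UNION, DISTINCT combination described above. It uses Python's built-in sqlite3 module as a stand-in, not DuckDB or Substrait, so it only illustrates the SQL shape. The two-row customer table and the helper names make_demo_db and emulated_full_join are made up for the example, and the RIGHT join is written as a LEFT join with the sides swapped (an equivalent form) so that it also runs on older SQLite builds.

```python
import sqlite3


def make_demo_db():
    """Hypothetical in-memory stand-in for the TPC-H customer table (two rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (c_custkey INTEGER, c_name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(131074, "Customer#000131074"), (131075, "Customer#000131075")],
    )
    return conn


def emulated_full_join(conn):
    """Emulate cte1 FULL JOIN cte2 without using a FULL join."""
    sql = """
    WITH raw_data AS (SELECT * FROM customer),
    cte1 AS (SELECT c_custkey AS join_custkey, c_name AS c_name1
             FROM raw_data WHERE c_custkey = 131074),
    cte2 AS (SELECT c_custkey AS other_custkey, c_name AS c_name2
             FROM raw_data WHERE c_custkey = 131075)
    -- LEFT join: every cte1 row, padded with NULLs when cte2 has no match
    SELECT join_custkey, c_name1, other_custkey, c_name2
    FROM cte1 LEFT JOIN cte2 ON join_custkey = other_custkey
    -- UNION (not UNION ALL) supplies the DISTINCT step
    UNION
    -- RIGHT join, written as a LEFT join with the sides swapped
    SELECT join_custkey, c_name1, other_custkey, c_name2
    FROM cte2 LEFT JOIN cte1 ON join_custkey = other_custkey
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in emulated_full_join(make_demo_db()):
        print(row)
```

One caveat with this approach: because UNION removes duplicates, the result can differ from a real FULL join when the inputs contain genuinely duplicated rows.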
DuckDB merged support for full outer joins in Substrait today. It will be in the next release.
I see the commit, thank you. Trying to build/test locally, but happy to close the issue sooner if you like.
I have a test I'll add for this once that version becomes available. Until then I'm okay with keeping this open. I might look into labels so we can mark ones we're waiting on for easier tracking.
I tried with a locally-built binary. Snippet used for testing -
WITH raw_data as (SELECT * from customer),
cte1 as (SELECT c_custkey as join_custkey, c_name from raw_data where c_custkey = 131074),
cte2 as (SELECT c_custkey as other_custkey, c_name from raw_data where c_custkey = 131075)
select * from cte1 full join cte2 on join_custkey = other_custkey;
Substrait production works fine. Consuming the produced Substrait fails with a validation error (duplicate column name "c_name").
AFAIK this might have slipped through because the nested expressions PR is missing Substrait consumption tests: https://github.com/duckdb/substrait/pull/104/files#diff-83cddda20fbd3b324186aab07ed1d31b844236d9ae79727c60444002a3c8c7dfR48
Closing this issue because full-join does work with a query that passes validation.
Snippet to reproduce success -
WITH raw_data as (SELECT * from customer),
cte1 as (SELECT c_custkey as join_custkey, c_name as c_name1 from raw_data where c_custkey = 131074),
cte2 as (SELECT c_custkey as other_custkey, c_name as c_name2 from raw_data where c_custkey = 131075)
select * from cte1 full join cte2 on join_custkey = other_custkey;
Queries with JoinType "full_outer" fail with error "Unsupported join type".
Snippet to recreate error -
Changing the JoinType to "left" or "right" works.
Also, DuckDB-SQL version of above query works ok-