sdf run (compile has no issue)
joining two tables on a key but not specifying columns so that the join column is ambiguous e.g. name in the below query is ambiguous
SELECT * FROM age LEFT JOIN job ON age.name = job.name
error
SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed
To Reproduce**
call sdf run with the below models
-- age.sql
SELECT 'Alice' AS name, 25 AS age
UNION ALL
SELECT 'Bob', 30
UNION ALL
SELECT 'Charlie', 28
-- job.sql
SELECT 'Alice' AS name,
'Engineer' AS job
UNION ALL
SELECT 'Bob','Designer'
UNION ALL
SELECT 'Charlie','Manager'
-- join.sql
SELECT * FROM age LEFT JOIN job ON age.name = job.name
Expected behavior
flag that there's ambiguous columns that need to be resolved
Additional Context
Version
sdf 0.3.21
full stack trace
SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed
caused by
optimize_projections
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "age", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "job", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "_0" }), None, Some(Bare { table: "_2" }), None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "_0.name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "age", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "_2.name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "job", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Describe the bug**
SDF Slack thread
related: https://github.com/apache/datafusion/issues/11993
scenario
sdf run
(compile
has no issue) joining two tables on a key but not specifying columns so that the join column is ambiguous e.g. name in the below query is ambiguousSELECT * FROM age LEFT JOIN job ON age.name = job.name
error
SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed
To Reproduce**
call
sdf run
with the below modelsExpected behavior
flag that there's ambiguous columns that need to be resolved
Additional Context
Version
sdf 0.3.21
full stack trace