sdf-labs / sdf-cli

This is the main repository for SDF documentation found at docs.sdf.com, as well as public schemas, benchmarks, and examples
64 stars 6 forks source link

error when selecting from a join with ambigious column names #9

Closed dataders closed 3 months ago

dataders commented 3 months ago

Describe the bug**

SDF Slack thread

related: https://github.com/apache/datafusion/issues/11993

scenario

sdf run (compile has no issue) joining two tables on a key but not specifying columns so that the join column is ambiguous e.g. name in the below query is ambiguous SELECT * FROM age LEFT JOIN job ON age.name = job.name

error

SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed

To Reproduce**

call sdf run with the below models

-- age.sql
SELECT 'Alice' AS name, 25 AS age
UNION ALL
SELECT 'Bob', 30
UNION ALL
SELECT 'Charlie', 28

-- job.sql
SELECT 'Alice' AS name,
    'Engineer' AS job
UNION ALL
SELECT 'Bob','Designer'
UNION ALL
SELECT 'Charlie','Manager'

-- join.sql
SELECT * FROM age LEFT JOIN job ON age.name = job.name

Expected behavior

flag that there's ambiguous columns that need to be resolved

Additional Context

Version

sdf 0.3.21

full stack trace

SDF1015: Failed to write table to disk: Optimizer rule 'optimize_projections' failed
caused by
optimize_projections
caused by
Internal error: Failed due to a difference in schemas, original schema: DFSchema { inner: Schema { fields: [Field { name: "name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "age", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "job", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [Some(Bare { table: "_0" }), None, Some(Bare { table: "_2" }), None], functional_dependencies: FunctionalDependencies { deps: [] } }, new schema: DFSchema { inner: Schema { fields: [Field { name: "_0.name", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "age", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "_2.name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "job", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, field_qualifiers: [None, None, None, None], functional_dependencies: FunctionalDependencies { deps: [] } }.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker