kaiko-ai / typedspark

Column-wise type annotations for pyspark DataFrames
Apache License 2.0

Deal with ambiguous columns in transform_to_schema() #172

Closed by nanne-aben 1 year ago

nanne-aben commented 1 year ago

This PR allows us to do:

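# Assumes person and job are the schemas bound to df_a and df_b,
# so that person.id and job.id refer to different columns.
transform_to_schema(
    df_a.join(
        df_b,
        person.id == job.id,
    ),
    PersonWithJob,
    {
        # Fill PersonWithJob.id from person.id rather than job.id.
        PersonWithJob.id: person.id,
    },
).show()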

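Previously, this raised an ambiguous column exception, because the join result contains an id column from both df_a and df_b, and you had to call .drop(job.id) on the join result to resolve it. With this PR, the mapping passed as the third argument tells transform_to_schema() which id column to use, which is a more intuitive way to resolve the ambiguity.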
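For readers who want the idea without the Spark machinery, here is a toy model in plain Python. It is only a sketch of the resolution rule described above, not typedspark's implementation: Col, AmbiguousColumnError and resolve_columns are hypothetical names made up for this illustration, and columns are modelled as simple (source, name) pairs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Col:
    """Hypothetical stand-in for a column that remembers its source DataFrame."""
    source: str
    name: str


class AmbiguousColumnError(Exception):
    """Raised when a target name matches more than one joined column."""


def resolve_columns(
    joined: List[Col],
    target_names: List[str],
    overrides: Optional[Dict[str, Col]] = None,
) -> Dict[str, Col]:
    """Pick exactly one joined column for every name in the target schema.

    An explicit override wins; otherwise the name must match a single column.
    """
    overrides = overrides or {}
    selected: Dict[str, Col] = {}
    for name in target_names:
        if name in overrides:
            # The caller said which column to use, so no lookup by name is needed.
            selected[name] = overrides[name]
            continue
        matches = [col for col in joined if col.name == name]
        if len(matches) > 1:
            raise AmbiguousColumnError(f"column '{name}' is ambiguous: {matches}")
        if not matches:
            raise KeyError(f"column '{name}' not found")
        selected[name] = matches[0]
    return selected


# Both sides of the join contribute an "id" column.
joined_columns = [
    Col("person", "id"),
    Col("person", "name"),
    Col("job", "id"),
    Col("job", "title"),
]
schema_names = ["id", "name", "title"]
```

In this model, calling resolve_columns(joined_columns, schema_names) raises AmbiguousColumnError, which mirrors the old behaviour. Passing {"id": Col("person", "id")} as the overrides plays the role of {PersonWithJob.id: person.id} and makes the call succeed.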