kaiko-ai / typedspark

Column-wise type annotations for pyspark DataFrames
Apache License 2.0

Support self-joins in typedspark #211

Closed nanne-aben closed 12 months ago

nanne-aben commented 1 year ago

Allows self-joins by registering the schema to the DataSet under an alias, for example:

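from typedspark import (
    create_partially_filled_dataset,
    register_schema_to_dataset_with_alias,
)

# Assumes `spark` is an active SparkSession and `Person` is a typedspark
# Schema with `id`, `name` and `age` columns.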

df = create_partially_filled_dataset(
    spark,
    Person,
    {
        Person.id: [1, 2, 3],
        Person.name: ["Alice", "Bob", "Charlie"],
        Person.age: [20, 30, 40],
    },
)

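# Register the same DataSet twice under different aliases, so that the two
# sides of the join can be told apart.
df_a, person_a = register_schema_to_dataset_with_alias(df, Person, alias="a")
df_b, person_b = register_schema_to_dataset_with_alias(df, Person, alias="b")

# Join the DataSet with itself on matching ids.
df_a.join(df_b, person_a.id == person_b.id)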
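
To make the role of the aliases concrete, here is a toy model of the same self-join in plain Python, without Spark. It is only a sketch of the idea and not how typedspark or pyspark implement it: the helpers `with_alias` and `inner_join` are hypothetical names introduced for this illustration, and rows are modelled as plain dicts. Without the alias prefix, both sides would expose the same column names and a condition like "id equals id" could not say which side each `id` belongs to.

```python
from itertools import product


def with_alias(rows, alias):
    """Return copies of the rows with every key qualified as '<alias>.<column>'."""
    return [{f"{alias}.{key}": value for key, value in row.items()} for row in rows]


def inner_join(left, right, left_key, right_key):
    """Naive nested-loop inner join on left[left_key] == right[right_key]."""
    return [
        {**l_row, **r_row}
        for l_row, r_row in product(left, right)
        if l_row[left_key] == r_row[right_key]
    ]


# Same data as in the example above, as a list of dicts.
people = [
    {"id": 1, "name": "Alice", "age": 20},
    {"id": 2, "name": "Bob", "age": 30},
    {"id": 3, "name": "Charlie", "age": 40},
]

# Two views of the same data, distinguished only by their alias.
rows_a = with_alias(people, "a")
rows_b = with_alias(people, "b")

# The join condition can now name each side unambiguously.
joined = inner_join(rows_a, rows_b, "a.id", "b.id")
```

Each joined row carries both the `a.`-prefixed and the `b.`-prefixed columns, which is what lets later steps refer to either side of the self-join.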