kaiko-ai / typedspark

Column-wise type annotations for pyspark DataFrames
Apache License 2.0

Introduce DataSetImplements #146

Closed by nanne-aben 11 months ago

nanne-aben commented 1 year ago

This change introduces DataSetImplements, which allows for set-ups such as:

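from typing import Protocol, TypeVar

from pyspark.sql.types import LongType
from typedspark import Column, DataSet, DataSetImplements, Schema, transform_to_schema

# Any schema with an age column of this type implements the Age protocol.
class Age(Schema, Protocol):
    age: Column[LongType]

# T is the concrete schema of the DataSet passed in.
T = TypeVar("T", bound=Schema)

def birthday(df: DataSetImplements[Age, T]) -> DataSet[T]:
    # Keep the input's own schema and add one to the age column.
    return transform_to_schema(
        df,
        df.typedspark_schema,
        {Age.age: Age.age + 1},
    )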

Where birthday() is defined to:

  1. Take as an input DataSetImplements[Age, T]: a DataSet that implements the protocol Age as T.
  2. Return a DataSet[T]: a DataSet of the same type as the one that was provided.
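
For readers less familiar with the typing pattern, the same idea can be sketched in plain Python, without Spark. This is only an analogy and not typedspark's API: HasAge, Person, and this standalone birthday() are hypothetical names made up for illustration. It shows a function that accepts anything implementing a protocol and returns a value of the same concrete type it was given.

```python
import copy
from dataclasses import dataclass
from typing import Protocol, TypeVar


class HasAge(Protocol):
    """Hypothetical protocol: anything with an integer `age` attribute."""

    age: int


# T is bound to the protocol, so the return type follows the argument type.
T = TypeVar("T", bound=HasAge)


def birthday(person: T) -> T:
    """Return a shallow copy of `person` that is one year older."""
    older = copy.copy(person)
    older.age = person.age + 1
    return older


@dataclass
class Person:
    """Hypothetical concrete type that implements HasAge structurally."""

    name: str
    age: int


alice = Person(name="Alice", age=30)
older = birthday(alice)  # a type checker infers Person here, not HasAge
```

As with DataSetImplements[Age, T] returning DataSet[T], the caller gets back its own type (here Person, including the name field) rather than only the protocol.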