kaiko-ai / typedspark

Column-wise type annotations for pyspark DataFrames
Apache License 2.0
65 stars 4 forks

Subclass column meta #468

Closed by nanne-aben 2 months ago

nanne-aben commented 2 months ago

Allow users to define their own column metadata fields by subclassing ColumnMeta, for example:

from dataclasses import dataclass
from typing import Annotated
from pyspark.sql.types import LongType, StringType
from typedspark import Column, ColumnMeta, Schema

@dataclass
class MyColumnMeta(ColumnMeta):
    primary_key: bool = False

class Persons(Schema):
    id: Annotated[
        Column[LongType],
        MyColumnMeta(
            comment="Identifies the person",
            primary_key=True,
        ),
    ]
    name: Column[StringType]
    age: Column[LongType]

Persons.get_metadata()

This would return:

{'id': {'comment': 'Identifies the person', 'primary_key': True},
 'name': {},
 'age': {}}
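
For readers who want to see the mechanism without a Spark installation, here is a minimal standard-library sketch of how subclassed metadata attached through `Annotated` could be collected into a dictionary of that shape. It is not typedspark's implementation: `ColumnMetaSketch`, `PersonsSketch`, and `get_metadata_sketch` are hypothetical stand-ins, plain Python types replace `Column[...]`, and serializing every dataclass field with `asdict` is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Annotated, Optional, get_args, get_origin, get_type_hints


@dataclass
class ColumnMetaSketch:
    """Hypothetical stand-in for typedspark's ColumnMeta (assumption)."""

    comment: Optional[str] = None


@dataclass
class MyColumnMeta(ColumnMetaSketch):
    """User-defined subclass that adds a custom metadata field."""

    primary_key: bool = False


class PersonsSketch:
    # Plain Python types stand in for Column[LongType] / Column[StringType].
    id: Annotated[
        int,
        MyColumnMeta(comment="Identifies the person", primary_key=True),
    ]
    name: str
    age: int


def get_metadata_sketch(schema: type) -> dict:
    """Collect per-column metadata from Annotated type hints."""
    result = {}
    # include_extras=True keeps the Annotated wrapper and its metadata.
    for column, hint in get_type_hints(schema, include_extras=True).items():
        meta = {}
        if get_origin(hint) is Annotated:
            # get_args returns (underlying_type, *annotations).
            for extra in get_args(hint)[1:]:
                if isinstance(extra, ColumnMetaSketch):
                    # Assumption: every dataclass field is serialized as-is.
                    meta = asdict(extra)
        result[column] = meta
    return result


metadata = get_metadata_sketch(PersonsSketch)

# Custom fields can then drive downstream logic, e.g. listing primary keys.
primary_keys = [name for name, meta in metadata.items() if meta.get("primary_key")]
```

Because the custom field is just another dataclass attribute, downstream code can read it with an ordinary dictionary lookup, as the `primary_keys` line does.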