Open · boblannon opened 2 days ago
For my specific problems with `spark_type`, I find that I can satisfy mypy with a really thin shim.

edit: turns out `pydantic.Field` is a function with a lot of complex overloading. I had to change my shim to reflect this:
```python
from typing import Any, Optional, Type

from pydantic import Field
from pyspark.sql.types import DataType


def SparkField(*args, spark_type: Optional[Type[DataType]] = None, **kwargs) -> Any:
    # Forward the sparkdantic-specific kwarg on to pydantic.Field as an extra key.
    if spark_type is not None:
        kwargs["spark_type"] = spark_type
    return Field(*args, **kwargs)
```
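And a quick sketch of how I'm using it (the model is just illustrative, and I'm assuming `spark_type` takes a `DataType` subclass, per the shim's signature):

```python
from pyspark.sql.types import LongType
from sparkdantic import SparkModel


class Event(SparkModel):
    # mypy is happy now: spark_type is a declared, typed keyword argument on
    # SparkField (defined above), which simply forwards it to pydantic.Field.
    id: int = SparkField(spark_type=LongType)


print(Event.model_spark_schema())
```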
Hi @boblannon
Thanks for the suggestion. This makes sense to me and does feel a bit clearer than using `Field` directly.

I'm happy to implement this suggestion, but if you'd like to contribute it directly, send me the PR and I'll happily cut a new release.
Thanks again for the suggestion!
Love sparkdantic, and continue to rely on it. Thank you for your work!
One minor annoyance is that mypy always yells at me for using the `spark_type` kwarg to be explicit about the native Spark type that should be used. For example, in a model with two such properties, it raises two warnings, one for each property.
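Roughly, the offending pattern looks like this (a sketch with illustrative field names; I'm passing `DataType` subclasses for `spark_type`):

```python
from pydantic import Field
from pyspark.sql.types import IntegerType, LongType
from sparkdantic import SparkModel


class Example(SparkModel):
    # mypy flags both calls: Field's signature doesn't declare spark_type,
    # even though sparkdantic understands it at runtime.
    id: int = Field(spark_type=LongType)
    count: int = Field(spark_type=IntegerType)
```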
This is because, of course, `pydantic.Field` isn't expecting this kwarg. Currently it seems like sparkdantic relies on pydantic's behavior of throwing any unexpected kwarg into `FieldInfo.json_schema_extra`:

https://github.com/mitchelllisle/sparkdantic/blob/083c4f32212da4de76725b5f6d321a6dcfa4d113/src/sparkdantic/model.py#L150
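As a rough illustration of that behavior (plain pydantic v2 here, field name illustrative): the extra kwarg survives at runtime, with a deprecation warning from pydantic, and lands in `json_schema_extra`, which is where sparkdantic can read it back:

```python
from pydantic import BaseModel, Field
from pyspark.sql.types import LongType


class Plain(BaseModel):
    # The type checker complains, but pydantic accepts the unknown kwarg at
    # runtime and stores it in the field's json_schema_extra.
    id: int = Field(spark_type=LongType)


print(Plain.model_fields["id"].json_schema_extra)
# roughly: {'spark_type': <class 'pyspark.sql.types.LongType'>}
```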
I'm not necessarily arguing against this choice, but I wonder if we could find a way that makes the type checker happy as well. Perhaps a subclass of `pydantic.Field`, like a `sparkdantic.SparkField`, that handles sparkdantic-specific kwargs like this? I'm happy to lend a hand in implementation, too.