kaiko-ai / typedspark

Column-wise type annotations for pyspark DataFrames
Apache License 2.0

Typedspark does not work with Python 3.11.9 #353

Closed nanne-aben closed 6 months ago

nanne-aben commented 6 months ago

The unit tests currently don't pass on Python 3.11.9. As a temporary fix, the CI/CD pipeline is pinned to 3.11.8.

Interestingly, the other supported versions (3.9, 3.10, 3.12) work without problems.

I'll debug the problem later. Currently, I can't install Python 3.11.9 with pyenv.

nanne-aben commented 6 months ago

This version of Python (3.11.9) swallows every exception raised inside the DataSet.__setattr__() call that runs when the schema annotation is assigned to the instance as __orig_class__.

For example, the following doesn't raise an exception in Python 3.11.9 (but it does in other versions of Python).

from typing import Any, Generic, TypeVar, get_args

T = TypeVar("T")

class DataSet(Generic[T]):
    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)

        if name == "__orig_class__":
            orig_class_args = get_args(self.__orig_class__)  # type: ignore
            if orig_class_args:
                schema_annotations = orig_class_args[0]
                print(schema_annotations)
                raise Exception("Just randomly raising an exception for illustrative purposes.")

class Schema:
    pass

df = DataSet[Schema]()

Output in 3.11.9:

<class '__main__.Schema'>

Output in 3.11.2:

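[image: screenshot of the output, not preserved here; as noted above, the exception is raised in this version]
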
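To check which behaviour a given interpreter has without reading tracebacks, the reproduction above can be turned into a small probe. This is only a sketch: `_Probe` and `swallows_setattr_errors` are hypothetical names made up for illustration and are not part of typedspark. It assumes the same mechanism as the example above, namely that `typing` assigns `__orig_class__` to the new instance when a subscripted generic is called.

```python
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class _Probe(Generic[T]):
    """Hypothetical helper: raises as soon as __orig_class__ is assigned."""

    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)
        if name == "__orig_class__":
            # Deliberately not an AttributeError, to mirror the example above.
            raise RuntimeError("probe")


def swallows_setattr_errors() -> bool:
    """Return True if the exception raised in __setattr__ never reaches the caller."""
    try:
        _Probe[int]()
    except RuntimeError:
        return False
    return True


print(swallows_setattr_errors())
```

Going by the behaviour described above, this should print True on 3.11.9 and False on 3.11.2.
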
nanne-aben commented 6 months ago

I've opened an issue with Python about this here.