mitchelllisle / sparkdantic

✨ A Pydantic to PySpark schema library
https://mitchelllisle.github.io/sparkdantic/
MIT License

`TypeError: issubclass() arg 1 must be a class` #408

Closed: arcaputo3 closed this issue 2 months ago

arcaputo3 commented 3 months ago

I'm attempting to generate a Spark schema for Anthropic `Message` types. For certain types, sparkdantic checks whether the type is a subclass of `Enum`, but this check fails when the type itself is not a class. I will continue to investigate, but I believe this is because `Message` has a `content` field of type `content: List[ContentBlock]`, where `ContentBlock = Annotated[Union[TextBlock, ToolUseBlock], PropertyInfo(discriminator="type")]`.
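As a quick illustration of the failure mode (independent of anthropic), `issubclass()` simply rejects typing constructs, which are not classes:

```python
# Illustration: a Union is a typing construct, not a class, so the
# issubclass() check raises before it can inspect the type.
from enum import Enum
from typing import Union

issubclass(Union[int, str], Enum)
# TypeError: issubclass() arg 1 must be a class
```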

```python
from anthropic.types import Message
from sparkdantic.model import create_spark_schema

completion_schema = create_spark_schema(Message)
```

Traceback:

```
----> 5 completion_schema = create_spark_schema(Message)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-37bb6f45-f99d-4913-99d2-e94211b03815/lib/python3.10/site-packages/sparkdantic/model.py:385, in _type_to_spark(t, metadata, safe_casting)
    382 elif _is_supported_subclass(t):
    383     return create_spark_schema(t, safe_casting), nullable
--> 385 if issubclass(t, Enum):
    386     t = _get_enum_mixin_type(t)
    388 if t in native_spark_types:

TypeError: issubclass() arg 1 must be a class
```

https://github.com/mitchelllisle/sparkdantic/blob/a3f88bd5cda82f66116ea0e82490fbab21014123/src/sparkdantic/model.py#L385
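For anyone who wants to reproduce this without installing anthropic, here is a minimal sketch that I'd expect to fail the same way (the model names mirror anthropic's; pydantic's `Field(discriminator=...)` stands in for anthropic's `PropertyInfo`):

```python
from typing import Annotated, List, Literal, Union

from pydantic import BaseModel, Field
from sparkdantic.model import create_spark_schema

class TextBlock(BaseModel):
    type: Literal['text'] = 'text'
    text: str

class ToolUseBlock(BaseModel):
    type: Literal['tool_use'] = 'tool_use'
    name: str

# Discriminated union, analogous to anthropic's ContentBlock
ContentBlock = Annotated[Union[TextBlock, ToolUseBlock], Field(discriminator='type')]

class Message(BaseModel):
    content: List[ContentBlock]

# Expected to raise TypeError: issubclass() arg 1 must be a class
create_spark_schema(Message)
```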

mitchelllisle commented 2 months ago

Hi @arcaputo3

Thanks for submitting this. I've had a look and it's a little tricky to cater to this specific use case. You are correct that the main issue is the `Union[TextBlock, ToolUseBlock]` definition. Spark doesn't have the concept of union types, so it's hard to translate this in any general way. In some parts of the code we fall back to the first element defined in the union, but that doesn't feel like an appropriate solution here.

The override mechanism would work, but unfortunately the way I've implemented it makes it a little difficult to use, so I'll need to spend some time on a better solution. I am thinking you could do something like this:

```python
from typing import Union

from pydantic import Field
from pyspark.sql.types import MapType, StringType
from sparkdantic import SparkModel

class A(SparkModel):
    a: str

class B(SparkModel):
    b: str

class UnionOverride(SparkModel):
    # Override the untranslatable Union with an explicit Spark type
    mapping: Union[A, B] = Field(spark_type=MapType(StringType(), StringType()))
```

Then you'd at least have an 'escape hatch' that lets you represent more generic types as a `MapType`. What are your thoughts?
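For reference, generating the schema from that model would then be the usual call (a sketch of the expected output, assuming the override behaves as proposed):

```python
schema = UnionOverride.model_spark_schema()
print(schema)
# -> a StructType whose 'mapping' field is MapType(StringType(), StringType())
```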

arcaputo3 commented 2 months ago

Hi @mitchelllisle, this should work for now. Thank you for the reply!