oxan / djangorestframework-dataclasses

Dataclasses serializer for Django REST framework
BSD 3-Clause "New" or "Revised" License
429 stars 28 forks source link

For nested dataclass serializers, prefix with the name of dataclass so drf-spectacular doesn't combine components #55

Closed ford-carta closed 2 years ago

ford-carta commented 2 years ago

Problem

The generated nested dataclass serializers are called DataclassSerializer, making it hard to differentiate them. When used with drf-spectacular this leads to unfortunate collisions that override the different types.

OrderedDict([('messages',
              MessageSerializer(many=True):
                  contacts = ContactSerializer(many=True):
                      name = CharField()
                  id = CharField()
                  message_timestamp = DateTimeField()
                  attachments = DataclassSerializer(many=True):
                      id = UUIDField()
                      filename = CharField()
                      url = CharField()),
             ('firm',
              DataclassSerializer(dataclass=<class 'fund_admin.comms.domain.FirmDomain'>):
                  id = IntegerField()
                  firm_name = CharField()
             ('entity',
              DataclassSerializer(dataclass=<class 'fund_admin.comms.domain.EntityDomain'>):
                  id = IntegerField()
                  entity_name = CharField()

This is what get_fields returns for my serializer. As you can see, the generated DataclassSerializers have the name DataclassSerializer. When this is put through drf-spectacular, this causes the Components to all be called "Dataclass" (Serializer postfix is stripped) and the last one to be evaluated (most nested one in this case) overrides the type in the schema.

Proposal: Prefix serializer class name with dataclass name

On line 512 in rest_framework_dataclasses/serializers.py

def build_dataclass_field(self, field_name: str, type_info: TypeInfo) -> SerializerFieldDefinition:
...
    # add this line at 512
    class NewClass(field_class):
        pass
    NewClass.__name__ = type_info.base_type.__name__ +  "Serializer"
    return NewClass, field_kwargs

Generated schema showing that attachments override FirmDomain and EntityDomain

    Dataclass:
      type: object
      properties:
        id:
          type: string
          format: uuid
        filename:
          type: string
        url:
          type: string
      required:
      - filename
      - id
      - url

cc: @tfranzel @oxan

intgr commented 2 years ago

You have pointed out an important problem, but IMO the proposed solution is not a good one.

NewClass.__name__ = type_info.base_type.__name__ + "Serializer"

This would work in most cases, but seems like a massive hack. Python's classes are namespaced by modules, there is explicitly no global namespace of functions or classes. You can easily have FooSerializer in multiple modules of your, or maybe from multiple separate 3rd party packages that you installed, and you would have no idea.

The correct fix in my opinion is in drf-spectacular -- it should not assume that class names are a globally unique namespace. If it needs globally unique names, it should take responsibility guaranteeing that. For example, if two serializers have the same name, drf-spectacular might compare their fields and append unique suffixes.

As for generating meaningful human-readable names for documentation (the NewClass.__name__ example), instead of using class __name__ that can only be customized with hacks, I think there should be a serializer method for it that drf-spectacular can call.

PS: I have to admit I am actually using a similar hack to what you proposed in my own project. But I don't consider it a good long-term solution.

tfranzel commented 2 years ago

I got 2 points on this.

@ford-carta how is that dataclass written down? That field dump obfuscates the situation for me. Maybe we need to be smarter here.

IMO the proposed solution is not a good one

A solution worth considering if most users will not encounter the collision issue anymore as a result.

As for generating meaningful human-readable names for documentation

Support/feedback from maintainers is always welcome. Here, we need to either extract more info as is (if possible) or are provided with a bit of extra info (that may have to be added here).

The namespacing is a different issue. I acknowledge this is a shortcoming, but no good solution came to mind at that time. Just slapping on the import path will not lead to nice names for most users. Fyi: Most people use the ref_name feature (ported from yasg) to fix singular collisions. It's not the only solution but relatively clean. Suggestions welcome here too!


    class YSerializer(serializers.Serializer):
        x = serializers.IntegerField()

        class Meta:
            ref_name = 'NewName' # <-------
ford-carta commented 2 years ago

@tfranzel For reference here are the data classes.

@dataclass
class FirmDomain:
    id: int
    firm_name: str

@dataclass
class EntityDomain:
    id: int
    entity_name: str

@dataclass
class MessageContactDomain:
    contact_id: int
    """Pk of the contact model"""
    type: ContactType
    name: str
    external_id: str
    """Email"""

@dataclass
class AttachmentInMessage:
    id: UUID
    filename: str
    url: str

@dataclass
class MessageInThread:
    id: str
    """Message pk"""
    message_timestamp: datetime
    content_text: str
    content_html: str
    contacts: List[MessageContactDomain]
    attachments: List[AttachmentInMessage]

@dataclass
class ThreadDomain:
    """An email thread"""

    id: int
    """Thread pk"""
    content: str
    firm: FirmDomain
    entity: EntityDomain
    messages: List[MessageInThread]

Fyi: Most people use the ref_name feature

Cool! I did not know about the ref_name solution. That solves the issue when the collision is at the parent serializer level.

Perhaps we could borrow that idea and add a Meta class or field to the child dataclass. For example

@dataclass
class FirmDomain:
    id: int
    name: str

    @classmethod
    def get_schema_name(self): # might be better to use a property here
        return "ThreadFirm"

and then back on line 512 in rest_framework_dataclasses/serializers.py

        dataclass = type_info.base_type
        class NewClass(field_class):
            pass

        override_schema_name = (
            dataclass.get_schema_name()
            if hasattr(dataclass, "get_schema_name")
            and callable(getattr(dataclass, "get_schema_name"))
            else None
        )
        dataclass_schema_name = override_schema_name or dataclass.__name__
        NewClass.__name__ = dataclass_schema_name + "Serializer"
        NewClass.__doc__ = dataclass.__doc__ + f" path: {dataclass.__module__}" # Also we dont want the DataclassSerializer Docstring here either
        return NewClass, field_kwargs
oxan commented 2 years ago

Which drf-spectacular version are you using? The drf-dataclasses extension added in 0.21.2 should fix this.

ford-carta commented 2 years ago

🤦 This looks like what I was looking for, thanks!

tfranzel commented 2 years ago

I blindly assumed we had some special case missing somewhere. thx @oxan for pointing it out before I invested time here.