romka-best / gpts-turbo-bot

6 stars 1 forks source link

[TECH] Models serialization using pydantic #1

Open xDiaym opened 7 months ago

xDiaym commented 7 months ago

Serializing models to json provided in a commit #baeb343 can be done using pydantic(see serialization chapter), which is already included in the dependencies here To avoid breaking backward compatibility, we can use something like

from typing import Any
from pydantic import BaseModel

class UsedFaceSwapPackage(BaseModel):
    # omitted

    def to_dict(self) -> dict[str, Any]:
        return self.model_dump_json()

Pydantic can also provide more rigorous and easier-to-use model validation. For example, the API may change over time and some fields may need to be omitted.

I can implement this, but can't test it in my environment, so I opened issue instead of PR. :disappointed:

romka-best commented 7 months ago

Hi @xDiaym ,

Thank you for opening this issue and for the valuable suggestion. Using Pydantic for serializing models to JSON sounds like a promising approach, but сould you please clarify how this approach would be more beneficial compared to our current method of using return vars(self) for the to_dict method? We're interested in understanding the specific advantages this change would bring to our project.

xDiaym commented 7 months ago

Sure. Here are the advantages and disadvantages of pydantic

Pros.

Pydantic can reduce amount of boilerplate code

Vanilla Python ```python class Role: id: str name: str translated_names: Dict translated_descriptions: Dict translated_instructions: Dict created_at: datetime edited_at: datetime def __init__(self, id: str, name: str, translated_names: Dict, translated_descriptions: Dict, translated_instructions: Dict, created_at=None, edited_at=None): self.id = id self.name = name self.translated_names = translated_names self.translated_descriptions = translated_descriptions self.translated_instructions = translated_instructions current_time = datetime.now(timezone.utc) self.created_at = created_at if created_at is not None else current_time self.edited_at = edited_at if edited_at is not None else current_time def to_dict(self): return vars(self) ```
Pydantic ```python from pydantic import BaseModel, Field class Role(BaseModel): id: str name: str translated_names: Dict translated_descriptions: Dict translated_instructions: Dict created_at: datetime = Filed(default_factory=lambda: datetime.now(timezone.now)) edited_at: datetime = Filed(default_factory=lambda: datetime.now(timezone.now)) def to_dict(self) -> Dict: return self.model_dump() ```

Pydantic can "hide" some fields and exclude them from serialization

May be useful in more code support (eg GPT5 release) Unfortunately this can be understood as a violation of YAGNI :(

class Role(BaseModel):
      def to_dict(self) -> Dict:
          return self.model_dump(exclude="user_id")

Cons

Speed

On my machine, the code serialized via pydantic was approx.15 times slower, which is sad, since pydanic was rewritten in rust :( But we can use var(self) and pydantic, which will give comparable speed and avoid writing __init__ methods (and makes this issue meaningless)

Code ```python from pydantic import BaseModel, Field class Role(BaseModel): id: str name: str translated_names: Dict translated_descriptions: Dict translated_instructions: Dict created_at: datetime = Filed(default_factory=lambda: datetime.now(timezone.now)) edited_at: datetime = Filed(default_factory=lambda: datetime.now(timezone.now)) def to_dict(self) -> Dict: return var(self) ```

Conslusion

On the one hand, this may look like overengineering and a violation of the golden rule of design: “the more nodes in the system, the less reliable it is,” but on the other hand, I hope this approach can make it easier to maintain and read. Sorry, if I waste your time :(

romka-best commented 7 months ago

Hey there @xDiaym !

Actually, I agree with you, let's try it. What do you think if you did this and I'll test it in my environment?

I'm looking forward to collaborate together!