salve-org / collegamento

Collegamento provides an easy to use Client/Server IPC backend
https://collegamento.readthedocs.io/en/master/
MIT License
2 stars 0 forks source link

Code Idea: Move away from TypedDict #12

Open Moosems opened 2 months ago

Moosems commented 2 months ago

Code Idea:

The idea behind using the TypedDict's was to give some structure and show what can always be expected in the Request's and Response's. Unfortunately there will almost always be extra values added that the type checkers dislike which requires a # type: ignore. Honestly, this is not a great solution. There has to be a better way. The problem is that its impossible to give good type hints without TypedDict and extra values won't be added until 3.12 if I remember correctly.

Moosems commented 1 month ago

@leycec, master of type hints, do you have any input on this?

leycec commented 1 month ago

@leycec suddenly erupts from the floor! I have been summoned.

Seriously, though. Thanks for summoning me. Let's see here... dataclass IPC shenanigans, huh? TypedDict is some fairly ugly stuff, frankly. In 2024, I'd say the only truly valid use of TypedDict is to precisely type variadic keyword arguments under PEP 692: e.g.,

from typing import TypedDict, Unpack

class _MuhSpecialKwargs(TypedDict):
    timeless_kwarg: int
    priceless_kwarg: str

def i_heart_kwargs(**kwargs: Unpack[_MuhSpecialKwargs]): ...

Will anybody ever actually do that, though? Precisely typing variadic keyword arguments is something nobody cares that much about – especially when doing so requires an unhealthy amount of TypedDict boilerplate just to make the accursed thing work.

Which leaves our world with few to no valid use cases for TypedDict. That said...

I Get It: IPC, Huh?

I suspect I know why you're using TypedDict all over the place here. IPC, right? Specifically, pickle. The standard pickle module suuuuuuuucks. We can all admit this quietly to ourselves while clutching our manga collections. pickle doesn't actually know how to pickle anything anybody cares about. What a pox on the face of Python that module is. Surprisingly, pickle does know how to pickle dictionaries.

So, you started pickling dictionaries back and forth. But plain dictionaries suck. So, you promoted the core Response and Request dictionaries to TypedDict subclasses. This... totally makes sense! I'd have done the same.

But now you're feeling the growing pains. TypedDict has outlived its usefulness, as (of course) TypedDict always does. "What next!?", huh? The perennial question. You have a few options of varying suckiness here:

My favourite usage of @dataclasses.dataclass is the infamous hyper-fast hyper-compact frozen mode, which can net substantial space and time efficiency savings: e.g.,

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Message(object):
    """Base class for messages in and out of the server"""

    id: int
    type: str  # Can be "request" or "response"

That said, you seem to "need" the ability to monkey-patch in additional instance variables to dataclass instances:

Unfortunately there will almost always be extra values added...

Seems kinda suss, bro. But who am I to squint at another man's suspicious monkey-patch? We all go there when we need to go there. In that case, just drop the frozen=True, slots=True above and you should be good to go.

In theory, you shouldn't even need to write your own __getstate__() and __setstate__() dunder methods to support custom dataclass pickling. The @dataclasses.dataclass decorator defines custom picklers out-of-the-box, as I recall.

All you'll need to do then is heavily refactor your entire codebase to leverage dataclasses rather than typed dictionaries. What could be simpler!? :rofl:

I acknowledge this is an extreme pain point. Thus, this may never happen. Or perhaps there was a valid reason to prefer typed dictionaries over dataclasses during your initial design process? No idea. I know little about many things. My ignorance is as vast as Wikipedia and just as deep.

Moosems commented 1 month ago

@leycec He's alive!!! its alive!

Seems kinda suss, bro. But who am I to squint at another man's suspicious monkey-patch?

So I think in order to understand the way collegamento is set up its important to understand its history. Originally, this was a sub project in Salve that used subprocess's Popen and a JSON RPC style communication method. All items were put in a dict and then JSON dumped and loaded because that was the best way to preserve the objects at the time (I didn't realize pickle could do so much). Eventually I decided to swap to multiprocessing.Queue which no longer needed the JSON dumps and loads but the dict based system stayed. I mainly kept it because it is actually quite a nice way to group together all necessary data in easy to access ways. During the subprocess period I wanted some way to tell the type checker that it could always expect certain things and I would just tack on all the other info and a simple # type: ignore when retrieving that data. Unfortunately I now want to try and avoid these hacky type: ignore's.

@dataclasses.dataclass. The one! The only!

hyper-fast hyper-compact frozen mode

In that case, just drop the frozen=True, slots=True above

The reason I never used Dataclasses originally was because it leads to even more complications and without these special optimizations can be somewhat slow. This is what I hacked together to test this idea out though:

from dataclasses import dataclass

@dataclass
class Message:
    """Base class for messages in and out of the server"""

    id: int
    type: str  # Can be "request" or "response"

    def __post_init__(self) -> None:
        self.items = ["id", "type"]

    def update(self, update_items: dict):
        for key, value in update_items.items():
            setattr(self, key, value)
            self.items.append(key)

m = Message(0, "Test")
m.update({"x": 20})
print(m)
# Message(id=0, type='Test')
print(m.items)
# ['id', 'type', 'x']
if "x" in m.items:
    # Still requires type: ignore
    print(m.x) # type: ignore
    # 20

SadlyI don't see many elegant ways to access the extra elements that don't still add these # type: ignore's.

typing.NamedTuple

I'm glad we can both agree this is a poor option 😄. The only redeeming quality seems to be that it allows default values.

Unfortunately it seems that there is truly no good solution for something like this 😕. I will probably stick with TypedDict over dataclass as it is certainly much nicer to use, albeit not perfect. The main problem ultimately comes down to the fact that I want to access attributes that are never formally described anywhere and the type checker will never be able to not be unhappy with me for that. I would move over to a plain dict but then the type annotations would probably end up being something like data: dict[Any, Any] which is absolutely horrid. That would get the type checker to cool down but get anyone who ever wants to contribute raising their pitchforks.

Sigh Such is life. If you have any other input, please share it, I always love hearing your side of things. Hope that you're doing well and taking the time you need to keep yourself happy IRL.

leycec commented 1 month ago

Hah-ah! Everything sucks, huh? TypedDict sucks, @dataclass sucks, NamedTuple sucks. Actually, I agree. This is why @beartype hand-rolls (i.e., manually defines) its own "dataclasses" without leveraging any of those helpers. Manually defining your own "dataclasses" can be a bit cumbersome, but it also:

Here's how I'd manually define your Message dataclass, for example:

class Message:
    """Base class for messages in and out of the server"""

    # Don't declare these attributes to be class attributes.
    # Do, however, tell mypy and pyright to politely shutup.
    if TYPE_CHECKING:
        id: int
        kind: str  # Can be "request" or "response"

    # @beartype type-checks this on instantiation. Good.
    def __init__(self, id: int, kind: str) -> None:
        self.id = id
        self.kind = kind

    # @beartype type-checks these on property reads. Good.
    @property
    def id(self) -> int: return self.id
    @property
    def kind(self) -> str: return self.kind

    # @beartype type-checks these on property writes. Good.
    @id.setter
    def id(self, id: int) -> None: self.id = id
    @kind.setter
    def kind(self, kind: str) -> None: self.kind = kind

    def __eq__(self, other: object) -> bool:
        return (
            isinstance(other, Message) and
            self.id == other.id and
            self.kind == other.kind
        )

     def __hash__(self) -> int:
        return hash((self.id, self.kind))

    def __repr__(self) -> str:
        return (
            f'{Message.__module__}.Message('
            f'id={repr(self.id)}, kind={repr(self.kind)}'
            f')'
        )

    def __getstate__(self) -> dict:
        return self.__dict__

    def __setstate__(self, d: dict[str, object]) -> None:
        return self.__dict__ = d

That sorta thing, right? Things suddenly explode with boilerplate, which is bad. But you actually control the horizontal and the vertical, which is far more important. Object attribute syntax (e.g., muh_message.id) is far faster and safer than dictionary lookup syntax (e.g., muh_message['id']) with the added benefit of being type-checked on both reads and writes by @beartype now rather than in 20 years when you no longer care about this because you've been uploaded to the FaceBook- Meta-managed MallWorldCloud™. See you there, @Moosems! I'll be the bald fat avatar buying virtual manga in the MallWorldCloudWeeb store, by the way. :smile: