Open Moosems opened 2 months ago
@leycec, master of type hints, do you have any input on this?
@leycec suddenly erupts from the floor! I have been summoned.
Seriously, though. Thanks for summoning me. Let's see here... dataclass IPC shenanigans, huh? TypedDict
is some fairly ugly stuff, frankly. In 2024, I'd say the only truly valid use of TypedDict
is to precisely type variadic keyword arguments under PEP 692: e.g.,
from typing import TypedDict, Unpack
class _MuhSpecialKwargs(TypedDict):
timeless_kwarg: int
priceless_kwarg: str
def i_heart_kwargs(**kwargs: Unpack[_MuhSpecialKwargs]): ...
Will anybody ever actually do that, though? Precisely typing variadic keyword arguments is something nobody cares that much about – especially when doing so requires an unhealthy amount of TypedDict
boilerplate just to make the accursed thing work.
Which leaves our world with few to no valid use cases for TypedDict
. That said...
I suspect I know why you're using TypedDict
all over the place here. IPC, right? Specifically, pickle
. The standard pickle
module suuuuuuuucks. We can all admit this quietly to ourselves while clutching our manga collections. pickle
doesn't actually know how to pickle anything anybody cares about. What a pox on the face of Python that module is. Surprisingly, pickle
does know how to pickle dictionaries.
So, you started pickling dictionaries back and forth. But plain dictionaries suck. So, you promoted the core Response
and Request
dictionaries to TypedDict
subclasses. This... totally makes sense! I'd have done the same.
But now you're feeling the growing pains. TypedDict
has outlived its usefulness, as (of course) TypedDict
always does. "What next!?", huh? The perennial question. You have a few options of varying suckiness here:
typing.NamedTuple
. It's better than typing.TypedDict
, but it's also really just a poor man's dataclass. Which brings us to...@dataclasses.dataclass
. The one! The only! The one dataclass to rule them all is, indeed, the only dataclass worth discussing in 2024. @beartype loves @dataclasses.dataclass
so much that @beartype 0.20.0 will probably start type-checking dataclass fields on assignment. Like all prior predictions, this is a lie. In truth, I'll just play video games all winter. I sob. :sob: My favourite usage of @dataclasses.dataclass
is the infamous hyper-fast hyper-compact frozen mode, which can net substantial space and time efficiency savings: e.g.,
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class Message(object):
"""Base class for messages in and out of the server"""
id: int
type: str # Can be "request" or "response"
That said, you seem to "need" the ability to monkey-patch in additional instance variables to dataclass instances:
Unfortunately there will almost always be extra values added...
Seems kinda suss, bro. But who am I to squint at another man's suspicious monkey-patch? We all go there when we need to go there. In that case, just drop the frozen=True, slots=True
above and you should be good to go.
In theory, you shouldn't even need to write your own __getstate__()
and __setstate__()
dunder methods to support custom dataclass pickling. The @dataclasses.dataclass
decorator defines custom picklers out-of-the-box, as I recall.
All you'll need to do then is heavily refactor your entire codebase to leverage dataclasses rather than typed dictionaries. What could be simpler!? :rofl:
I acknowledge this is an extreme pain point. Thus, this may never happen. Or perhaps there was a valid reason to prefer typed dictionaries over dataclasses during your initial design process? No idea. I know little about many things. My ignorance is as vast as Wikipedia and just as deep.
@leycec He's alive!!!
Seems kinda suss, bro. But who am I to squint at another man's suspicious monkey-patch?
So I think in order to understand the way collegamento is set up its important to understand its history. Originally, this was a sub project in Salve that used subprocess
's Popen
and a JSON RPC style communication method. All items were put in a dict and then JSON dumped and loaded because that was the best way to preserve the objects at the time (I didn't realize pickle could do so much). Eventually I decided to swap to multiprocessing.Queue
which no longer needed the JSON dumps and loads but the dict based system stayed. I mainly kept it because it is actually quite a nice way to group together all necessary data in easy to access ways.
During the subprocess
period I wanted some way to tell the type checker that it could always expect certain things and I would just tack on all the other info and a simple # type: ignore
when retrieving that data. Unfortunately I now want to try and avoid these hacky type: ignore
's.
@dataclasses.dataclass. The one! The only!
hyper-fast hyper-compact frozen mode
In that case, just drop the
frozen=True, slots=True
above
The reason I never used Dataclass
es originally was because it leads to even more complications and without these special optimizations can be somewhat slow. This is what I hacked together to test this idea out though:
from dataclasses import dataclass
@dataclass
class Message:
"""Base class for messages in and out of the server"""
id: int
type: str # Can be "request" or "response"
def __post_init__(self) -> None:
self.items = ["id", "type"]
def update(self, update_items: dict):
for key, value in update_items.items():
setattr(self, key, value)
self.items.append(key)
m = Message(0, "Test")
m.update({"x": 20})
print(m)
# Message(id=0, type='Test')
print(m.items)
# ['id', 'type', 'x']
if "x" in m.items:
# Still requires type: ignore
print(m.x) # type: ignore
# 20
SadlyI don't see many elegant ways to access the extra elements that don't still add these # type: ignore
's.
typing.NamedTuple
I'm glad we can both agree this is a poor option 😄. The only redeeming quality seems to be that it allows default values.
Unfortunately it seems that there is truly no good solution for something like this 😕. I will probably stick with TypedDict
over dataclass
as it is certainly much nicer to use, albeit not perfect. The main problem ultimately comes down to the fact that I want to access attributes that are never formally described anywhere and the type checker will never be able to not be unhappy with me for that. I would move over to a plain dict
but then the type annotations would probably end up being something like data: dict[Any, Any]
which is absolutely horrid. That would get the type checker to cool down but get anyone who ever wants to contribute raising their pitchforks.
Sigh Such is life. If you have any other input, please share it, I always love hearing your side of things. Hope that you're doing well and taking the time you need to keep yourself happy IRL.
Hah-ah! Everything sucks, huh? TypedDict
sucks, @dataclass
sucks, NamedTuple
sucks. Actually, I agree. This is why @beartype hand-rolls (i.e., manually defines) its own "dataclasses" without leveraging any of those helpers. Manually defining your own "dataclasses" can be a bit cumbersome, but it also:
mypy
and pyright
.typeguard
.Here's how I'd manually define your Message
dataclass, for example:
class Message:
"""Base class for messages in and out of the server"""
# Don't declare these attributes to be class attributes.
# Do, however, tell mypy and pyright to politely shutup.
if TYPE_CHECKING:
id: int
kind: str # Can be "request" or "response"
# @beartype type-checks this on instantiation. Good.
def __init__(self, id: int, kind: str) -> None:
self.id = id
self.kind = kind
# @beartype type-checks these on property reads. Good.
@property
def id(self) -> int: return self.id
@property
def kind(self) -> str: return self.kind
# @beartype type-checks these on property writes. Good.
@id.setter
def id(self, id: int) -> None: self.id = id
@kind.setter
def kind(self, kind: str) -> None: self.kind = kind
def __eq__(self, other: object) -> bool:
return (
isinstance(other, Message) and
self.id == other.id and
self.kind == other.kind
)
def __hash__(self) -> int:
return hash((self.id, self.kind))
def __repr__(self) -> str:
return (
f'{Message.__module__}.Message('
f'id={repr(self.id)}, kind={repr(self.kind)}'
f')'
)
def __getstate__(self) -> dict:
return self.__dict__
def __setstate__(self, d: dict[str, object]) -> None:
return self.__dict__ = d
That sorta thing, right? Things suddenly explode with boilerplate, which is bad. But you actually control the horizontal and the vertical, which is far more important. Object attribute syntax (e.g., muh_message.id
) is far faster and safer than dictionary lookup syntax (e.g., muh_message['id']
) with the added benefit of being type-checked on both reads and writes by @beartype now rather than in 20 years when you no longer care about this because you've been uploaded to the FaceBook- Meta-managed MallWorldCloud™. See you there, @Moosems! I'll be the bald fat avatar buying virtual manga in the MallWorldCloudWeeb store, by the way. :smile:
Code Idea:
The idea behind using the
TypedDict
's was to give some structure and show what can always be expected in theRequest
's andResponse
's. Unfortunately there will almost always be extra values added that the type checkers dislike which requires a# type: ignore
. Honestly, this is not a great solution. There has to be a better way. The problem is that its impossible to give good type hints withoutTypedDict
and extra values won't be added until 3.12 if I remember correctly.