Open NeilGirdhar opened 1 year ago
Thanks for opening this issue. I'm supportive of this idea -- I agree it would be quite helpful for typing.
To give context for those who don't follow stuff that goes on at typeshed/typing_extensions: we recently changed how we annotate some dataclasses
-module functions over at typeshed. We now define a DataclassInstance
protocol, which we use for our annotations for dataclasses.asdict
, dataclasses.astuple
, and dataclasses.replace
. These are much more precise than our previous annotations, but the feedback we've received from users is that it's quite painful for users of type checkers not to be able to access our DataclassInstance
protocol at runtime -- it's currently "private to typeshed", so can't easily be used in user code.
I think dataclasses.Dataclass
would be a bad name for this class. dataclasses works through decorators, and I don't think anybody wants to change that, but the name dataclasses.Dataclass
implies (to me) a concrete class that can be inherited from, similar to pydantic.BaseModel
. To reflect that this is an abstract class, that users shouldn't even inherit from (since dataclasses works via decorators, not inheritance), maybe dataclasses.DataclassLike
would be a better name? This would be a similar name to os.PathLike
, which has a similar role. It would also make it slightly clearer that objects created by "dataclass-esque" third-party libraries (perhaps ones that use the dataclass_transform
mechanism) could also be considered "instances of" this abstract base class if they have the __dataclass_fields__
attributes.
Cc. @ericvsmith and @carljm for dataclasses.
the name
dataclasses.Dataclass
implies a concrete class that can be inherited from, similar topydantic.BaseModel
.
I see your point about DataclassLike
, but just to provide some reasoning for the simpler name: it's reflective of all the other ABCs like those in numbers
and collections.abc
. We don't have RealLike
, but just Real
; we don't have SequenceLike
, but just Sequence
.
And the function that currently checks dataclasses is called is_dataclass
--not is_dataclass_like
. So, from a natural language standpoint, I thought that since we say that that a class X
is a dataclass, it seemed natural to me to spell that isinstance(X, Dataclass)
.
But DataclassLike
would be fine too.
I see your point about
DataclassLike
, but just to provide some reasoning for the simpler name: it's reflective of all the other ABCs like those innumbers
andcollections.abc
. We don't haveRealLike
, but justReal
; we don't haveSequenceLike
, but justSequence
.
Sure, but numbers
and collections.abc
are modules which are nothing but abstract base classes -- it would be surprising if anything from those modules wasn't an abstract base class. Whether it's put in typing
or dataclasses
, however, that won't hold true for this proposed class.
I don't have an issue with this. I think to some extent the dataclass design originally pushed in the direction of "this is just a utility to create ordinary classes, nobody should care that they were created by this particular utility." But in practice when you have a utility to create classes with a structured set of field metadata, people are going to find useful things to do with that field metadata, and in order to do that, they'll need to know which types have it. And this ship has long sailed, with the existence of free functions like dataclasses.fields
, dataclasses.astuple
, dataclasses.asdict
that can only operate on instances of dataclasses, and the existence of dataclasses.is_dataclass
to detect them.
So given that we already clearly support all of this, I don't see any reason we shouldn't make it a bit more ergonomic (and static type-checking friendly.)
Would like to know what @ericvsmith thinks.
I think to some extent the dataclass design originally pushed in the direction of "this is just a utility to create ordinary classes, nobody should care that they were created by this particular utility."
If you're interested in the history of this, from my discussion with the attrs creator, he was strongly opposed to inheritance in Python. Like you say, that ship has long sailed now 😄
I agree with Hynek about inheritance :) IMO this proposal is mostly orthogonal to inheritance; it's about ergonomically detecting whether a type has dataclass-style field metadata, which is relevant regardless of whether anyone is inheriting anything.
IMO this proposal is mostly orthogonal to inheritance; it's about ergonomically detecting whether a type has dataclass-style field metadata, which is relevant regardless of whether anyone is inheriting anything.
Yup, I was just providing some interesting history 😄 Glad to see there's so much agreement about having a base class.
I think that adding such an interface would make concrete assumptions about the internals of a dataclass. Right now __dataclass_fields__
is not even mentioned in https://docs.python.org/3/library/dataclasses.html
As far as I understand, dataclass
module tries to hide this API: this is why we have dataclass.fields
function and dataclass.is_dataclass
(they are abstract enough to not expose any API).
I think that adding such an interface would make concrete assumptions about the internals of a dataclass. Right now
__dataclass_fields__
is
Maybe we could just remove that from the proposal? In retrospect it was a mistake to add it.
Typeshed's stub for the class will have to have the __dataclass_fields__
attribute in order for type checkers to understand it properly, and I think it makes sense for the runtime to have that attribute as well. But I don't think that means that we need to document that it has that attribute or expose it as a public attribute in any way. I don't think this proposal would change the nature of this attribute for dataclasses at all, in fact -- it would still be an implementation detail. Users will only care that isinstance(<dataclass_instance>, DataclassLike)
returns True
; they won't care that it returns True
because the isinstance
machinery is checking to see if <dataclass_instance>
has a __dataclass_fields__
attribute.
Okay, to put some flesh on the bones of this proposal: here's a PR showing what this could look like:
I don't really see the point of reserving the Dataclass
name. Calling it DataclassLike
rather makes me wonder what, besides @dataclass
decorated classes, would be caught by isinstance(cls, DataclassLike)
.
Also, why use an abstract base class here, instead of a Protocol
? It seems that in the current PR it doesn't have any @abstractmethod
.
I don't really see the point of reserving the
Dataclass
name. Calling itDataclassLike
rather makes me wonder what, besides@dataclass
decorated classes, would be caught byisinstance(cls, DataclassLike)
.
Other classes from third-party libraries that have dataclass-like semantics, and also synthesize __dataclass_fields__
attributes. Many of these third-party libraries make use of the @dataclass_transform
decorator, so they are treated by type checkers as though they have the same semantics as dataclasses.
As a concrete example:
>>> from pydantic.dataclasses import dataclass
>>> @dataclass
... class Foo: ...
...
>>> Foo.__dataclass_fields__
{}
>>> import dataclasses
>>> dataclasses.is_dataclass(Foo)
True
I think a dataclasses.Dataclass
class also sounds like you could inherit from it to create a dataclass (similar to enum.Enum
or typing.Protocol
), and that's not the intent of the new ABC. Dataclasses works via decorators, and we don't want to change that.
Also, why use an abstract base class here, instead of a
Protocol
?
What would using a protocol get us here? The runtime implementation for typing.Protocol
is much more complicated and often has unexpected performance issues. dataclasses
currently takes pains to avoid importing typing
(so that it has a reasonably fast import time, and doesn't need to depend on typing
), and I don't see a reason to use typing.Protocol
here, when we can do everything we need with ABCs.
For os.PathLike
and contextlib.AbstractContextManager
, they're implemented as ABCs (so that they're fast) at runtime, but in typeshed, we pretend they're protocols, and type checkers treat them as though they're protocols. It works great; I imagine we'd do a very similar thing with this new ABC.
It seems that in the current PR it doesn't have any
@abstractmethod
.
Correct, but we need the ABCMeta
metaclass in order for the custom __subclasshook__
method to work as designed.
What would using a protocol get us here?
Using a protocol and allowing subclassing would allow composition with other protocols, which in my opinion would be useful. It allows creating a function that takes as argument a value that both can be passed to asdict, and has some special attribute or method.
(Keeping in mind that Python doesn't have intersections, and that the only way to compose protocols is by using subclassing).
Would DataclassType make sense for name?
Feature or enhancement
Add the class
dataclasses.Dataclass
thatdataclasses.is_dataclass
, and__init_subclass__
.(Edit: removed "has an attribute
__dataclass_fields__
for the purpose of type checking")Pitch
This class would simplify runtime type-checking of dataclasses:
and similarly
Besides being simpler, it would also allow combined instance checks like
isinstance(x, Dataclass | SomeOtherBase)
, which is arguably more Pythonic than the combined function calls above (marked "Previously").It would also allow users to annotate dataclasses using
Dataclass
ortype[Dataclass]
, which is currently impossible without reaching into protected areas of the typeshed.Inheritance could be blocked for now with a friendly error message (that perhaps tells you to use the decorator either directly or in a base class's
__init_subclass__
).Previous discussion
There's some discussion here: https://github.com/python/typing_extensions/issues/115 @AlexWaygood @JelleZijlstra @henryiii
Linked PRs