Open ikonst opened 1 year ago
Where is __dataclass_fields__
documented? I don't see any reference to it in PEP 557 (which introduced dataclass
) or in the official Python documentation for dataclasses. I presume that it's an internal implementation detail of the stdlib dataclass
. If so, then I'd argue that it shouldn't appear in typeshed stubs, and nothing in the typing specs (including PEP 681) should depend on it.
The only reference to __dataclass_fields__
that I can find is in this typeshed issue and this referenced mypy issue. I don't think I agree with the conclusions that were reached in those threads.
Yeah, you've got it. Nominally your approach is sound: if it's not in the spec, typeshed shouldn't build a protocol on it. Rather, #14263 should've been the way to go. The typeshed solution leverages a "plugin" behavior anyway (the existence of this undocumented attr) so might as well go "plugin" on this one.
~In practice, libs like pydantic have been synthesizing this attr to make their models act as dataclasses, so there's interest in having is_dataclass
serve as an effective (and correct!) type guard for using the other dataclasses utility functions.~ (correction: pydantic actually produces a false positive and would improve if this issue is fixed)
FWIW, pyright also synthesizes this symbol, and it could be similarly argued that since it's an implementation detail of dataclasses, then pyright should not synthesize it, and then typeshed wouldn't "take advantage" of it (as they did in https://github.com/python/typeshed/issues/9345).
Or we can accept that we've grown to rely on it. 😔
This issue is about whether we should synthesize it indiscriminately for Data Class Transforms. Currently we do, and the following is a false positive:
import typing
_T = typing.TypeVar("_T")
@typing.dataclass_transform()
def create_model(cls: type[_T]) -> type[_T]:
cls.__init__ = lambda *args, **kwargs: None # type: ignore
return cls
@create_model
class C:
a: int
c = C(a=42)
typing.reveal_type(c.__dataclass_fields__)
Runtime:
typing.reveal_type(c.__dataclass_fields__)
^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'C' object has no attribute '__dataclass_fields__'
mypy:
file.py:18: note: Revealed type is "builtins.dict[builtins.str, Any]"
Success: no issues found in 1 source file
I don't think we should. PEP 681 is a specification for abstractions that work like dataclasses in their public interface; it doesn't require that abstractions that use dataclass_transform
mirror private implementation details of dataclasses.
@JelleZijlstra I agree, and this issue about removing this synthesized attribute in that case, basically
-self._add_dataclass_fields_magic_attribute()
+if self._spec is _TRANSFORM_SPEC_FOR_DATACLASSES:
+ self._add_dataclass_fields_magic_attribute()
At the same time, I did some sleuthing for a related issue, and it would appear that doing that would break many users that have grown to rely on @dataclass_transform
to create alternative decorators that ultimately do call into dataclasses.dataclass
and thus produce true dataclasses (example). It's not what PEP-681 was intended for, but I found copious examples. So now I don't know (insert "two buttons" meme).
FWIW, pyright also synthesizes this symbol
I added a synthesized __dataclass_fields__
in pyright because mypy added it and typeshed added a dependency on it. I should have done more research and asked questions before doing this.
Here are some questions that may help inform the best way forward...
__dataclass_fields__
hack added to typeshed? I presume it was to provide better type checking for replace
, asdict
, astuple
, fields
, and is_dataclass
.Thinking about potential solutions...
DataclassInstance
and all of its uses from typeshed, replacing all instances with object
. Both mypy and pyright could then also delete the synthesized __dataclass_fields__
. This would decrease type safety for the functions I mentioned above. If they're infrequently used and/or not a common source of bugs, this probably doesn't matter.__dataclass_fields__
attribute is synthesized. In other words, it could be specified by the user explicitly.As mentioned here, pyserde, strawberry, ibis would use the parameter mentioned in option 2. In fact, so would flax I believe.
We could augment dataclass_transform to support a new parameter that indicates whether a __dataclass_fields__ attribute is synthesized. In other words, it could be specified by the user explicitly.
PEP-681 says: "In the future, we may add additional parameters to dataclass_transform as needed to support common behaviors in user code. These additions will be made after reaching consensus on typing-sig rather than via additional PEPs."
Should we start a topic in typing-sig on adding a makes_dataclass
parameter?
Typing-sig is pretty much defunct at this point. I recommend using python/typing instead. But yes, that sounds like a good next step. I'm still not convinced that this is a problem that needs solving. It would be useful for you to present the motivation for a change in the discussion thread.
Per PEP 681, data class transforms are defined in terms of dataclasses, but it doesn't mean they make dataclasses.
In particular, transformed classes don't necessarily have
__dataclass_fields__
or haveis_dataclass
return True, but per mypy - they do.