Open hsolbrig opened 2 years ago
Sample test code can be found at https://github.com/hsolbrig/dataclasstest/blob/main/src/dataclasscopytest.py
This isn't a "copy constructor", this is "apply the @dataclass
decorator to the same class twice". I think it makes more sense to raise an exception if you try to do this, rather than have it succeed.
Can I ask what your use case is?
The more I think about this, the more I think the right thing to do is prevent dataclass
from being called twice on the same class. For example, what would:
@dataclass(repr=False)
@dataclass(repr=True)
class C:
x: int
result in?
Heavens - please don't do this! This feature is currently being utilized by the pydantic community to allow dataclasses
and pydantic
to co-exist. It is a wonderful bit of code that keeps us (linkml) from having to choose between the two approaches.
For details, scroll down in https://pydantic-docs.helpmanual.io/usage/dataclasses/ to the section labeled "Convert stdlib dataclasses into pydantic dataclasses". Its use can be found at https://github.com/samuelcolvin/pydantic/blob/master/pydantic/dataclasses.py#L159
I may have used the wrong terminology by calling it a "copy constructor", but if this suggested change is objectionable, please leave the code as it is and we'll work with the pydantic community to figure out an alternative.
I'm not going to do anything rash!
What would pydantic expect from:
@dataclass(repr=False)
@dataclass(repr=True)
class C:
x: int
Or does it always expect that if called twice on the same class that all of the parameters to dataclass
would be the same?
Let me see whether I can get the pydantic authors involved.
WRT the above question, I actually have no idea what to expect. As the old joke goes, Patient: "Doc, it hurts when I do this!" Doc: "Then don't do it!"
I've posted a heads up here -- hopefully someone on that side can clock in.
Just mulling it over, my recollection is that repr=False
would be superfulous in this case, as the presence of a __repr__
function is the same as repr=False
, no?
Hi, pydantic author here.
Thanks so much @hsolbrig for bringing this up.
Ignoring pydantic for a moment, I completely agree that it would make sense for
@dataclass(repr=False)
@dataclass(repr=True)
class C:
x: int
to be illegal. Or to put it differently - an explicit error would be better than undefined behaviour, wrong behaviour or a less clear error later on.
The reason pydantic cares about this is because people want to convert a standard dataclass to a "pydantic dataclass" - e.g. a dataclass with machinery to perform full type validation. But we want the "pydantic dataclass" to be as close as possible to a vanilla standard dataclass.
E.g. we want:
import dataclasses
import pydantic
@dataclasses.dataclass
class Foobar:
a: str
b: list[int] = dataclasses.field(default_factory=lambda: [1, 2, 3])
Foobar2 = pydantic.dataclasses.dataclass(Foobar)
f2 = Foobar2(a="a")
print(f)
To work. Currently it doesn't because the new b
has no default_factory
.
I'd be happy/willing to rewrite how pydantic.dataclasses.dataclass
works, but this is our 3rd or even 4th rewrite of pydantic dataclasses which suggests a documented, approved and std lib tested way to do what we're trying to do would be useful.
Is that something you'd consider?
Update on the proposed patch: the last line should read:
setattr(cls, f.name, f.default_factory() if f.default is MISSING else f.default) # <-- add conditional
Executing the default_factory vs. just adding it. Will be running more tests to see whether this introduces any other problems
Does pydantic always use the same parameters every time it calls dataclass
on the same class?
@hsolbrig: Could you please provide a PR or at least a diff?
Will do -- the fix proposal is going to be a bit more complex than I'd first thought so it will probably be a day or two before I'm sure that everything works.
PR submitted - happy to chat about it, although we will probably want to include @samuelcolvin as well.
I suspect that the preferable behavior might be to prevent dataclass being called twice on the same class, but to provide a supported way to "look before you leap" and check if a class has had the dataclass decorator run on it already, i.e. something like the is_direct_dataclass
added in #92406, though I think the implementation should be simpler and just check for "__dataclass_fields__" in cls.__dict__
. Pydantic could also just do this check directly with no new API, if "__dataclass_fields__
will only be in the __dict__
of a class that has had the decorator run on it" is a stable guarantee.
I don't think it makes sense to silently allow running the dataclass decorator twice on the same class, where the second run is a no-op. As @ericvsmith already has hinted, this leads to pretty confusing behavior if the params used are not identical for the two calls. You could have a situation where you clearly call the dataclass decorator on a class and say frozen=True
, but the resulting class is not frozen, because someone else already called the decorator on it previously. So I think double-running on the same class should be an error.
As far as I can tell, pydantic doesn't really need the ability to re-run the decorator on an existing dataclass -- in that case pydantic is just counting on it doing no harm (but currently it does do harm in any case where field()
was used to define a field originally, because the second run won't see the full Field
, just the default value left as an attribute on the class.) It seems like pydantic could easily be changed to check first and just skip running the dataclass decorator if the class at hand has already been processed by @dataclass
. Am I wrong here?
I guess it's up to pydantic whether in this case it wants to check for a match between the options given to the pydantic decorator and the ones previously used on the already-a-dataclass (this check is possible by introspecting __dataclass_params__
) or just allow possible confusion if they don't match.
I'm finally getting around to looking at this again. My thinking is to add a dataclasses.is_exact_dataclass()
(or whatever name) and then have @dataclass
raise a new exception ClassIsAlreadyDataclass
(maybe a subclass of RuntimeError
) if a class is already an exact dataclass.
is_exact_dataclass
would basically just check cls.__dict__.get("__dataclass_fields__")
.
If we go down this path, I'll close this issue and create a new one.
Bug report The
dataclass
copy constructor doesn't recognize the default_factory.Causes the following error`
Your environment This bug has been shown to exist in versions 3.9 and 3.10. The test above is python 3.10.4.
Using the line numbers in python 3.10.4, this can be fixed with the following change to dataclasses.py line 958: