python / cpython

The Python programming language
https://www.python.org/
Other
61.31k stars 29.55k forks source link

dataclass overriding a property in the parent class is broken #121354

Open marksandler2 opened 3 weeks ago

marksandler2 commented 3 weeks ago

Bug report

Bug description:

Consider the following snippet

@dc.dataclass
class A:
  a: int
  @property
  def b(self) -> int:
    return 1

@dc.dataclass
class B(A):  
  b: int = dc.field(default_factory=lambda: 0)

Dataclass apparently doesn't create a class attribute, so class B ends up with property , b of class A, which seems surprising, especially given that there is explicit assignent of field b in class B(A).

So this results in:

>>> B.b
<property object at ...>
>>> B(a=2)
Traceback (most recent call last):                                                                                                                            
  File "<stdin>", line 1, in <module>                                                                                                                         
  File "<string>", line 4, in __init__                                                                                                                        
AttributeError: property 'b' of 'B' object has no setter      

If i replace definition of b like so b: int = dc.field(default=0), or with b: int = 0 then everything works as expected:

>>> B.b
0
>>> B(a=2)
B(a=2, b=0)

I think it would be nice if dataclass would always create a class attribute (especially if there is explicit assignment!) so that overloads work as expected.

CPython versions tested on:

3.11, 3.12

Operating systems tested on:

Linux

sobolevn commented 3 weeks ago

Good catch, thank you! I will take a look.

sobolevn commented 2 weeks ago

I reproduced the same behaviour without dataclasses:

class A1:
  def __init__(self, a):
    self.a = a

  @property
  def b(self) -> int:
    return 1

class B1(A1):
  def __init__(self, a, b=0) -> None:
    self.a = a
    self.b = b

print(B1.b)
print(B1(1).b)

produces:

<property object at 0x104fa6f20>
Traceback (most recent call last):
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 17, in <module>
    print(B1(1).b)
          ~~^^^
  File "/Users/sobolev/Desktop/cpython2/ex.py", line 14, in __init__
    self.b = b
    ^^^^^^
AttributeError: property 'b' of 'B1' object has no setter

So, I think that there's no actual bug. When dataclasses behave as regular python's classes - it is expected.

For your case you can do:

@dc.dataclass
class B(A):
  _x: int | None = None
  @property
  def b(self) -> int:
    return self._x or 0
  @b.setter
  def b(self, value: int) -> None:
    self._x = value
sobolevn commented 2 weeks ago

Oh, b: int = dc.field(default=0) makes it a bit different. Sorry, I've missed this part :)

sobolevn commented 2 weeks ago

b: int = dc.field(default=0) works, because it explicitly overrides a class-attribute with the default value: https://github.com/python/cpython/blob/db39bc42f90c151b298f97b780e62703adbf1221/Lib/dataclasses.py#L1027

I think that what can be done for this case: we can check that if a field overrides a property, then we can delete that property from our class namespace.

This way both b: int = dc.field(default=0) and b: int = dc.field(default_factory=lambda: 0) would work the same way. Right now it is indeed confusing.

sobolevn commented 2 weeks ago

Or we can leave this as-is, because:

cc @ericvsmith

marksandler2 commented 2 weeks ago

@sobolevn Yes, your original example is WAI - since property wasn't overriden. But here we explicity create a new override, so naively i would expect it to override the property definition. E.g. i can override property with another property or another class variable.

Would it make sense for dataclass to just always create a class attribute?

class B(a):
   b: int = dc.field(...)

should have B.b == dc.field, because this is what I would expect from such an assignment. THis way you don't need to do any special handling of properties.

https://github.com/python/cpython/blob/db39bc42f90c151b298f97b780e62703adbf1221/Lib/dataclasses.py#L1025

There is a comment in dataclass saying that if defautl_factory is specified "it should not be set at all in the post-processsed class", but it is not clear "why".

  if f.default is MISSING:
                # If there's no default, delete the class attribute.
                # This happens if we specify field(repr=False), for
                # example (that is, we specified a field object, but
                # no default value).  Also if we're using a default
                # factory.  The class attribute should not be set at
                # all in the post-processed class.
                delattr(cls, f.name)
marksandler2 commented 2 weeks ago

The example above of the workaround, isn't really very useful:

@dc.dataclass
class B(A):
  _x: int | None = None
  @property
  def b(self) -> int:
    return self._x or 0
  @b.setter
  def b(self, value: int) -> None:
    self._x = value

This is not a good solution at all, since i won't be able to pass b=3 in the constructor, and so will need to expose user to the implementation details of the class.

Currently we just switched to use functools.cached_property, which doesn't have this problem and works for our use case