python / cpython

The Python programming language
https://www.python.org
Other
63.49k stars 30.41k forks source link

Improve UX for mocking dataclasses that are problematic to instantiate #124176

Open ncoghlan opened 1 month ago

ncoghlan commented 1 month ago

Feature or enhancement

Proposal:

Passing instance=True to create_autospec misses fields without default values, even when the given spec is a dataclass object:

>>> from dataclasses import dataclass
>>> @dataclass
... class Example:
...     a: int
...     b: int = 0
...     def sum_fields(self):
...         return self.a + self.b
...
>>> from unittest.mock import create_autospec
>>> mock_instance = create_autospec(Example(1))
>>> mock_instance_from_class = create_autospec(Example, instance=True)
>>> set(dir(mock_instance)) - set(dir(mock_instance_from_class))
{'a'}

This is despite dataclass definitions making their field information readily available for introspection on the class object (without requiring instantiation):

>>> from dataclasses import fields
>>> [(field.name, field.type) for field in fields(Example)]
[('a', <class 'int'>), ('b', <class 'int'>)]

A similar problem occurs if the dataclass uses __post_init__ to set attributes that are not otherwise set:

>>> from dataclasses import dataclass, field
>>> @dataclass
... class PostInitExample:
...     a: int = field(init=False)
...     b: int = field(init=False)
...     def __post_init__(self):
...         self.a = self.b = 1
...
>>> mock_post_init_class = create_autospec(Example)
>>> mock_post_init_class = create_autospec(PostInitExample)
>>> mock_post_init_instance = create_autospec(PostInitExample())
>>> set(dir(mock_post_init_instance)) - set(dir(mock_post_init_class()))
{'b', 'a'}

While for most dataclasses it is straightforward to instantiate a specific instance and derive the mock autospec from that, this may not be desirable (or even feasible) if the dataclass requires references to real external resources to create a real instance.

There are various potential workarounds available for this functional gap, but they're all relatively clumsy, and come with their own problems (like not being able to use spec_set=True if the missing fields are added manually, or not handling defined methods properly if setting an explicit list of fields instead of using autospec, or still not adding the fields only defined in __post_init__ if instantiating a class mock).

By contrast, if create_autospec were to be made explicitly aware of data classes, it could do a pass over dataclasses.fields(spec) and use the type information to fill in any missing fields that don't have class level default values set.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

Previously filed here, but closed on the basis of create_autospec covering the use case: https://github.com/python/cpython/issues/80761

This is only true if the dataclass can be readily instantiated, hence this feature request.

There is also some previous discussion (and assorted attempted workarounds with various flaws) on this Stack Overflow question: https://stackoverflow.com/questions/51640505/how-to-use-spec-when-mocking-data-classes-in-python

Linked PRs

sobolevn commented 1 month ago

Great idea! @ncoghlan do you want to work on this? Or can I take this over? :)

ncoghlan commented 1 month ago

I already worked around it for my use case (leaving spec_set as False and adding in the missing attributes manually), so please feel free to take it on!

I think the basic cases where the field types are just a simple type will be straightforward (use create_autospec(..., instance=True) recursively), and the some_type|None/Optional[type] case can ignore the None branch and do the same. ClassVar and InitVar will already be filtered out by the fields(...) call.

Checking the result of dir(str|int) it might be OK to simply not do anything special for non-trival unions and other more complex type definitions like generic containers. The mock may end up with some additional methods it wouldn't otherwise have, but that's probably an acceptable limitation. For example:

>>> dir(str|int)
['__args__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__or__', '__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

>>> set(dir(tuple[str, int, str])) - set(dir(tuple))
{'__deepcopy__', '__unpacked__', '__typing_unpacked_tuple_args__', '__parameters__', '__copy__', '__args__', '__mro_entries__', '__origin__'}
sobolevn commented 1 month ago

This will land in 3.14! 🎉

ncoghlan commented 1 month ago

@sobolevn , it belatedly occurred to me to ask: should this feature do something special for Annotated types? dataclasses doesn't do anything special with those, so the fully annotated type is reported in the fields output:

>>> @dataclass
... class Example:
...     a: Annotated[int, "Annotation"]
...
>>> from dataclasses import fields
>>> [(f.name, f.type) for f in fields(Example)]
[('a', typing.Annotated[int, 'Annotation'])]

Annotated types make the underlying type available in __origin__, so this could be handled via:

>>> @dataclass
... class Example:
...     a: int
...     b: Annotated[int, "Want to ignore this"]
...
>>> [(f.name, t.__origin__ if get_origin((t := f.type)) is Annotated else t) for f in fields(Example)]
[('a', <class 'int'>), ('b', <class 'int'>)]

I also realised we want to avoid a behaviour change for dataclasses that do define a default value of a specific type within a more general type category:

>>> @dataclass
... class Example:
...     narrow_default: int|None = field(default=30)
...
>>> spec_mock = create_autospec(Example, instance=True)
>>> spec_mock.narrow_default
<NonCallableMagicMock name='mock.narrow_default' spec='int' id='140646074973744'>
>>> [(f.name, f.type) for f in fields(Example)]
[('narrow_default', int | None)]

Since the initial implementation processes every fields entry after processing the dir entries, the mock in this example would switch from mocking int to mocking int|None.

While I think that would be a reasonable design if this native dataclass support had been added when dataclasses first entered the standard library, at this point we don't want to change the mocked types of attributes that were already being picked up by the previous dir-only implementation.

For the attributes that do appear in dir, we want to keep the existing attribute look up based processing, and only use fields for the attributes that would otherwise be missing.

sobolevn commented 1 month ago

I agree with the second part 100%, this is a regression in my new change. PR is on its way. I propose to discuss the first part separately. Because, we might want to cover other corner cases.

ncoghlan commented 1 month ago

I agree, the first part would be a new feature request.