Handle Type Symmetry in Overloaded Descriptor Methods Based on `instance` Type

microsoft / pyright

Static Type Checker for Python

Other

13.47k stars 1.48k forks source link

Handle Type Symmetry in Overloaded Descriptor Methods Based on `instance` Type #9460

Closed saryou closed 1 week ago

saryou commented 1 week ago

Overview Currently, when multiple overloads are defined for a descriptor's __get__ method with an instance type other than None, the descriptor is treated as asymmetric. However, if the types can be determined as symmetric based on the instance type, it would be preferable for Pyright to handle such descriptors as symmetric.

Proposed Improvement Currently, it seems that Pyright determines asymmetry at the level of the descriptor class itself. However, it would be more accurate if Pyright could assess asymmetry based on the context in which the descriptor is accessed, selecting the appropriate method signature accordingly. This approach would enhance type inference precision by evaluating symmetry in the context of each specific usage.

Reproduction Code


from typing import Any, overload, Self

class Descriptor:
    @overload
    def __get__(self, instance: None, owner: Any) -> Self:
        ...

    @overload
    def __get__(self, instance: 'A', owner: Any) -> int | None:
        ...

    @overload
    def __get__(self, instance: Any, owner: Any) -> Self:
        ...

    def __set__(self, instance: Any, value: int | None):
        ...

class A:
    desc = Descriptor()

class B:
    desc = Descriptor()

a = A()
b = B()
a.desc = 1
b.desc = 1
reveal_type(a.desc)  # int | None, which could be narrowed to Literal[1]
reveal_type(b.desc)  # Descriptor as expected

erictraut commented 1 week ago

Hmm, I'm not convinced.

First, the way you've defined this descriptor is quite unusual. I've never seen a descriptor that is so contextual.

Second, I think there's value in keeping typing rules simple. The Python typing spec doesn't current specify how type checkers should handle symmetric versus asymmetric descriptors or how to distinguish between them, but I'm hoping that we can augment the spec to clarify the behavior in the near future. When writing specifications, it's important for them to be clear, consistent, and as simple as possible. What you're suggesting here would be a special case to the specification, and I'm not sure such a special case is well justified.

I'm going to reject this enhancement request for now. I'm willing to revisit it if and when the Python typing spec is updated to clarify the behavior.

saryou commented 1 week ago

@erictraut

Thank you for the prompt response and for taking the time to consider this request.

In my view, this use case may not be as unusual as it initially seems. For instance, the popular django-stubs library employs a similar approach (see here). In django-stubs, the Field class is defined as a descriptor, returning an appropriate value if the instance is a Django model, or otherwise returning the Field itself. This kind of context-based descriptor behavior is thus already in use within widely adopted libraries, suggesting that it might be a relevant case to consider.

Furthermore, I believe that the impact of the change to overloaded descriptor behavior in version 1.1.387 has not fully surfaced yet, and we may see more feedback as users start encountering it.

I fully understand and agree with the need to keep the typing specification simple. While it’s unclear how many users are affected by this specific use case, I’d be grateful if it could be revisited in the future as more insights emerge.

Thank you, as always, for the fantastic work you’re doing.

saryou commented 1 week ago

@erictraut

I have an additional question, as I’m still unsure if this behavior is expected or a bug.

The documentation mentions that assignment-based type narrowing is not applied to asymmetric descriptors. However, is it also intended behavior that narrowing does not apply even with consecutive accesses to __get__?

a = A()
a.desc = 1
reveal_type(a.desc)  # int | None
if isinstance(a.desc, int):
    reveal_type(a.desc)  # still int | None

This behavior has become quite inconvenient in many codebases that work with Django models. In particular, common type guards like the one below no longer function as expected:

if obj.prop is not None:
    obj.prop.do_something()  # "do_something" is not a known attribute of "None"

erictraut commented 1 week ago

Yes, this is the intended behavior. The expression a.desc is not narrowed, so subsequent accesses to __get__ are evaluated based on the return type of __get__. This makes sense for an asymmetric descriptor because the __set__ value is not necessarily the same type as the value subsequently returned by __get__. Of course, this can be true for symmetric descriptors as well, but it's quite rare in that case. Maybe the right answer here is to provide some way to tell a type checker whether a descriptor (or property) always returns the same value from its getter that was specified in its setter. That way, we could avoid using get/set symmetry as a proxy to make this determination.

saryou commented 1 week ago

@erictraut

Thank you for your response.

I believe there are two main points to consider here.

First, there’s the question of whether to assume that __get__ returns the same value that was set by __set__. For this, I think the current implementation’s approach—where an asymmetric descriptor is assumed not to return the same value—is appropriate. However, as this enhancement request suggests, there may still be room to improve how asymmetry is determined. Of course, as you mentioned, it would be even better if there were a way to explicitly signal this to the type checker.

Second, there’s the question of whether __get__ should be assumed to be idempotent. Given that __get__ is accessed in a way that resembles property access, I believe it’s a very unusual use case for __get__ to introduce side effects and return different values on each access. Therefore, it seems reasonable to assume, in most use cases, that consecutive accesses to __get__ would return the same value. Even if we allow for non-idempotent property access—like an access counter, for instance—it seems quite unlikely that the type would vary on each access. Here again, an approach like a @no_side_effect decorator to indicate idempotence to the type checker could be helpful.