python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.55k stars 2.84k forks source link

Optimize is_unsafe_overlapping_overload_signatures for overlapping typevars case #18159

Open alexdrydew opened 1 week ago

alexdrydew commented 1 week ago

Fixes #18022

This change optimizes is_unsafe_overlapping_overload_signatures so overloads containing many type variables can be checked faster.

Let:

It seems that currently typechecking a functions with overloads have $O(N^2V^{2K})$ time complexity:

If I am not mistaken (please correct me if I am) it is not necessary to check pairs of overrides where same type variable is expanded to different variants across different signatures, so we can expand variables jointly for both signatures reducing time complexity to $O(N^2V^K)$

On my original example (using 6 type variables) this change gives 40x time improvement:

Example code snippet ```python from __future__ import annotations from typing import Generic, TypeVar, overload T1 = TypeVar("T1", int, str, bool) T2 = TypeVar("T2", int, str, bool) T3 = TypeVar("T3", int, str, bool) T4 = TypeVar("T4", int, str, bool) T5 = TypeVar("T5", int, str, bool) T6 = TypeVar("T6", int, str, bool) class A(Generic[T1, T2, T3, T4, T5, T6]): @overload def foo(self: A[int, T2, T3, T4, T5, T6]) -> int: ... @overload def foo(self: A[str, T2, T3, T4, T5, T6]) -> str: ... def foo(self) -> int | str: raise NotImplementedError ```
py-spy results without change ![profile](https://github.com/user-attachments/assets/e0328ab5-dea8-4a62-9d50-07f3f344a4aa)
py-spy results after change ![profile_new](https://github.com/user-attachments/assets/def3b9cb-0747-437b-ba0c-bd663d3f4e1b)
github-actions[bot] commented 1 week ago

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

sterliakov commented 1 week ago

This looks like an important performance fix, but unfortunately seems to change the mypy behaviour significantly. Two testcases based on your example:

from __future__ import annotations

from typing import Generic, TypeVar, overload

T1 = TypeVar("T1", int, str, bool)
T2 = TypeVar("T2", int, str, bool)
T3 = TypeVar("T3", int, str, bool)
T4 = TypeVar("T4", int, str, bool)

class A(Generic[T1, T2, T3, T4]):
    @overload
    def foo(self: A[int, T2, T3, T4]) -> int: ...

    @overload
    def foo(self: A[int, T2, T3, bool]) -> str: ...

    @overload
    def foo(self: A[str, T2, T3, T4]) -> str: ...

    def foo(self) -> int | str:
        return 0

And

from __future__ import annotations

from typing import Generic, TypeVar, overload

T1 = TypeVar("T1", int, str, bool)
T2 = TypeVar("T2", int, str, bool)
T3 = TypeVar("T3", int, str, bool)
T4 = TypeVar("T4", int, str, bool)

class A(Generic[T1, T2, T3, T4]):
    @overload
    def foo(self: A[int, T2, T3, bool]) -> str: ...

    @overload
    def foo(self: A[int, T2, T3, T4]) -> int: ...

    @overload
    def foo(self: A[str, T2, T3, T4]) -> str: ...

    def foo(self) -> int | str:
        return 0

(they only differ in overloads order)

Current mypy master correctly emits

error: Overloaded function signatures 1 and 2 overlap with incompatible return types

in both cases. With this patch the first one is green (which is invalid) and the second one emits

error: Overloaded function signatures 1 and 2 overlap with incompatible return types
note: Flipping the order of overloads will fix this error
github-actions[bot] commented 1 week ago

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

alexdrydew commented 1 week ago

@sterliakov Thank you for finding this case. It seems that the problem is deeper than I've initially thought.

Initially I thought that we get overload-overlap error for your provided examples because of T4: bool variant, but it turns out that the real source of the error is bool being subtype of int. It can be simplified to this example:

from typing import Generic, TypeVar, Union, overload

class A:
    pass

class B(A):
    pass

T1 = TypeVar("T1", A, B)
T2 = TypeVar("T2", A, B)

class T(Generic[T1, T2]):
    @overload
    def foo(self: T[T1, T2]) -> int: ...

    @overload
    def foo(self: T[T1, A]) -> str: ...  # same for flip signatures case

    def foo(self) -> Union[int, str]:
        raise NotImplementedError

Actual variant that triggers the error:

@overload
def foo(self: T[A, B]) -> int: ...

@overload
def foo(self: T[B, A]) -> str: ...
alternative example where `B` is not a subtype of `A` does not raise any errors ```python from typing import Generic, TypeVar, Union, overload class A: pass class B: pass T1 = TypeVar("T1", A, B) T2 = TypeVar("T2", A, B) class T(Generic[T1, T2]): @overload def foo(self: T[T1, T2]) -> int: ... @overload def foo(self: T[T1, A]) -> str: ... # same for flip signatures case def foo(self) -> Union[int, str]: raise NotImplementedError ```

It seems that mypy behavior here (at least according to testOverloadedPartiallyOverlappingTypeVarsAndUnion test case) is to assume that self type is covariant and to allow overload signatures to shadow later narrower signatures for some variants of generic.

I am actually not sure what is the correct behavior here. Currently this one seems most consistent to me:

Maybe it is a good idea to allow only strictly narrower function signature variants to shadow later ones to prevent cases when there is a pair of signature variants with exactly same arguments but incompatible return types? I've changed the last check in is_unsafe_overlapping_overload_signatures to handle this edge case in my last commit, but as you may see even that already changes behavior of at least one test case.

sterliakov commented 1 week ago

Maybe it is a good idea to allow only strictly narrower function signature variants to shadow later ones to prevent cases when there is a pair of signature variants with exactly same arguments but incompatible return types?

Could you illustrate this with some example? As far as I understand, you suggest the following to become allowed:

from typing import overload

class A: pass
class B(A): pass

@overload
def fn(x: B) -> str: ...
@overload
def fn(x: A) -> float: ...
def fn(x: A) -> str | float:
    return "" if isinstance(x, B) else 0

It's quite important to report overloads-overlap here since otherwise the following (safe) upcasting results in unsafe code that raises at runtime and reports no mypy errors:

def handle(t: A) -> float:
    return fn(t) + 1

handle(B())

BTW, there's a bug in current mypy (#10143): if I replace A and B with float and int resp., the error isn't reported... Also note #10157.

alexdrydew commented 1 week ago

Could you illustrate this with some example? As far as I understand, you suggest the following to become allowed

Not really, sorry for the confusion, I meant that strictly wider in terms of the arguments earlier overload signatures should shadow the later ones (emphasis on strictly here, otherwise it is just the current behavior), so:

sterliakov commented 1 week ago

Ohh. I have to ask: why don't we produce "signature 2 will never be matched [overload-cannot-match]" there? It is indeed type-safe as the second overload is just ignored entirely, but also indicates a programming error.

I don't think [overloads-overlap] is appropriate here, not emitting it in your case is fine, and that becomes a separate [overload-cannot-match] bug, but I have a feeling that something is deeply wrong with overloads compatibility checking - perhaps it's time to rewrite that part...

sterliakov commented 1 week ago

TBH, I think changes in this PR improve mypy - even though not all corner cases are handled, they weren't handled previously either, and the performance problem is rather dramatic. Could you add a testcase exercising this specific behaviour (many vars in overloads) to prevent introducing similar perf regressions in future?

alexdrydew commented 6 days ago

Ohh. I have to ask: why don't we produce "signature 2 will never be matched [overload-cannot-match]" there? It is indeed type-safe as the second overload is just ignored entirely, but also indicates a programming error.

The problem is that [overload-cannot-match] check is done using an approximation where type variables are replaced with unions of all their respective bounds, so we can't detect errors which are true only for some specific variants of type variables. I guess to fix that one would need to rewrite this part to expand signatures by type variables as well or rewrite check_overlapping_overloads completely.

github-actions[bot] commented 6 days ago

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

github-actions[bot] commented 2 hours ago

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅