python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.5k stars 2.83k forks source link

Change in inference of overloads involving `Hashable` and `slice` now that `slice` is generic #18149

Open AlexWaygood opened 3 days ago

AlexWaygood commented 3 days ago

Bug Report

Consider the following Python snippet, saved as foo.pyi:

from typing import assert_type, overload, Hashable

class Foo: ...

class DataFrame:
    @overload
    def __getitem__(self, key: slice) -> DataFrame: ...
    @overload
    def __getitem__(self, key: Hashable) -> Foo: ...

df = DataFrame()
assert_type(df[1:], DataFrame)
assert_type(df[:2], DataFrame)

Prior to the recent change to make slice generic in typeshed, mypy used to emit no errors on this snippet. However, mypy on the master branch emits the following errors:

foo.pyi:7: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [overload-overlap]
foo.pyi:13: error: Expression is of type "Any", not "DataFrame"  [assert-type]
foo.pyi:14: error: Expression is of type "Any", not "DataFrame"  [assert-type]

The first error here seems reasonable to me, but it's unclear to me why mypy now infers Any as the result of the df subscriptions. This snippet is minimized from the sole new mypy_primer error reported in https://github.com/python/typeshed/pull/11637#issuecomment-2435268577.

To Reproduce

https://mypy-play.net/?mypy=master&python=3.12&gist=339ba7f72c9048770ce15d6dc75207a1

Your Environment

randolf-scholz commented 3 days ago

I think this is because since python 3.12, slice is hashable, hence it satisfies both overloads and mypy performs a join of the return types. When you run with 3.11 the error disappears:

https://mypy-play.net/?mypy=master&python=3.11&gist=339ba7f72c9048770ce15d6dc75207a1

randolf-scholz commented 3 days ago

@AlexWaygood Uhh, I just discovered that it actually seems to be related to the bug #2410 we just encountered in typeshed as well, because when you use non-literal slices it also disappears!

from typing import assert_type, overload, Hashable

class Foo: ...

class DataFrame:
    @overload
    def __getitem__(self, key: slice, /) -> "DataFrame": ...
    @overload
    def __getitem__(self, key: Hashable, /) -> Foo: ...

df = DataFrame()
assert_type(df[1:], DataFrame)  # ❌
assert_type(df[:2], DataFrame)  # ❌
assert_type(df[slice(1, None)], DataFrame)  # ✅
assert_type(df[slice(None, 2)], DataFrame)  # ✅

https://mypy-play.net/?mypy=master&python=3.12&gist=935d2927539dbe01d6727037ace98469


I was wondering about this because according to https://mypy.readthedocs.io/en/stable/more_types.html#type-checking-calls-to-overloads, no join operation should happen. I guess there is some legacy code responsible for dealing with literal slices in __getitem__ that's going haywire?

brianschubert commented 3 days ago

I guess there is some legacy code responsible for dealing with literal slices in __getitem__ that's going haywire?

~Indeed, this is actually hardcoded into the expression checker:~ Nevermind, I had my wires crossed about what this issue was about, sorry for the noise!

https://github.com/python/mypy/blob/2ebc690279c7e20ed8bec006787030c5ba57c40e/mypy/checkexpr.py#L5609-L5619.

brianschubert commented 3 days ago

Actually, my previous comment wasn't as far from the mark as I thought :laughing:

The difference between slice expressions and non-literal slices does come down to the hardcoded analysis in the expression checker, specifically the return:

https://github.com/python/mypy/blob/2ebc690279c7e20ed8bec006787030c5ba57c40e/mypy/checkexpr.py#L5619

This erases any type information and makes all slice expressions be treated as slice[Any, Any, Any]. Manually passing a slice of type slice[Any, Any, Any] indeed has the same behavior:

df = DataFrame()
x: slice[Any, Any, Any]
assert_type(df[x], DataFrame)  # E: Expression is of type "Any", not "DataFrame"

I'm not fully sure why passing slice[Any, Any, Any] causes the overload return type to resolve to Any, but I think it may have to do with the way infer_overload_return_type resolves ambiguous overloads, specifically now that this check will resolve to True: https://github.com/python/mypy/blob/2ebc690279c7e20ed8bec006787030c5ba57c40e/mypy/checkexpr.py#L2862 so this branch can no longer be taken: https://github.com/python/mypy/blob/2ebc690279c7e20ed8bec006787030c5ba57c40e/mypy/checkexpr.py#L2882-L2884

I think the fix would be to just properly infer the generic type arguments for slice expressions. Naively that could look something like the following, though we'd probably want to make it match the behavior of slice.__new__ in typeshed.

--- a/mypy/checkexpr.py
+++ b/mypy/checkexpr.py
@@ -5612,11 +5612,15 @@ class ExpressionChecker(ExpressionVisitor[Type]):
         except KeyError:
             supports_index = self.chk.named_type("builtins.int")  # thanks, fixture life
         expected = make_optional_type(supports_index)
+        type_args = []
         for index in [e.begin_index, e.end_index, e.stride]:
             if index:
                 t = self.accept(index)
                 self.chk.check_subtype(t, expected, index, message_registry.INVALID_SLICE_INDEX)
-        return self.named_type("builtins.slice")
+                type_args.append(t)
+            else:
+                type_args.append(NoneType())
+        return self.chk.named_generic_type("builtins.slice", type_args)