pandas-dev / pandas-stubs

Mypy complains about inconsistent MRO when using `isinstance()` on a generic `Series` object #269

Open tmke8 opened 1 year ago

tmke8 commented 1 year ago

To Reproduce

import pandas as pd

x: pd.Series[int] = pd.Series([1, 3, 4], name="my series")
assert isinstance(x, pd.Series)  # this line produces the error below

mypy gives the following error:

repro.py:4: error: Subclass of "Series[int]" and "TimestampSeries" cannot exist: would have inconsistent method resolution order  [unreachable]

pyright does not complain, and neither does mypy if I drop the [int] from pd.Series[int].

Dr-Irv commented 1 year ago

I don't think we can fix this. The code you wrote won't execute because pandas doesn't treat Series as generic. So trying to fix an issue for type checking with respect to code that won't execute doesn't seem like a good use of time.

We're using the generic form of Series to be able to limit certain operations (e.g., adding two series consisting of timestamps). But you can't declare something to be of type Series[sometype] due to differences between pandas and the stubs.

tmke8 commented 1 year ago

I have code like that in my code base and it executes perfectly fine:

Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> def f():
...   x: pd.Series[int] = pd.Series([1, 3, 4], name="my series")
...   assert isinstance(x, pd.Series)
...   return x
... 
>>> f()
0    1
1    3
2    4
Name: my series, dtype: int64

Annotations on local variables inside a function are never evaluated at runtime, so this definitely works.
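
As a minimal sketch of that point, the example below uses a hypothetical stand-in class, PlainSeries (not pandas), which is assumed to behave like any runtime class that does not support subscripting. The annotation on the local variable is never evaluated, whereas evaluating the same subscript expression as ordinary code raises TypeError.

```python
class PlainSeries:
    """Hypothetical stand-in for a runtime class that does not support subscripting."""


def annotated_local() -> PlainSeries:
    # Annotations on local variables are not evaluated or stored,
    # so the invalid subscript PlainSeries[int] is harmless here.
    x: PlainSeries[int] = PlainSeries()
    return x


def subscript_directly() -> str:
    # Evaluating the same expression as ordinary code does fail.
    try:
        PlainSeries[int]
    except TypeError:
        return "TypeError"
    return "no error"


print(type(annotated_local()).__name__)
print(subscript_directly())
```

A quoted annotation such as x: "PlainSeries[int]" is likewise left unevaluated unless something explicitly resolves it, so it is another way to write such a hint outside a function.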

tmke8 commented 1 year ago

Here is a reproducer without the type annotation:

import pandas as pd

x = pd.Series({0: 0, 1: 1}, name="my series", dtype=int)
assert isinstance(x, pd.Series)

mypy:

 % mypy repro.py
repro.py:4: error: Subclass of "Series[int]" and "TimestampSeries" cannot exist: would have inconsistent method resolution order  [unreachable]
Found 1 error in 1 file (checked 1 source file)

Dr-Irv commented 1 year ago

Thanks for the latter example. We'll look into it.

Dr-Irv commented 1 year ago

@thomkeh I tried this with pandas-stubs 1.4.4.220919 and cannot reproduce the latter example. Not sure what we have changed since then, so can you try with that version and see if it still happens?

Also, double-check your mypy version. I was using 0.971, the version you mentioned above.

tmke8 commented 1 year ago

I checked again and I noticed that I can only reproduce it with the --warn-unreachable flag:

mypy --warn-unreachable pandas_test.py

I just ran it with mypy 0.981 and pandas-stubs 1.5.0.220926.

Dr-Irv commented 1 year ago

I am guessing this is some kind of mypy bug. We test with warn-unreachable=False, because when using mypy on the pandas source, it produces some false positives.

It's pretty odd that it would report an error in your code related to what is inside the stubs.

jonyscathe commented 1 year ago

@thomkeh did you find any resolution for this issue?

I am having the same problem. I don't know whether the issue is in mypy or in pandas-stubs, but my code also executes fine. In my case it type-checks without errors if I annotate with a plain Series rather than Series[float], as long as I add a type: ignore[type-arg].

I also found that something like:

result = input.loc[x] if isinstance(input, Series) else input

produces the error Item "float" of "Union[float, Series[float]]" has no attribute "loc", which it shouldn't, given that the .loc access only runs if input is a Series.

tmke8 commented 1 year ago

@jonyscathe I don't have a solution. My next step was going to be to open an issue in the mypy repository but I wanted to have a smaller reproducing code snippet first.

tmke8 commented 1 year ago

I found a standalone reproducer:

from __future__ import annotations
import datetime
from typing import Any, Generic, TypeVar, Union, overload

S1 = TypeVar("S1", int, datetime.datetime)

class Series(Generic[S1]):
    @overload
    def __new__(cls, data: datetime.datetime) -> TimestampSeries: ...

    @overload
    def __new__(cls, data: dict[int, S1]) -> Series[S1]: ...

    def __new__(cls, data: Union[datetime.datetime, dict[int, S1]]) -> Any:
        return

class TimestampSeries(Series[datetime.datetime]): ...

x = Series({0: 0, 1: 1})
reveal_type(x)
assert isinstance(x, Series)

with mypy --warn-unreachable:

minimal_mro_problem.py:20: note: Revealed type is "minimal_mro_problem.Series[builtins.int]"
minimal_mro_problem.py:21: error: Subclass of "Series[int]" and "TimestampSeries" cannot exist: would have inconsistent method resolution order  [unreachable]
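
For readers unfamiliar with the term, here is a runtime illustration of what an inconsistent method resolution order is. It is only a sketch with simplified stand-ins for the classes above (the TypeVar T is unconstrained, the __new__ overloads are omitted, and the helper combine is hypothetical), and it makes no claim about how mypy arrives at its message. Python itself refuses to create a class that lists a base before that base's own subclass.

```python
import datetime
from typing import Generic, Tuple, TypeVar

T = TypeVar("T")  # simplified: unconstrained, unlike S1 in the reproducer


class Series(Generic[T]):
    """Simplified stand-in for the generic Series of the reproducer."""


class TimestampSeries(Series[datetime.datetime]):
    """Subclass that fixes the type parameter to datetime."""


def combine(bases: Tuple[type, ...]) -> str:
    # Hypothetical helper: try to build a class with the given bases
    # and report whether Python accepts the resulting MRO.
    try:
        type("Both", bases, {})
    except TypeError as exc:
        return str(exc)
    return "ok"


# Base listed before its own subclass: no consistent linearization exists.
print(combine((Series, TimestampSeries)))
# Subclass first: this order is valid.
print(combine((TimestampSeries, Series)))
```

Listing the subclass first gives a valid order, and an ordinary TimestampSeries instance passes isinstance(..., Series) as expected.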

tmke8 commented 1 year ago

I reported it here: https://github.com/python/mypy/issues/13824

Dr-Irv commented 1 year ago

The bug still exists in mypy 0.990.

twoertwein commented 1 year ago

This seems to work now with mypy 1.4.1 and the latest pandas-stubs :)

tmke8 commented 1 year ago

Hmm, I can still reproduce:

tmke8@ubuntu:~$ pip install git+https://github.com/pandas-dev/pandas-stubs.git
Collecting git+https://github.com/pandas-dev/pandas-stubs.git
  Cloning https://github.com/pandas-dev/pandas-stubs.git to /tmp/pip-req-build-fb3nmjwl
  Running command git clone --filter=blob:none --quiet https://github.com/pandas-dev/pandas-stubs.git /tmp/pip-req-build-fb3nmjwl
  Resolved https://github.com/pandas-dev/pandas-stubs.git to commit fbec52bbff022384bd30bd69dcda776c22d19729
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: types-pytz>=2022.1.1 in /home/tmk/.cache/pypoetry/virtualenvs/ethicml-dzQunYke-py3.10/lib/python3.10/site-packages (from pandas-stubs==2.0.2.230605) (2022.1.2)
Requirement already satisfied: numpy>=1.25.0 in /home/tmk/.cache/pypoetry/virtualenvs/ethicml-dzQunYke-py3.10/lib/python3.10/site-packages (from pandas-stubs==2.0.2.230605) (1.25.2)
Building wheels for collected packages: pandas-stubs
  Building wheel for pandas-stubs (pyproject.toml) ... done
  Created wheel for pandas-stubs: filename=pandas_stubs-2.0.2.230605-py3-none-any.whl size=151715 sha256=43e3e6baddaa211ea09637e8eedfb60fe57324b41f0b16cbf0a0108715618276
  Stored in directory: /tmp/pip-ephem-wheel-cache-suw3twnj/wheels/88/ba/da/a34e583c952d4fc1cf67b3763fc7c19b34a58ad569ab1aa6e6
Successfully built pandas-stubs
Installing collected packages: pandas-stubs
Successfully installed pandas-stubs-2.0.2.230605
tmke8@ubuntu:~$ cat stub_bug.py
import pandas as pd

x = pd.Series({0: 0, 1: 1}, name="my series", dtype=int)
assert isinstance(x, pd.Series)
tmke8@ubuntu:~$ mypy --version
mypy 1.4.1 (compiled: yes)
tmke8@ubuntu:~$ mypy --warn-unreachable stub_bug.py
stub_bug.py:4: error: Subclass of "Series[int]" and "TimestampSeries" cannot exist: would have inconsistent method resolution order  [unreachable]
Found 1 error in 1 file (checked 1 source file)

Or does pip install git+https://github.com/pandas-dev/pandas-stubs.git not give me the most recent version?

twoertwein commented 1 year ago

You are right - I tested it without --warn-unreachable