pandas-dev / pandas-stubs

Public type stubs for pandas
BSD 3-Clause "New" or "Revised" License

df.columns.str.match actually gives npt.NDArray[np.bool_], but mypy thinks it is pd.Index[str] #983

Status: Closed (cmp0xff closed this issue 1 week ago)

cmp0xff commented 3 weeks ago

Describe the bug

df.columns.str.match(reg_ex_pattern) actually gives npt.NDArray[np.bool_], but mypy thinks it's pd.Index[str]

To Reproduce

Provide a minimal runnable pandas example that is not properly checked by the stubs.

from typing import TYPE_CHECKING, cast

import numpy as np
import pandas as pd

if TYPE_CHECKING:
    from numpy import typing as npt

df = pd.DataFrame({"1": [2, 3], "2": 3, "4": 5, "a": 1})
mask = df.columns.str.match(r"\d")
print(mask)  # [ True  True  True False]
print(type(mask))  # <class 'numpy.ndarray'>
df.loc[:, mask]  # mypy: error: Invalid index type "tuple[slice, Index[str]]" for "_LocIndexerFrame"; expected type "slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | <6 more items>"
df.loc[:, cast("npt.NDArray[np.bool_]", mask)]  # mypy: fine

Indicate which type checker you are using (mypy or pyright).

I am using mypy.

Show the error message received from that type checker while checking your example.

error: Invalid index type "tuple[slice, Index[str]]" for "_LocIndexerFrame"; expected type "slice | ndarray[Any, dtype[integer[Any]]] | Index[Any] | list[int] | Series[int] | <6 more items>"

Additional context

Nothing

Dr-Irv commented 3 weeks ago

Thanks for the report. The declaration of match() in core/strings.pyi is incorrect. Fixing it requires giving the StringMethods class an additional type argument that carries the expected result type of match(), similar to what is already done for str.split().

A PR with tests is welcome.
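
To illustrate the idea of an extra type argument, here is a minimal, standard-library-only sketch. It is not the pandas-stubs code: ToyStringMethods, ToyIndex, StrResultT, BoolResultT, and str_accessor are all hypothetical names invented for this example. The point is only that the owning class chooses what match() returns by filling in a second type parameter, so an index-like owner can declare a boolean mask instead of another index.

```python
from __future__ import annotations

import re
from typing import Callable, Generic, TypeVar

# Hypothetical type parameters; the real stubs use their own names.
StrResultT = TypeVar("StrResultT")    # result of string-valued methods
BoolResultT = TypeVar("BoolResultT")  # result of boolean-valued methods such as match()


class ToyStringMethods(Generic[StrResultT, BoolResultT]):
    """Toy stand-in for a string accessor with a separate result type for match()."""

    def __init__(
        self,
        values: list[str],
        wrap_str: Callable[[list[str]], StrResultT],
        wrap_bool: Callable[[list[bool]], BoolResultT],
    ) -> None:
        self._values = values
        self._wrap_str = wrap_str
        self._wrap_bool = wrap_bool

    def upper(self) -> StrResultT:
        return self._wrap_str([v.upper() for v in self._values])

    def match(self, pat: str) -> BoolResultT:
        # re.match anchors at the start of each string.
        return self._wrap_bool([re.match(pat, v) is not None for v in self._values])


class ToyIndex:
    """Toy stand-in for an index of strings."""

    def __init__(self, values: list[str]) -> None:
        self.values = values

    @property
    def str_accessor(self) -> ToyStringMethods[ToyIndex, list[bool]]:
        # The second type argument says: match() yields a plain boolean mask
        # here, not another ToyIndex.
        return ToyStringMethods(self.values, ToyIndex, list)


cols = ToyIndex(["1", "2", "4", "a"])
mask = cols.str_accessor.match(r"\d")
upper = cols.str_accessor.upper()
print(mask)  # [True, True, True, False]
```

With this shape, a type checker infers list[bool] for mask and ToyIndex for upper. A series-like owner could instead pass its own class as the second type argument, which is the kind of per-owner distinction the comment above describes.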