pandas-dev / pandas-stubs

Public type stubs for pandas
BSD 3-Clause "New" or "Revised" License

pd.Series Iterable vs. Sequence #812

Open Ranfir opened 10 months ago

Ranfir commented 10 months ago

**Describe the bug** The type annotation for `pd.Series` uses `Sequence[Any]` instead of `Iterable[Any]`, but the documentation states that the argument can be an iterable.

**To Reproduce**

1. Provide a minimal runnable pandas example that is not properly checked by the stubs.

       import pandas as pd

       pd.Series({}.keys())

2. Indicate which type checker you are using (`mypy` or `pyright`).

   `mypy`

3. Show the error message received from that type checker while checking your example.

       scratch.py:3: error: No overload variant of "Series" matches argument type "dict_keys[, ]" [call-overload]
       scratch.py:3: note: Possible overload variants:
       scratch.py:3: note: def [S1] Series(data: DatetimeIndex | Sequence[datetime64 | datetime] | datetime64 | datetime, index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: Literal['datetime64[Y]', 'datetime64[M]', 'datetime64[W]', 'datetime64[D]', 'datetime64[h]', 'datetime64[m]', 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[μs]', 'datetime64[ns]', 'datetime64[ps]', 'datetime64[fs]', 'datetime64[as]', 'M8[Y]', 'M8[M]', 'M8[W]', 'M8[D]', 'M8[h]', 'M8[m]', 'M8[s]', 'M8[ms]', 'M8[us]', 'M8[μs]', 'M8[ns]', 'M8[ps]', 'M8[fs]', 'M8[as]', '<M8[Y]', '<M8[M]', '<M8[W]', '<M8[D]', '<M8[h]', '<M8[m]', '<M8[s]', '<M8[ms]', '<M8[us]', '<M8[μs]', '<M8[ns]', '<M8[ps]', '<M8[fs]', '<M8[as]', 'date32[pyarrow]', 'date64[pyarrow]', 'timestamp[s][pyarrow]', 'timestamp[ms][pyarrow]', 'timestamp[us][pyarrow]', 'timestamp[ns][pyarrow]'] = ..., name: Hashable = ..., copy: bool = ...) -> TimestampSeries
       scratch.py:3: note: def [S1] Series(data: ExtensionArray | ndarray[Any, Any] | dict[str, ndarray[Any, Any]] | Sequence[Any] | IndexOpsMixin[Any], index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: Literal['datetime64[Y]', 'datetime64[M]', 'datetime64[W]', 'datetime64[D]', 'datetime64[h]', 'datetime64[m]', 'datetime64[s]', 'datetime64[ms]', 'datetime64[us]', 'datetime64[μs]', 'datetime64[ns]', 'datetime64[ps]', 'datetime64[fs]', 'datetime64[as]', 'M8[Y]', 'M8[M]', 'M8[W]', 'M8[D]', 'M8[h]', 'M8[m]', 'M8[s]', 'M8[ms]', 'M8[us]', 'M8[μs]', 'M8[ns]', 'M8[ps]', 'M8[fs]', 'M8[as]', '<M8[Y]', '<M8[M]', '<M8[W]', '<M8[D]', '<M8[h]', '<M8[m]', '<M8[s]', '<M8[ms]', '<M8[us]', '<M8[μs]', '<M8[ns]', '<M8[ps]', '<M8[fs]', '<M8[as]', 'date32[pyarrow]', 'date64[pyarrow]', 'timestamp[s][pyarrow]', 'timestamp[ms][pyarrow]', 'timestamp[us][pyarrow]', 'timestamp[ns][pyarrow]'], name: Hashable = ..., copy: bool = ...) -> TimestampSeries
       scratch.py:3: note: def [S1] Series(data: PeriodIndex, index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: PeriodDtype = ..., name: Hashable = ..., copy: bool = ...) -> PeriodSeries
       scratch.py:3: note: def [S1] Series(data: TimedeltaIndex | Sequence[timedelta64 | timedelta] | timedelta64 | timedelta, index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: Literal['timedelta64[Y]', 'timedelta64[M]', 'timedelta64[W]', 'timedelta64[D]', 'timedelta64[h]', 'timedelta64[m]', 'timedelta64[s]', 'timedelta64[ms]', 'timedelta64[us]', 'timedelta64[μs]', 'timedelta64[ns]', 'timedelta64[ps]', 'timedelta64[fs]', 'timedelta64[as]', 'm8[Y]', 'm8[M]', 'm8[W]', 'm8[D]', 'm8[h]', 'm8[m]', 'm8[s]', 'm8[ms]', 'm8[us]', 'm8[μs]', 'm8[ns]', 'm8[ps]', 'm8[fs]', 'm8[as]', '<m8[Y]', '<m8[M]', '<m8[W]', '<m8[D]', '<m8[h]', '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[μs]', '<m8[ns]', '<m8[ps]', '<m8[fs]', '<m8[as]', 'duration[s][pyarrow]', 'duration[ms][pyarrow]', 'duration[us][pyarrow]', 'duration[ns][pyarrow]'] = ..., name: Hashable = ..., copy: bool = ...) -> TimedeltaSeries
       scratch.py:3: note: def [S1, _OrderableT] Series(data: IntervalIndex[Interval[_OrderableT]] | Interval[_OrderableT] | Sequence[Interval[_OrderableT]], index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: Literal['Interval'] = ..., name: Hashable = ..., copy: bool = ...) -> IntervalSeries[_OrderableT]
       scratch.py:3: note: def [S1] Series(data: str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | ExtensionArray | ndarray[Any, Any] | dict[str, ndarray[Any, Any]] | Sequence[Any] | IndexOpsMixin[Any] | dict[int, Any] | dict[str, Any] | None, index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: type[S1], name: Hashable = ..., copy: bool = ...) -> Series[S1]
       scratch.py:3: note: def [S1] Series(data: S1 | ExtensionArray | ndarray[Any, Any] | dict[str, ndarray[Any, Any]] | Sequence[S1] | IndexOpsMixin[S1] | dict[int, S1] | dict[str, S1], index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: ExtensionDtype | str | dtype[generic] | type[str] | type[complex] | type[bool] | type[object] = ..., name: Hashable = ..., copy: bool = ...) -> Series[S1]
       scratch.py:3: note: def [S1] Series(data: str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex | ExtensionArray | ndarray[Any, Any] | dict[str, ndarray[Any, Any]] | Sequence[Any] | IndexOpsMixin[Any] | dict[int, Any] | dict[str, Any] | None = ..., index: Index[Any] | Series[Any] | ndarray[Any, Any] | list[Any] | dict[Any, Any] | range | tuple[Any, ...] | None = ..., *, dtype: ExtensionDtype | str | dtype[generic] | type[str] | type[complex] | type[bool] | type[object] = ..., name: Hashable = ..., copy: bool = ...) -> Series[Any]
       Found 1 error in 1 file (checked 1 source file)



**Please complete the following information:**
 - OS: [e.g. Windows, Linux, MacOS]: Ubuntu
 - OS Version [e.g. 22]: 20.04.6 LTS
 - python version: 3.11.6
 - version of type checker: mypy 1.6.1
 - version of installed `pandas-stubs`: 2.1.1.230928

Dr-Irv commented 10 months ago

While the documentation does say "Iterable", not all iterables are accepted. For example, a set is not accepted. The view returned by `keys()` acts like a set from a typing perspective. So while passing `keys()` to `Series()` works at runtime, it's not a best practice.
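
To illustrate the typing point, here is a minimal standard-library sketch (not taken from pandas or pandas-stubs) that checks a dictionary keys view against the abstract base classes in `collections.abc`. The variable names are only illustrative.

```python
from collections.abc import Iterable, Sequence, Set

# A dictionary keys view, as in the reproduction above (with sample data).
keys = {"a": 1, "b": 2}.keys()

# The view can be iterated over...
is_iterable = isinstance(keys, Iterable)
# ...but it is not registered as a Sequence (no integer indexing or slicing)...
is_sequence = isinstance(keys, Sequence)
# ...and it does implement the abstract Set interface.
is_set_like = isinstance(keys, Set)

print(is_iterable, is_sequence, is_set_like)  # True False True
```

Because the view is set-like rather than a sequence, an overload annotated with `Sequence[Any]` does not match it, even though iterating over it works.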

There is an open issue on the pandas repo where I brought up whether we want to accept dictionary views: https://github.com/pandas-dev/pandas/issues/55425#issuecomment-1753159508

I will leave this open for now, but I don't think we are going to support this.
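
In the meantime, one possible workaround is to copy the view into a list before building the Series. The sketch below uses only the standard library; `materialize` is a hypothetical helper name, not part of pandas or pandas-stubs.

```python
from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def materialize(values: Iterable[T]) -> list[T]:
    """Hypothetical helper: copy any iterable into a list, which is a Sequence."""
    return list(values)


counts = {"a": 1, "b": 2}
key_list = materialize(counts.keys())  # ["a", "b"], in insertion order

# Assumption: with pandas installed, pd.Series(key_list) should then be
# checked against the Sequence-based overloads listed in the error above.
```

Calling `list(counts.keys())` directly does the same thing; the helper only makes the `Iterable` to `Sequence` conversion explicit in the annotations.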

Dr-Irv commented 10 months ago

Created a new issue for pandas: https://github.com/pandas-dev/pandas/issues/55842