pandas-dev / pandas-stubs

Public type stubs for pandas
BSD 3-Clause "New" or "Revised" License

"pathlib.Path / pd.Series" should be inferred as Series, not as Path #682

Open bersbersbers opened 1 year ago

bersbersbers commented 1 year ago

To Reproduce

1. Provide a minimal runnable pandas example that is not properly checked by the stubs.

```python
from pathlib import Path

import pandas as pd

folder = Path.cwd()
files = pd.Series(["a.png", "b.png"])
reveal_type(files)  # good: Series!
paths = folder / files
reveal_type(paths)  # bad: pathlib.Path!
good_paths = [f if f.is_file() else pd.NA for f in paths]
print(paths)
print(good_paths)
```


2. Indicate which type checker you are using (`mypy` or `pyright`).
`mypy 1.2.0 (compiled: yes)`

3. Show the error message received from that type checker while checking your example.

```
bug.py:10: error: "Path" has no attribute "__iter__"; maybe "__enter__"? (not iterable)  [attr-defined]
Found 1 error in 1 file (checked 1 source file)
```



**Please complete the following information:**
 - OS: Windows 11
 - OS Version: 22H2
 - python version: 3.10.11
 - version of type checker: 1.2.0
 - version of installed `pandas-stubs`: 2.0.1.230501

twoertwein commented 1 year ago

`__truediv__` and `__rtruediv__` are defined for `Path` and `Series`, but based on the annotations of the second argument, neither of them claims to allow `Path / Series`, so I'm surprised that mypy/pyright do not report an error here.
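This observation follows from how Python evaluates `a / b`: it first calls `a.__truediv__(b)`, and only if that returns `NotImplemented` does it try `b.__rtruediv__(a)`. Type checkers mirror this order using the annotated parameter types. The stdlib-only sketch below uses a hypothetical `Names` class as a stand-in for a string `Series` (it is not pandas code) and `PurePosixPath` so the output does not depend on the OS. Calling `__truediv__` directly is only there to expose the first step; a type checker will rightly reject that line, since `Names` is not a valid path argument.

```python
from pathlib import PurePosixPath


class Names:
    """Hypothetical stand-in for a string Series; not pandas code."""

    def __init__(self, names: list[str]) -> None:
        self.names = names

    def __rtruediv__(self, folder: PurePosixPath) -> "Names":
        # Python only calls this after the left operand's __truediv__
        # has returned NotImplemented for a Names argument.
        return Names([str(folder / name) for name in self.names])


folder = PurePosixPath("/data")

# Step 1: the left operand does not accept a Names object.
print(folder.__truediv__(Names(["a.png"])) is NotImplemented)  # True

# Step 2: `/` therefore falls back to Names.__rtruediv__.
result = folder / Names(["a.png", "b.png"])
print(type(result).__name__, result.names)  # Names ['/data/a.png', '/data/b.png']
```

In the pandas-stubs case the reflected method is `Series.__rtruediv__`, so whether `folder / files` is accepted, and what type it gets, depends on whether that annotation admits a `Path`.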

Dr-Irv commented 1 year ago

> `__truediv__` and `__rtruediv__` are defined for `Path` and `Series`, but based on the annotations of the second argument, neither of them claims to allow `Path / Series`, so I'm surprised that mypy/pyright do not report an error here.

I guess this is a mypy bug, because pyright DOES flag this as an invalid use of the `/` operator.

I wouldn't want to support `__truediv__` and `__rtruediv__` for untyped `Series`. We could support it for `Series[str]`, but to do that, we'd have to create a `StringSeries` (as we have done with `TimestampSeries`, `TimedeltaSeries`, etc.) so that the `__truediv__` and `__rtruediv__` operators would work for `Path` arguments, but not for untyped series or series with other dtypes. Then, for the OP example, you'd change the declaration of the first `Series` to `files = pd.Series(["a.png", "b.png"], dtype=str)`.
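The proposal can be modelled in a few lines of plain Python. This is a hypothetical, heavily simplified sketch, not pandas-stubs code: `Series`, `StringSeries`, and the `make_series` factory are illustrative names, and in the real stubs the dtype-based selection would have to live in the overloads of `pd.Series` itself. Only the string subclass defines `__rtruediv__` for a path, so `folder / series` works for the `dtype=str` case and is rejected, both at run time and by a type checker, for every other series.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Generic, TypeVar, overload

T = TypeVar("T")


class Series(Generic[T]):
    """Hypothetical, heavily simplified model of a Series (not pandas-stubs code)."""

    def __init__(self, values: list[T]) -> None:
        self.values = values


class StringSeries(Series[str]):
    """Hypothetical string-only subclass: the only one that accepts `path / series`."""

    def __rtruediv__(self, folder: PurePosixPath) -> Series[PurePosixPath]:
        # Reached only after PurePosixPath.__truediv__ returns NotImplemented.
        return Series([folder / value for value in self.values])


@overload
def make_series(values: list[str], dtype: type[str]) -> StringSeries: ...
@overload
def make_series(values: list[T], dtype: None = None) -> Series[T]: ...
def make_series(values, dtype=None):
    # Hypothetical factory: dtype=str selects the StringSeries overload,
    # mirroring the suggested `pd.Series([...], dtype=str)` declaration.
    if dtype is str:
        return StringSeries([str(v) for v in values])
    return Series(values)


folder = PurePosixPath("/data")

# dtype=str: the result is a Series of paths.
typed = folder / make_series(["a.png", "b.png"], dtype=str)
print(typed.values)  # [PurePosixPath('/data/a.png'), PurePosixPath('/data/b.png')]

# No dtype: a plain Series has no __rtruediv__, so `/` is rejected.
try:
    _ = folder / make_series([1, 2])
except TypeError as exc:
    print("rejected:", exc)
```

The design keeps `Series.__rtruediv__` strict for all series and puts the path-accepting signature on a string-specific subclass, which is the same pattern the comment cites for `TimestampSeries` and `TimedeltaSeries`.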

Open to a PR that does this.