pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.48k stars 17.87k forks source link

Remove ABC Usage from pandas._typing #27146

Closed WillAyd closed 5 years ago

WillAyd commented 5 years ago

Right now we use the ABC classes in pandas._typing to work around circular imports, but I think this has the effect of making our type checking practically non-existent. The ABCs are factory generated by an unannotated function, so they get replaced with Any

To illustrate, run mypy on a module with this content

from pandas.core.dtypes.generic import ABCDataFrame

some_obj = None  # type: ABCDataFrame
reveal_type(some_obj)

Mypy runs without issue yielding

test.py:4: note: Revealed type is 'Any'

By contrast, using the actual DataFrame class:

import pandas as pd

some_obj = None  # type: pd.DataFrame
reveal_type(some_obj)

Yields an error:

(pandas-dev) mypy test.py
test.py:3: error: Incompatible types in assignment (expression has type "None", variable has type "DataFrame")
test.py:4: note: Revealed type is 'pandas.core.frame.DataFrame'

I'm not sure the best approach here. Direct imports into pandas._typing should cause circular imports but the ABCs won't provide anything with type checking as long as they are factory generated.

jreback commented 5 years ago

what if we use the quoted imports eg ‘DataFrame’? doesn’t this achieve the same purpose?

WillAyd commented 5 years ago

In pandas._typing or in other modules?

This code:

some_obj = None  # type: 'pd.DataFrame'
reveal_type(some_obj)

Yields:

test.py:1: error: Name 'pd' is not defined
test.py:2: note: Revealed type is 'Any'
shoyer commented 5 years ago

You can avoid circular imports by guarding the imports in if TYPE_CHECKING blocks: https://mypy.readthedocs.io/en/latest/common_issues.html#import-cycles

In xarray we wrote a little stub for TYPE_CHECKING so it even works in Python < 3.5.3: https://github.com/pydata/xarray/blob/76aa4f059228befa788f07d751095452d8390426/xarray/core/pycompat.py#L16-LL17

jreback commented 5 years ago

does just ‘DataFrame’ or ‘pandas.core.frame.DataFrame’ work?

WillAyd commented 5 years ago

You can avoid circular imports by guarding the imports in if TYPE_CHECKING blocks:

Right but IIUC we would then need to use that in all modules importing pandas._typing and then use literal references from there, and I'm not sure this fully resolves circular imports when the type checking is running (could be wrong will have to investigate more)

does just ‘DataFrame’ or ‘pandas.core.frame.DataFrame’ work?

no those would yield Name (pandas|DataFrame) is not defined