pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.7k stars 17.93k forks source link

[TYPING] pandas.io.formats.format._binify annotation is inconsistent with DataFrameFormatter.__init__ #28843

Closed zero323 closed 5 years ago

zero323 commented 5 years ago

Problem description

pd.io.formats.format._binify defines line_width as Union[np.int32, int]:

https://github.com/pandas-dev/pandas/blob/39602e7d7c5b663696f5a6eca9e66e65483bc868/pandas/io/formats/format.py#L1893

however it is inconsistent with DataFrameFormatter.__init__

https://github.com/pandas-dev/pandas/blob/39602e7d7c5b663696f5a6eca9e66e65483bc868/pandas/io/formats/format.py#L518

and DataFrameFormatter._join_multiline

https://github.com/pandas-dev/pandas/blob/39602e7d7c5b663696f5a6eca9e66e65483bc868/pandas/io/formats/format.py#L871

which take and pass as-is Optional[int].

This leads to Mypy error:

pandas/io/formats/format.py:871: error: Argument 2 to "_binify" has incompatible type "Optional[int]"; expected "Union[int32, int]"

Providing default in _join_multiline, i.e.

lwidth = self.line_width or DEFAULT_LINE_WIDTH

could be reasonable fix. One could also ignore the call:

col_bins = _binify(col_widths, lwidth)  # type: ignore

but that leaves possible correctness problem.

Expected Output

Passing Mypy type check

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : bee17d5f4c99e6d1e451cf9a7b6a1780aa25988d
python           : 3.7.3.final.0
...

Additionally

mypy==0.730

CC @WillAyd

WillAyd commented 5 years ago

Cool thanks for taking a look! @simonjayhawkins as well

Does the function actually fail if line_width is None?

simonjayhawkins commented 5 years ago

Does the function actually fail if line_width is None?

if i recall these types were from using MonkeyType. so if line_width is typed as Union[np.int32, int], then we have no tests that pass None.

zero323 commented 5 years ago

Does the function actually fail if line_width is None?

Haven't tested it, but I think it is safe to safe that it does fail on >

https://github.com/pandas-dev/pandas/blob/39602e7d7c5b663696f5a6eca9e66e65483bc868/pandas/io/formats/format.py#L1902

and

https://github.com/pandas-dev/pandas/blob/39602e7d7c5b663696f5a6eca9e66e65483bc868/pandas/io/formats/format.py#L1904

WillAyd commented 5 years ago

Does np.int32 actually resolve to anything? The following yield Any for me:

import numpy as np
a_np_int = np.int32(32)
reveal_type(a_np_int)

Wonder if we shouldn't just defer dealing with numpy types until they are actually annotated. Could just simplify this signature to be int for now

simonjayhawkins commented 5 years ago

@WillAyd that's probably unrelated, but yes, all np.int to be replaced with python int is subsequent pass.

simonjayhawkins commented 5 years ago

i'm using 0.730, but not getting that particular mypy error show up yet on my branch.

since there are many mypy errors, probably best to determine if it is an actual bug in the code, since we have mypy on the ci, and for the current options and version we have a passing build.

typing errors will be resolved as more types are added and more strictness flags are enabled.

zero323 commented 5 years ago

i'm using 0.730, but not getting that particular mypy error show up yet on my branch.

Interesting... I downgraded Mypy to 0.730 and still get this one. Do you use any specific flags?

simonjayhawkins commented 5 years ago

Do you use any specific flags?

see #28339

simonjayhawkins commented 5 years ago

changing to def _binify(cols: List[int], line_width: int) -> List[int]:

and I get pandas\io\formats\format.py:873:40: error: Argument 2 to "_binify" has incompatible type "Optional[int]"; expected "int" [arg-type]

so the Any is swallowing the None.

@zero323 do you have numpy stubs in your environment?

simonjayhawkins commented 5 years ago

@WillAyd that's probably unrelated, but yes, all np.int to be replaced with python int is subsequent pass.

i was wrong.

zero323 commented 5 years ago

@zero323 do you have numpy stubs in your environment?

Yes, I do.

Edit:

And once removed this error disappears hidden by some missing import.

simonjayhawkins commented 5 years ago

so I guess that adding stubs for all the missing imports becomes more important with PEP 561 compatibility since other users could have stubs for the other packages.

zero323 commented 5 years ago

From that point of view the problem with PEP 561 is that:

Stubs or Python source manually put in the beginning of the path. Type checkers SHOULD provide this to allow the user complete control of which stubs to use, and to patch broken stubs/inline types from packages. In mypy the $MYPYPATH environment variable can be used for this.

So effective outcome is rather hard to predict.

Not to mention that all signs show that mature stubs for scientific stack are nowhere near.

One possible workaround is to keep dynamic mock stubs that cover the part of the 3rd party API that you're interested in, and use these for internal testing. I had some luck with such approach in pyspark-stubs (mocking Pandas), but exposure there is relatively small.

One way or another ignoring missing imports and type: ignore are nuclear options and it is really hard to evaluate their real impact.