hgrecco / pint-pandas

Pandas support for pint
Other
166 stars 40 forks source link

PintType is not mypy-friendly #213

Closed MichaelTiemannOSC closed 5 months ago

MichaelTiemannOSC commented 6 months ago

Here is some sample code that mypy cannot handle:

import pandas as pd
import typing
from pandas.api.extensions import ExtensionArray
from pint_pandas import PintType, PintArray

def co2_sum_across(df: pd.DataFrame) -> pd.Series:
    ser = df.iloc[:, 0]
    if len(df.columns) == 1:
        return ser
    col_units = ser.dtype
    assert isinstance(col_units, PintType)
    ser = ser.pint.m
    for i in range(1, len(df.columns)):
        ser = ser + df.iloc[:, i].pint.m_as(str(col_units.units))
    return typing.cast(ExtensionArray, ser).astype(str(col_units))

mypy throws the following error on the above:

% pre-commit run mypy --file foo.py                    
mypy.....................................................................Failed
- hook id: mypy
- duration: 1.85s
- exit code: 1

foo.py:14: error: "PintType" has no attribute "units"  [attr-defined]
Found 1 error in 1 file (checked 1 source file)

The particular problem is that the PintType that were are looking at here (pint[Gt CO2e]) is forcibly given a units attribute in this part of construct_from_string:

        if isinstance(string, str) and (
            string.startswith("pint[") or string.startswith("Pint[")
        ):
            # do not parse string like U as pint[U]                                                                                                                                                           
            # avoid tuple to be regarded as unit                                                                                                                                                              
            try:
                return cls(units=string)
            except ValueError:
                pass

But mypy sees no units in the class declaration. Is there a way to preview to mypy that a units property will come? Or do we just have to turn off typing when using units from a PintType?

andrewgsavage commented 6 months ago

Try adding units = None to the class definition?

I imagine there will be more mypy issues... Would be good to get that working nicely

MichaelTiemannOSC commented 6 months ago

Thanks for that suggestion. Unfortunately mypy doesn't support editable builds (https://github.com/python/mypy/issues/13392) and its doing a grand job preventing me from testing such a simple change. The line of code I'm trying to add is this:

    units: Optional[_Unit] = None  # Filled in by `construct_from_..._string`                                                                                                                                 

But editing pint_array.py in site-packages/pint_pandas is not affecting the metadata that mypy is collecting. I'm not sufficiently experienced with mypy nor python build systems to know how to get mypy to use a particular path for satisfying its module dependencies.

MichaelTiemannOSC commented 6 months ago

As an amendment to the above, I did figure out that running mypy directly, mypy preferred to look at pycache which somehow maintained an older version of the pint_array.py file I was modifying in place. However, when I ran mypy via pre-commit, the pre-comment script finds the wrong version of pint_array.py because it insists on creating its own virtual environments from the named dependencies. I overrode that, changed language: python to language: system (following advice from https://jaredkhan.com/blog/mypy-pre-commit) and that was enough to confirm that the change was typed alright.

andrewgsavage commented 5 months ago

I'll leave this issue open as I doubt that's all the mypy issues that will come up... If we could get mypy running in the CI that would help