hgrecco / pint-pandas

Pandas support for pint

interpolate(method='linear') bug in pandas or pint-pandas? #112

Open MichaelTiemannOSC opened 2 years ago

MichaelTiemannOSC commented 2 years ago

I added a comment to this Issue in pandas: https://github.com/pandas-dev/pandas/issues/41565

But now I wonder whether it's a Pandas problem (the ExtensionArray implementation of interpolate) or a Pint-Pandas problem (the lack of a PintArray implementation of interpolate).

Here's interpolate working as expected, with float64 as the base type:

>>> import pandas as pd
>>> s = pd.Series([1, None, 3], dtype=float)
>>> s
0    1.0
1    NaN
2    3.0
dtype: float64
>>> s.interpolate(method="linear")
0    1.0
1    2.0
2    3.0
dtype: float64
>>> 

Here it is not working with PintArray:

>>> import pandas as pd
>>> import pint_pandas
>>> from pint_pandas import PintArray as PA_
>>> s = pd.Series(PA_([1., None, 3.], dtype='pint[m]'), dtype='pint[m]')
>>> s
0    1.0
1    nan
2    3.0
dtype: pint[meter]
>>> s.interpolate(method="linear")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/series.py", line 5423, in interpolate
    return super().interpolate(
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/generic.py", line 6899, in interpolate
    new_data = obj._mgr.interpolate(
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 377, in interpolate
    return self.apply("interpolate", **kwargs)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1369, in interpolate
    new_values = values.fillna(value=fill_value, method=method, limit=limit)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/arrays/base.py", line 716, in fillna
    value, method = validate_fillna_kwargs(value, method)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/util/_validators.py", line 372, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "/opt/miniconda3/envs/spyder-env/lib/python3.9/site-packages/pandas/core/missing.py", line 120, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
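The error comes from `fillna`, which only understands propagation methods (pad/ffill, backfill/bfill), while `linear` needs actual interpolation. The following minimal stdlib-only sketch makes the difference concrete. The function names `pad_fill` and `linear_fill` are hypothetical illustrations, not pandas API. The sketch treats positions as equally spaced and leaves leading and trailing NaNs untouched. In pandas, how those edges are handled depends on parameters such as `limit_direction`.

```python
import math


def pad_fill(values):
    """Forward-fill: replace each NaN with the last non-NaN value seen (what method='pad'/'ffill' means)."""
    out, last = [], math.nan
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    return out


def linear_fill(values):
    """Linearly interpolate interior NaNs between their nearest non-NaN neighbours.

    Positions are treated as equally spaced; leading/trailing NaNs are left as-is.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


data = [1.0, math.nan, 3.0]
print(pad_fill(data))     # [1.0, 1.0, 3.0]
print(linear_fill(data))  # [1.0, 2.0, 3.0]
```

Only the second result matches the float64 output above, which is why forwarding `method="linear"` to a fill routine cannot work.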

The working interpolate function comes from Block.interpolate (in blocks.py) and uses this try clause to muscle through:

        try:
            m = missing.clean_fill_method(method)
        except ValueError:
            m = None
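To illustrate why one path succeeds and the other fails, here is a hypothetical, heavily simplified model of the two routes. It is written from the description in this thread, not copied from pandas. `clean_fill_method` below is a stand-in that only reproduces the alias normalization and the error message seen in the traceback. `block_route` and `ea_block_route` are invented names.

```python
# Hypothetical, simplified stand-ins for the two code paths; not pandas' real code.
FILL_ALIASES = {"pad": "pad", "ffill": "pad", "backfill": "backfill", "bfill": "backfill"}


def clean_fill_method(method):
    """Normalize a fill-method alias, or raise ValueError like the traceback shows."""
    if method not in FILL_ALIASES:
        raise ValueError(
            f"Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got {method}"
        )
    return FILL_ALIASES[method]


def block_route(method):
    """Like Block.interpolate: a non-fill method name falls through to interpolation."""
    try:
        m = clean_fill_method(method)
    except ValueError:
        m = None
    return ("fillna", m) if m is not None else ("interpolate", method)


def ea_block_route(method):
    """Like the EABackedBlock path: everything is forwarded to fillna."""
    return ("fillna", clean_fill_method(method))


print(block_route("ffill"))   # ('fillna', 'pad')
print(block_route("linear"))  # ('interpolate', 'linear')
try:
    ea_block_route("linear")
except ValueError as exc:
    print(exc)  # Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear
```

The try/except is what lets the numpy-backed path treat `linear` as an interpolation method. Without it, the same name is rejected as an invalid fill method.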

The non-working interpolate function comes from EABackedBlock.interpolate (also in blocks.py), which is just a thin wrapper around fillna.

Is it Pandas or Pint-Pandas that needs to implement a linear method for the ExtensionArray that is a PintArray?

andrewgsavage commented 2 years ago

I've not seen any pandas interface tests for interpolation, so I think it's an issue to raise in pandas.

burnpanck commented 2 years ago

From what I gather from https://github.com/pandas-dev/pandas/issues/25508 (a similar issue for timezone-aware columns), the intention is for the base ExtensionBlock.interpolate to handle that (it definitely shouldn't just forward to fillna, but should at least raise NotImplementedError). However, there seem to be open questions about the API between extension types and the base implementation for supporting interpolate. In the datetime-column case, workarounds appear to be used at the extension type level. Pint-pandas could do that too.
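An extension-type-level workaround could strip the units, interpolate the plain float magnitudes, and re-attach the units. It could also raise NotImplementedError for methods it does not support. The sketch below shows that idea in stdlib Python. `QuantityArray` is a hypothetical stand-in, not PintArray's real structure, and the `interpolate` signature here is invented for illustration rather than taken from pandas' ExtensionArray API.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class QuantityArray:
    """Hypothetical unit-aware array: float magnitudes plus one unit string.

    Illustration only; not PintArray's actual structure or API.
    """
    magnitudes: List[float]
    units: str

    def interpolate(self, method: str = "linear") -> "QuantityArray":
        # Refuse unsupported methods explicitly instead of silently
        # forwarding them to a fill routine.
        if method != "linear":
            raise NotImplementedError(f"interpolate(method={method!r}) is not supported")
        # 1. Strip units: work on a copy of the plain float magnitudes.
        mags = list(self.magnitudes)
        known = [i for i, v in enumerate(mags) if not math.isnan(v)]
        # 2. Linearly interpolate interior NaNs (equally spaced positions).
        for left, right in zip(known, known[1:]):
            step = (mags[right] - mags[left]) / (right - left)
            for i in range(left + 1, right):
                mags[i] = mags[left] + step * (i - left)
        # 3. Re-attach the original units to the result.
        return QuantityArray(mags, self.units)


arr = QuantityArray([1.0, math.nan, 3.0], "meter")
print(arr.interpolate())  # QuantityArray(magnitudes=[1.0, 2.0, 3.0], units='meter')
```

Because a single array carries one unit, linear interpolation of the magnitudes is unit-safe. The result is the same as interpolating the quantities directly.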

jbrockmendel commented 1 year ago

This was an issue in pandas. An interpolate method has been added to ExtensionArrays (EAs) that you can implement, though there isn't much in the way of testing for it.