hgrecco / pint-pandas

Pandas support for pint

Fix importing with Pint 0.21 #171

Closed mikapfl closed 1 year ago

mikapfl commented 1 year ago

Hi,

this fixes importing with Pint 0.21 as discussed in #168.

Not all unit tests pass because Pint 0.21 changed other things, but at least the unit tests can be executed.

Cheers,

Mika

andrewgsavage commented 1 year ago

That's great, thanks.

Could you mark the failing tests as xfail with a message saying they were broken by changes in pint 0.21?
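
For reference, a minimal sketch of what such a marker could look like with pytest. The test name, its body, and the exact reason string are hypothetical placeholders; only `pytest.mark.xfail` and its `reason` argument are real pytest API.

```python
import pytest

# Hypothetical reason string: the request above only asks that the message
# attribute the breakage to changes in Pint 0.21.
PINT_021_REASON = "broken by changes in Pint 0.21"


@pytest.mark.xfail(reason=PINT_021_REASON)
def test_example_arithmetic():
    # Placeholder body standing in for a test that currently fails.
    raise AssertionError("stand-in for a Pint 0.21 regression")
```

By default `xfail` is not strict, so a marked test that starts passing again is reported as XPASS instead of failing the run; passing `strict=True` would turn an unexpected pass into a failure.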

mikapfl commented 1 year ago

One test (in two configurations) also fails with Pint 0.20.1, and it seems to be related to an error within pandas when using masked arrays. See the details for a traceback.

```
FAILED [ 19%]
pint_pandas/testsuite/test_pandas_extensiontests.py:281 (TestGroupby.test_in_numeric_groupby[float])
values = [4294967297.0, 4294967297.0, , , 1.0, 1.0, 4294967297.0, 4294967306.0]
Length: 8, dtype: pint[meter]

    def array_func(values: ArrayLike) -> ArrayLike:
        try:
>           result = self.grouper._cython_operation(
                "aggregate",
                values,
                how,
                axis=data.ndim - 1,
                min_count=min_count,
                **kwargs,
            )

venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1490:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
kind = 'aggregate'
values = [4294967297.0, 4294967297.0, , , 1.0, 1.0, 4294967297.0, 4294967306.0]
Length: 8, dtype: pint[meter]
how = 'sum', axis = 1, min_count = 0, kwargs = {}
cy_op =
ids = array([0, 0, 1, 1, 2, 2, 0, 3]), _ = 4, ngroups = 4

    @final
    def _cython_operation(
        self,
        kind: str,
        values,
        how: str,
        axis: AxisInt,
        min_count: int = -1,
        **kwargs,
    ) -> ArrayLike:
        """
        Returns the values of a cython operation.
        """
        assert kind in ["transform", "aggregate"]

        cy_op = WrappedCythonOp(kind=kind, how=how, has_dropped_na=self.has_dropped_na)

        ids, _, _ = self.group_info
        ngroups = self.ngroups
>       return cy_op.cython_operation(
            values=values,
            axis=axis,
            min_count=min_count,
            comp_ids=ids,
            ngroups=ngroups,
            **kwargs,
        )

venv/lib/python3.11/site-packages/pandas/core/groupby/ops.py:959:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
values = [4294967297.0, 4294967297.0, , , 1.0, 1.0, 4294967297.0, 4294967306.0]
Length: 8, dtype: pint[meter]
axis = 1, min_count = 0, comp_ids = array([0, 0, 1, 1, 2, 2, 0, 3]), ngroups = 4
kwargs = {}, dtype = pint[meter], is_numeric = True

    @final
    def cython_operation(
        self,
        *,
        values: ArrayLike,
        axis: AxisInt,
        min_count: int = -1,
        comp_ids: np.ndarray,
        ngroups: int,
        **kwargs,
    ) -> ArrayLike:
        """
        Call our cython function, with appropriate pre- and post- processing.
        """
        if values.ndim > 2:
            raise NotImplementedError("number of dimensions is currently limited to 2")
        if values.ndim == 2:
            assert axis == 1, axis
        elif not is_1d_only_ea_dtype(values.dtype):
            # Note: it is *not* the case that axis is always 0 for 1-dim values,
            # as we can have 1D ExtensionArrays that we need to treat as 2D
            assert axis == 0

        dtype = values.dtype
        is_numeric = is_numeric_dtype(dtype)

        # can we do this operation with our cython functions
        # if not raise NotImplementedError
        self._disallow_invalid_ops(dtype, is_numeric)

        if not isinstance(values, np.ndarray):
            # i.e. ExtensionArray
>           return self._ea_wrap_cython_operation(
                values,
                min_count=min_count,
                ngroups=ngroups,
                comp_ids=comp_ids,
                **kwargs,
            )

venv/lib/python3.11/site-packages/pandas/core/groupby/ops.py:649:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
values = [4294967297.0, 4294967297.0, , , 1.0, 1.0, 4294967297.0, 4294967306.0]
Length: 8, dtype: pint[meter]
min_count = 0, ngroups = 4, comp_ids = array([0, 0, 1, 1, 2, 2, 0, 3])
kwargs = {}

    @final
    def _ea_wrap_cython_operation(
        self,
        values: ExtensionArray,
        min_count: int,
        ngroups: int,
        comp_ids: np.ndarray,
        **kwargs,
    ) -> ArrayLike:
        """
        If we have an ExtensionArray, unwrap, call _cython_operation, and
        re-wrap if appropriate.
        """
        if isinstance(values, BaseMaskedArray):
            return self._masked_ea_wrap_cython_operation(
                values,
                min_count=min_count,
                ngroups=ngroups,
                comp_ids=comp_ids,
                **kwargs,
            )

        elif isinstance(values, Categorical):
            assert self.how == "rank"  # the only one implemented ATM
            assert values.ordered  # checked earlier
            mask = values.isna()
            npvalues = values._ndarray

            res_values = self._cython_op_ndim_compat(
                npvalues,
                min_count=min_count,
                ngroups=ngroups,
                comp_ids=comp_ids,
                mask=mask,
                **kwargs,
            )

            # If we ever have more than just "rank" here, we'll need to do
            # `if self.how in self.cast_blocklist` like we do for other dtypes.
            return res_values

>       npvalues = self._ea_to_cython_values(values)

venv/lib/python3.11/site-packages/pandas/core/groupby/ops.py:365:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =
values = [4294967297.0, 4294967297.0, , , 1.0, 1.0, 4294967297.0, 4294967306.0]
Length: 8, dtype: pint[meter]

    def _ea_to_cython_values(self, values: ExtensionArray) -> np.ndarray:
        # GH#43682
        if isinstance(values, (DatetimeArray, PeriodArray, TimedeltaArray)):
            # All of the functions implemented here are ordinal, so we can
            # operate on the tz-naive equivalents
            npvalues = values._ndarray.view("M8[ns]")
        elif isinstance(values.dtype, StringDtype):
            # StringArray
            npvalues = values.to_numpy(object, na_value=np.nan)
        else:
>           raise NotImplementedError(
                f"function is not implemented for this dtype: {values.dtype}"
            )
E           NotImplementedError: function is not implemented for this dtype: pint[meter]

venv/lib/python3.11/site-packages/pandas/core/groupby/ops.py:394: NotImplementedError

During handling of the above exception, another exception occurred:

self =
data_for_grouping = [4294967297.0, 4294967297.0, , , 1.0, 1.0, 4294967297.0, 4294967306.0]
Length: 8, dtype: pint[meter]

    def test_in_numeric_groupby(self, data_for_grouping):
        df = pd.DataFrame(
            {
                "A": [1, 1, 2, 2, 3, 3, 1, 4],
                "B": data_for_grouping,
                "C": [1, 1, 1, 1, 1, 1, 1, 1],
            }
        )
>       result = df.groupby("A").sum().columns

pint_pandas/testsuite/test_pandas_extensiontests.py:290:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:2263: in sum
    result = self._agg_general(
venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1422: in _agg_general
    result = self._cython_agg_general(
venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1507: in _cython_agg_general
    new_mgr = data.grouped_reduce(array_func)
venv/lib/python3.11/site-packages/pandas/core/internals/managers.py:1506: in grouped_reduce
    applied = blk.apply(func)
venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:329: in apply
    result = func(self.values, **kwargs)
venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1503: in array_func
    result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:1457: in _agg_py_fallback
    res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
venv/lib/python3.11/site-packages/pandas/core/groupby/ops.py:994: in agg_series
    result = self._aggregate_series_pure_python(obj, func)
venv/lib/python3.11/site-packages/pandas/core/groupby/ops.py:1015: in _aggregate_series_pure_python
    res = func(group)
<__array_function__ internals>:200: in sum
    ???
venv/lib/python3.11/site-packages/numpy/core/fromnumeric.py:2324: in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
venv/lib/python3.11/site-packages/numpy/core/fromnumeric.py:84: in _wrapreduction
    return reduction(axis=axis, out=out, **passkwargs)
venv/lib/python3.11/site-packages/pandas/core/generic.py:11512: in sum
    return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)
venv/lib/python3.11/site-packages/pandas/core/generic.py:11280: in sum
    return self._min_count_stat_function(
venv/lib/python3.11/site-packages/pandas/core/generic.py:11263: in _min_count_stat_function
    return self._reduce(
venv/lib/python3.11/site-packages/pandas/core/series.py:4652: in _reduce
    return delegate._reduce(name, skipna=skipna, **kwds)
pint_pandas/pint_array.py:833: in _reduce
    result = functions[name](self._data, **kwds)
venv/lib/python3.11/site-packages/pandas/core/nanops.py:96: in _f
    return f(*args, **kwargs)
venv/lib/python3.11/site-packages/pandas/core/nanops.py:421: in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
venv/lib/python3.11/site-packages/pandas/core/nanops.py:494: in newfunc
    return func(values, axis=axis, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

values = [4294967297.0, 4294967297.0, 4294967297.0]
Length: 3, dtype: Float64

    @disallow("M8")
    @_datetimelike_compat
    @maybe_operate_rowwise
    def nansum(
        values: np.ndarray,
        *,
        axis: AxisInt | None = None,
        skipna: bool = True,
        min_count: int = 0,
        mask: npt.NDArray[np.bool_] | None = None,
    ) -> float:
        """
        Sum the elements along an axis ignoring NaNs

        Parameters
        ----------
        values : ndarray[dtype]
        axis : int, optional
        skipna : bool, default True
        min_count: int, default 0
        mask : ndarray[bool], optional
            nan-mask if known

        Returns
        -------
        result : dtype

        Examples
        --------
        >>> from pandas.core import nanops
        >>> s = pd.Series([1, 2, np.nan])
        >>> nanops.nansum(s)
        3.0
        """
        values, mask, dtype, dtype_max, _ = _get_values(
            values, skipna, fill_value=0, mask=mask
        )
        dtype_sum = dtype_max
        if is_float_dtype(dtype):
            dtype_sum = dtype
        elif is_timedelta64_dtype(dtype):
            dtype_sum = np.dtype(np.float64)

>       the_sum = values.sum(axis, dtype=dtype_sum)
E       TypeError: BaseMaskedArray.sum() takes 1 positional argument but 2 were given

venv/lib/python3.11/site-packages/pandas/core/nanops.py:652: TypeError
```

I'm looking into it, but I'm not sure I'll be able to understand what is happening.
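
For what it's worth, the final `TypeError` in the traceback is the kind of error Python raises when a method that accepts only keyword arguments is called with a positional one. The toy class below is a hypothetical stand-in, not pandas code; it assumes that the masked array's `sum()` takes its options by keyword only, while `nansum` passes `axis` positionally, as the `values.sum(axis, dtype=dtype_sum)` line in the traceback shows.

```python
class KeywordOnlySum:
    """Toy stand-in (not pandas code) for an array whose sum() is keyword-only."""

    def __init__(self, values):
        self.values = list(values)

    def sum(self, *, skipna=True, min_count=0, axis=0, **kwargs):
        # Everything after `*` must be passed by keyword.
        return sum(self.values)


def numpy_style_sum(values, axis=None):
    # Same call shape as in the traceback: axis is passed positionally,
    # which works for a NumPy ndarray but not for the class above.
    return values.sum(axis, dtype=float)


try:
    numpy_style_sum(KeywordOnlySum([1.0, 2.0, 3.0]))
except TypeError as err:
    message = str(err)
    print(message)
```

If that reading is right, the question is why a masked `Float64` array, and not a plain NumPy array, reaches `nansum` from `PintArray._reduce` in the first place.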

mikapfl commented 1 year ago

Even this fails, also with pandas==1.5.3 and Pint==0.20.1:

```
In [2]: import pint_pandas

In [3]: import pandas as pd

In [4]: a = pd.Series([1.,2.,3.], dtype="pint[m]")

In [5]: a.sum()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
```

Traceback:

```
Cell In [5], line 1
----> 1 a.sum()

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/generic.py:11797, in NDFrame._add_numeric_operations.<locals>.sum(self, axis, skipna, level, numeric_only, min_count, **kwargs)
  11777 @doc(
  11778     _num_doc,
  11779     desc="Return the sum of the values over the requested axis.\n\n"
   (...)
  11795     **kwargs,
  11796 ):
> 11797     return NDFrame.sum(
  11798         self, axis, skipna, level, numeric_only, min_count, **kwargs
  11799     )

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/generic.py:11501, in NDFrame.sum(self, axis, skipna, level, numeric_only, min_count, **kwargs)
  11492 def sum(
  11493     self,
  11494     axis: Axis | None = None,
   (...)
  11499     **kwargs,
  11500 ):
> 11501     return self._min_count_stat_function(
  11502         "sum", nanops.nansum, axis, skipna, level, numeric_only, min_count, **kwargs
  11503     )

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/generic.py:11483, in NDFrame._min_count_stat_function(self, name, func, axis, skipna, level, numeric_only, min_count, **kwargs)
  11467     warnings.warn(
  11468         "Using the level keyword in DataFrame and Series aggregations is "
  11469         "deprecated and will be removed in a future version. Use groupby "
   (...)
  11472         stacklevel=find_stack_level(),
  11473     )
  11474     return self._agg_by_level(
  11475         name,
  11476         axis=axis,
   (...)
  11480         numeric_only=numeric_only,
  11481     )
> 11483 return self._reduce(
  11484     func,
  11485     name=name,
  11486     axis=axis,
  11487     skipna=skipna,
  11488     numeric_only=numeric_only,
  11489     min_count=min_count,
  11490 )

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/series.py:4797, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4793     self._get_axis_number(axis)
   4795 if isinstance(delegate, ExtensionArray):
   4796     # dispatch to ExtensionArray interface
-> 4797     return delegate._reduce(name, skipna=skipna, **kwds)
   4799 else:
   4800     # dispatch to numpy arrays
   4801     if numeric_only and not is_numeric_dtype(self.dtype):

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pint_pandas/pint_array.py:833, in PintArray._reduce(self, name, **kwds)
    830 if name not in functions:
    831     raise TypeError(f"cannot perform {name} with type {self.dtype}")
--> 833 result = functions[name](self._data, **kwds)
    834 if name in {"all", "any", "kurt", "skew"}:
    835     return result

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/nanops.py:93, in disallow.__call__.<locals>._f(*args, **kwargs)
     91 try:
     92     with np.errstate(invalid="ignore"):
---> 93         return f(*args, **kwargs)
     94 except ValueError as e:
     95     # we want to transform an object array
     96     # ValueError message to the more typical TypeError
     97     # e.g. this is normally a disallowed function on
     98     # object arrays that contain strings
     99     if is_object_dtype(args[0]):

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/nanops.py:418, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
    415 if datetimelike and mask is None:
    416     mask = isna(values)
--> 418 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
    420 if datetimelike:
    421     result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/nanops.py:491, in maybe_operate_rowwise.<locals>.newfunc(values, axis, **kwargs)
    488     results = [func(x, **kwargs) for x in arrs]
    489     return np.array(results)
--> 491 return func(values, axis=axis, **kwargs)

File ~/work/pint-pandas/venv/lib/python3.11/site-packages/pandas/core/nanops.py:631, in nansum(values, axis, skipna, min_count, mask)
    628 elif is_timedelta64_dtype(dtype):
    629     dtype_sum = np.dtype(np.float64)
--> 631 the_sum = values.sum(axis, dtype=dtype_sum)
    632 the_sum = _maybe_null_out(the_sum, axis, mask, values.shape, min_count=min_count)
    634 return the_sum

TypeError: BaseMaskedArray.sum() takes 1 positional argument but 2 were given
```

andrewgsavage commented 1 year ago

OK, I'm not sure what's causing those. The last versions that worked were pandas 1.5.2 and Pint 0.20.1: https://github.com/hgrecco/pint-pandas/actions/runs/4913208352/jobs/8773118128?pr=173#logs

Could you xfail that test with a different message?
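
A rough sketch of how a separate, version-dependent message could be attached with pytest's conditional `xfail`. The helper names, the placeholder test, and the reason text are hypothetical, and the `(1, 5, 3)` threshold is only an assumption drawn from the versions mentioned in this thread (1.5.2 last seen working, 1.5.3 failing).

```python
from importlib import metadata
from itertools import takewhile

import pytest


def release_tuple(version):
    """Turn '1.5.3' or '2.0.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_release(package):
    """Return the installed release tuple, or () if the package is absent."""
    try:
        return release_tuple(metadata.version(package))
    except metadata.PackageNotFoundError:
        return ()


# Assumption: pandas 1.5.3 is the first affected release.
PANDAS_SUM_BROKEN = installed_release("pandas") >= (1, 5, 3)


@pytest.mark.xfail(
    PANDAS_SUM_BROKEN,
    reason="sum on a PintArray raises TypeError inside pandas nanops "
    "(also fails with Pint 0.20.1, so not a Pint 0.21 change)",
)
def test_in_numeric_groupby_placeholder():
    # Hypothetical placeholder; the real test lives in the pandas
    # extension test suite that pint-pandas inherits.
    pass
```

Keeping the condition in a named constant lets the same check be reused for the other configuration of the test, and the distinct reason makes it easy to tell these failures apart from the Pint 0.21 ones in the test report.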

mikapfl commented 1 year ago

> Could you xfail that test with a different message?

I'll do that, thanks for checking. I'll have time for this tomorrow, hopefully.

mikapfl commented 1 year ago

The failing tests are pretty substantial, though. Basically all arithmetic is broken.

andrewgsavage commented 1 year ago

bors r+

bors[bot] commented 1 year ago

Build failed:

andrewgsavage commented 1 year ago

bors r+