Closed coroa closed 11 months ago
Can this be closed now? Or we could leave this open until df.min(), df.max() etc are tested.
No, this is still broken under pandas~=1.5, as well as pandas~=2.0. I don't think anything short of a change in the __array__
or __array_func__
implementation of pint would help here.
What versions was this last working with? I am unsure if it was a pandas change or a pint change that has caused this.
I'm leaning towards a pint change as I don't see anything obvious in pandas that would cause it. Could also trying implementing __array_func__
on PintArray
In Block.reduce, pandas 1.4.x there's a try, except clause https://github.com/pandas-dev/pandas/blob/1.4.x/pandas/core/internals/blocks.py#L413 Which is also in 1.5.x https://github.com/pandas-dev/pandas/blob/1.5.x/pandas/core/internals/blocks.py#L360 but not in 2.0. https://github.com/pandas-dev/pandas/blob/2.0.x/pandas/core/internals/blocks.py#L340
Note that it is failing outside the try-except block. result
is produced successfully (as a scalar Quantity
), but wrapping it in np.array([[result]])
then raises at https://github.com/pandas-dev/pandas/blob/778ab8234059059d77eb5d71442c959f8d15c1e0/pandas/core/internals/blocks.py#L369. I do not know how or if this worked previously.
I tried several versions which didn't work.
# {'numpy': '1.24.3', 'pandas': '1.4.0', 'pint': '0.21', 'pint_pandas': '0.4'} fails
# {'numpy': '1.24.3', 'pandas': '1.3.0', 'pint': '0.21', 'pint_pandas': '0.4'} fails
# {'numpy': '1.24.3', 'pandas': '1.4.0', 'pint': '0.19', 'pint_pandas': '0.4'} fails
# {'numpy': '1.24.3', 'pandas': '1.4.0', 'pint': '0.19', 'pint_pandas': '0.3'} fails
I note that running pd.DataFrame([[0, 1, 2]]).astype("pint[m]").sum()
works
Looks like pandas is working on this already and should be in the next version! https://github.com/pandas-dev/pandas/pull/52788
On it...this has one extraneous change (preserve dtype when converting NA to array of Quantities), but I offer these as essentially the minimal changes needed to work with upcoming Pandas 2.1. (This work work any Pandas < 2.1)
I will be bundling this with a PR that's already slated to land in Pint 0.23 (and other PRs in Pandas, Pint-Pandas, and uncertainties). But if you want to apply and test by itself...
(itr_env) iMac-Pro:pint-pandas michael$ git diff
diff --git a/pint_pandas/pint_array.py b/pint_pandas/pint_array.py
index a445b77..4605704 100644
--- a/pint_pandas/pint_array.py
+++ b/pint_pandas/pint_array.py
@@ -208,8 +208,8 @@ dtypemap = {
float: pd.Float64Dtype(),
np.float64: pd.Float64Dtype(),
np.float32: pd.Float32Dtype(),
- np.complex128: pd.core.dtypes.dtypes.PandasDtype("complex128"),
- np.complex64: pd.core.dtypes.dtypes.PandasDtype("complex64"),
+ np.complex128: pd.core.dtypes.dtypes.NumpyEADtype("complex128"),
+ np.complex64: pd.core.dtypes.dtypes.NumpyEADtype("complex64"),
# np.float16: pd.Float16Dtype(),
}
dtypeunmap = {v: k for k, v in dtypemap.items()}
@@ -520,7 +520,10 @@ class PintArray(ExtensionArray, ExtensionOpsMixin):
# magnitude is in fact an array scalar, which will get rejected by pandas.
fill_value = fill_value[()]
- result = take(data, indices, fill_value=fill_value, allow_fill=allow_fill)
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore")
+ # Turn off warning that PandasArray is deprecated for ``take``
+ result = take(data, indices, fill_value=fill_value, allow_fill=allow_fill)
return PintArray(result, dtype=self.dtype)
@@ -990,7 +993,7 @@ class PintArray(ExtensionArray, ExtensionOpsMixin):
qtys = [
self._Q(item, self._dtype.units)
if item is not self.dtype.na_value.m
- else item
+ else self.dtype.na_value
for item in self._data
]
with warnings.catch_warnings(record=True):
@@ -1048,7 +1051,7 @@ class PintArray(ExtensionArray, ExtensionOpsMixin):
value = [item.to(self.units).magnitude for item in value]
return arr.searchsorted(value, side=side, sorter=sorter)
- def _reduce(self, name, **kwds):
+ def _reduce(self, name, *, skipna: bool = True, keepdims: bool = False, **kwds):
"""
Return a scalar result of performing the reduction operation.
@@ -1092,14 +1095,18 @@ class PintArray(ExtensionArray, ExtensionOpsMixin):
if isinstance(self._data, ExtensionArray):
try:
- result = self._data._reduce(name, **kwds)
+ result = self._data._reduce(name, skipna=skipna, keepdims=keepdims, **kwds)
except NotImplementedError:
result = functions[name](self.numpy_data, **kwds)
if name in {"all", "any", "kurt", "skew"}:
return result
if name == "var":
+ if keepdims:
+ return PintArray(result, f"pint[({self.units})**2]")
return self._Q(result, self.units**2)
+ if keepdims:
+ return PintArray(result, self.dtype)
return self._Q(result, self.units)
def _accumulate(self, name: str, *, skipna: bool = True, **kwds):
diff --git a/pint_pandas/testsuite/test_issues.py b/pint_pandas/testsuite/test_issues.py
index c288e71..384f84f 100644
--- a/pint_pandas/testsuite/test_issues.py
+++ b/pint_pandas/testsuite/test_issues.py
@@ -202,3 +202,16 @@ def test_issue_139():
assert np.all(a_m[0:4] == a_cm[0:4])
for x, y in zip(a_m[4:], a_cm[4:]):
assert unp.isnan(x) == unp.isnan(y)
+
+class TestIssue174(BaseExtensionTests):
+ def test_sum(self):
+ a = pd.DataFrame([[0, 1, 2], [3, 4, 5]]).astype("pint[m]")
+ row_sum = a.sum(axis=0)
+ expected_1 = pd.Series([3, 5, 7], dtype="pint[m]")
+
+ self.assert_series_equal(row_sum, expected_1)
+
+ col_sum = a.sum(axis=1)
+ expected_2 = pd.Series([3, 12], dtype="pint[m]")
+
+ self.assert_series_equal(col_sum, expected_2)
diff --git a/pint_pandas/testsuite/test_pandas_extensiontests.py b/pint_pandas/testsuite/test_pandas_extensiontests.py
index 7c92751..a638c5b 100644
--- a/pint_pandas/testsuite/test_pandas_extensiontests.py
+++ b/pint_pandas/testsuite/test_pandas_extensiontests.py
@@ -523,7 +523,7 @@ class TestArithmeticOps(base.BaseArithmeticOpsTests):
divmod(s, other)
def _get_exception(self, data, op_name):
- if data.data.dtype == pd.core.dtypes.dtypes.PandasDtype("complex128"):
+ if data.data.dtype == pd.core.dtypes.dtypes.NumpyEADtype("complex128"):
if op_name in ["__floordiv__", "__rfloordiv__", "__mod__", "__rmod__"]:
return op_name, TypeError
if op_name in ["__pow__", "__rpow__"]:
@@ -627,6 +627,10 @@ class TestNumericReduce(base.BaseNumericReduceTests):
expected = expected_m
assert result == expected
+ @pytest.mark.skip("tests not written yet")
+ def check_reduce_frame(self, ser: pd.Series, op_name: str, skipna: bool):
+ pass
+
@pytest.mark.parametrize("skipna", [True, False])
def test_reduce_scaling(
self, data, all_numeric_reductions, skipna, USE_UNCERTAINTIES
Just a small note that https://github.com/pandas-dev/pandas/pull/52788 has been merged in pandas. I think that should help fix this issue in pint-pandas?
I agree this should be possible in v2.1 (not released yet), but pint_pandas still needs to implement a keepdims
parameter to PintArray._reduce
at your end to make it happen, though.
I could take a stab at a PR for this if I can get someone to review the PR. I assume the tests should be added to pint_pandas/testsuite/test_pandas_extensiontests.py
in e.g. a new TestReduction
class?
Hey, that was me :smile:
PR#140 has the keepdims changes. You could split them out and merge them independently. The TestReductions (note 's' at the end) does need to be incorporated. It looks like the Pandas base test case makes up its own data, and that such data is not as trivial as data
, so a TestReductions class in Pint-Pandas would take some work and would be welcome!
yea go for it. it would be good to split Michaels changes out in seperate PRs to make it easier to review.
I was holding out for a rc (which doesn't look like is happening) before taking a look as it takes ages for the CI to build pandas otherwise.
I've split out the changes here: https://github.com/hgrecco/pint-pandas/pull/196
Pandas 2.1.0rc0 is now available. Could be a good time to sync up with our own release candidates.
close by #195
pd.DataFrame([[0, 1, 2], [3, 4, 5]]).astype("pint[m]").sum()
0 3.0
1 5.0
2 7.0
dtype: pint[meter]
Amazing, thanks @topper-123 !
With the versions:
I am unable to run any dataframe-wise aggregation functions along
axis=0
:Similarly
df.min()
,df.max()
and so on are failing.This might be connected with: https://github.com/hgrecco/pint/issues/1128#issuecomment-979432968