hgrecco / pint-pandas

Pandas support for pint

New `DimensionalityError` raised when applying a function with `axis=1` in 0.6.1 (worked in 0.6.0) #246

Closed: kompre closed this issue 2 months ago

kompre commented 2 months ago

With pint_pandas 0.6.1 (and pandas 2.2.2), a DataFrame operation along `axis=1` (i.e. row by row, combining values from different columns) raises a `DimensionalityError` when the columns do not all have the same units.

Here is a minimal working example:

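```python
import pandas as pd
import pint_pandas  # registers the "pint[...]" extension dtypes with pandas

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
    'c': [7, 8, 9],
})

print(df)

# give each column a different unit
df = df.astype({
    'a': 'pint[m]',
    'b': 'pint[m/s]',
    'c': 'pint[kN]',
})

print(df.dtypes)

# now an operation where each cell is independent of the others;
# with pint_pandas 0.6.1 this raises DimensionalityError
df.apply(lambda x: x * 2, axis=1)
```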

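Output (the error was captured while running `df.loc[0, 'a':'c']`, which fails in the same way as the `apply` call):

```
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
a             pint[meter]
b    pint[meter / second]
c        pint[kilonewton]
dtype: object
```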

{
    "name": "DimensionalityError",
    "message": "Cannot convert from 'meter / second' ([length] / [time]) to 'meter' ([length])",
    "stack": "---------------------------------------------------------------------------
DimensionalityError                       Traceback (most recent call last)
Cell In[29], line 1
----> 1 df.loc[0, 'a':'c']

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\indexing.py:1184, in _LocationIndexer.__getitem__(self, key)
   1182     if self._is_scalar_access(key):
   1183         return self.obj._get_value(*key, takeable=self._takeable)
-> 1184     return self._getitem_tuple(key)
   1185 else:
   1186     # we by definition only have the 0th axis
   1187     axis = self.axis or 0

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\indexing.py:1368, in _LocIndexer._getitem_tuple(self, tup)
   1366 with suppress(IndexingError):
   1367     tup = self._expand_ellipsis(tup)
-> 1368     return self._getitem_lowerdim(tup)
   1370 # no multi-index, so validate all of the indexers
   1371 tup = self._validate_tuple_indexer(tup)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\indexing.py:1065, in _LocationIndexer._getitem_lowerdim(self, tup)
   1061 for i, key in enumerate(tup):
   1062     if is_label_like(key):
   1063         # We don't need to check for tuples here because those are
   1064         #  caught by the _is_nested_tuple_indexer check above.
-> 1065         section = self._getitem_axis(key, axis=i)
   1067         # We should never have a scalar section here, because
   1068         #  _getitem_lowerdim is only called after a check for
   1069         #  is_scalar_access, which that would be.
   1070         if section.ndim == self.ndim:
   1071             # we're in the middle of slicing through a MultiIndex
   1072             # revise the key wrt to `section` by inserting an _NS

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\indexing.py:1431, in _LocIndexer._getitem_axis(self, key, axis)
   1429 # fall thru to straight lookup
   1430 self._validate_key(key, axis)
-> 1431 return self._get_label(key, axis=axis)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\indexing.py:1381, in _LocIndexer._get_label(self, label, axis)
   1379 def _get_label(self, label, axis: AxisInt):
   1380     # GH#5567 this will fail if the label is not present in the axis.
-> 1381     return self.obj.xs(label, axis=axis)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\generic.py:4321, in NDFrame.xs(self, key, axis, level, drop_level)
   4315 if self.ndim == 1:
   4316     # if we encounter an array-like and we only have 1 dim
   4317     # that means that their are list/ndarrays inside the Series!
   4318     # so just return them (GH 6394)
   4319     return self._values[loc]
-> 4321 new_mgr = self._mgr.fast_xs(loc)
   4323 result = self._constructor_sliced_from_mgr(new_mgr, axes=new_mgr.axes)
   4324 result._name = self.index[loc]

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pandas\\core\\internals\\managers.py:1006, in BlockManager.fast_xs(self, loc)
   1004 if isinstance(dtype, ExtensionDtype):
   1005     cls = dtype.construct_array_type()
-> 1006     result = cls._from_sequence(result, dtype=dtype)
   1008 bp = BlockPlacement(slice(0, len(result)))
   1009 block = new_block(result, placement=bp, ndim=1)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint_pandas\\pint_array.py:639, in PintArray._from_sequence(cls, scalars, dtype, copy)
    635     dtype = PintType(master_scalar.units)
    637 if isinstance(master_scalar, _Quantity):
    638     scalars = [
--> 639         (item.to(dtype.units).magnitude if hasattr(item, \"to\") else item)
    640         for item in scalars
    641     ]
    642 return cls(scalars, dtype=dtype, copy=copy)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint\\facets\\plain\\quantity.py:536, in PlainQuantity.to(self, other, *contexts, **ctx_kwargs)
    519 \"\"\"Return PlainQuantity rescaled to different units.
    520 
    521 Parameters
   (...)
    532 pint.PlainQuantity
    533 \"\"\"
    534 other = to_units_container(other, self._REGISTRY)
--> 536 magnitude = self._convert_magnitude_not_inplace(other, *contexts, **ctx_kwargs)
    538 return self.__class__(magnitude, other)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint\\facets\\plain\\quantity.py:480, in PlainQuantity._convert_magnitude_not_inplace(self, other, *contexts, **ctx_kwargs)
    477     with self._REGISTRY.context(*contexts, **ctx_kwargs):
    478         return self._REGISTRY.convert(self._magnitude, self._units, other)
--> 480 return self._REGISTRY.convert(self._magnitude, self._units, other)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint\\facets\\plain\\registry.py:1041, in GenericPlainRegistry.convert(self, value, src, dst, inplace)
   1038 if src == dst:
   1039     return value
-> 1041 return self._convert(value, src, dst, inplace)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint\\facets\\context\\registry.py:405, in GenericContextRegistry._convert(self, value, src, dst, inplace)
    401             src = self._active_ctx.transform(a, b, self, src)
    403         value, src = src._magnitude, src._units
--> 405 return super()._convert(value, src, dst, inplace)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint\\facets\\nonmultiplicative\\registry.py:259, in GenericNonMultiplicativeRegistry._convert(self, value, src, dst, inplace)
    257 # convert if no offset units are present
    258 if not (src_offset_unit or dst_offset_unit):
--> 259     return super()._convert(value, src, dst, inplace)
    261 src_dim = self._get_dimensionality(src)
    262 dst_dim = self._get_dimensionality(dst)

File c:\\Users\\s.follador\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pint-pandas-bug-LBTI1Jgp-py3.12\\Lib\\site-packages\\pint\\facets\\plain\\registry.py:1076, in GenericPlainRegistry._convert(self, value, src, dst, inplace, check_dimensionality)
   1073 factor = self._get_conversion_factor(src, dst)
   1075 if isinstance(factor, DimensionalityError):
-> 1076     raise factor
   1078 # factor is type float and if our magnitude is type Decimal then
   1079 # must first convert to Decimal before we can '*' the values
   1080 if isinstance(value, Decimal):

DimensionalityError: Cannot convert from 'meter / second' ([length] / [time]) to 'meter' ([length])"
}

Even a simple row selection such as `df.loc[0, 'a':'c']` raises the same error, whereas `df.loc[0:1, 'a':'c']` does not.
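Judging from the traceback, `df.loc[0, 'a':'c']` goes through `BlockManager.fast_xs`, which gathers one value from each column into a single `PintArray` via `PintArray._from_sequence`; that method converts every item to one unit (`item.to(dtype.units)`, here `meter`), which is impossible for a row mixing metres, metres per second and kilonewtons. A label slice like `0:1` returns a DataFrame instead, so each column keeps its own dtype and nothing is converted. The `apply(..., axis=1)` failure presumably comes from the same row extraction. The sketch below is a plain-Python toy model of that conversion step; `Unit`, `Quantity` and `from_sequence` are hypothetical stand-ins, not pint or pint-pandas API.

```python
from dataclasses import dataclass

# Hypothetical toy stand-ins for pint units/quantities (not pint's API).
@dataclass(frozen=True)
class Unit:
    name: str
    dims: tuple    # (base dimension, exponent) pairs, e.g. (("length", 1),)
    factor: float  # multiply a magnitude by this to get the SI value

@dataclass(frozen=True)
class Quantity:
    magnitude: float
    unit: Unit

    def to(self, other):
        # Like pint, only units with the same dimensionality can be converted.
        if self.unit.dims != other.dims:
            raise ValueError(
                f"Cannot convert from '{self.unit.name}' to '{other.name}'"
            )
        return Quantity(self.magnitude * self.unit.factor / other.factor, other)

def from_sequence(scalars):
    """Toy version of the conversion step seen in the traceback: every item
    is converted to one target unit (here simply the first item's unit)."""
    target = scalars[0].unit
    return [item.to(target).magnitude for item in scalars], target

METER = Unit("meter", (("length", 1),), 1.0)
KILOMETER = Unit("kilometer", (("length", 1),), 1000.0)
METER_PER_SECOND = Unit("meter / second", (("length", 1), ("time", -1)), 1.0)
KILONEWTON = Unit("kilonewton", (("length", 1), ("mass", 1), ("time", -2)), 1000.0)

# A column holds a single dimensionality, so building it works:
magnitudes, unit = from_sequence([Quantity(1, METER), Quantity(2, KILOMETER)])
print(magnitudes, unit.name)  # [1.0, 2000.0] meter

# A row of df takes one value from each of 'a', 'b' and 'c', so it cannot:
row = [Quantity(1, METER), Quantity(4, METER_PER_SECOND), Quantity(7, KILONEWTON)]
try:
    from_sequence(row)
except ValueError as exc:
    print(exc)  # Cannot convert from 'meter / second' to 'meter'
```

The point the toy makes is that the unit belongs to the array rather than to each element, so any single array built from one mixed-unit row has to pick one unit.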

With pint_pandas 0.6.0 this error did not occur (indeed, notebooks I wrote with that version ran without problems).

The example operation is deliberately trivial and would be better written with `axis=0`, but my real application needs `axis=1`.

m-rossi commented 2 months ago

I would also add that something as simple as `df.sum(axis=0)` no longer works with version 0.6.1.
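The thread gives no traceback for `df.sum(axis=0)`. Assuming it fails for a similar reason, the per-column totals (6 m, 15 m/s and 24 kN for the example frame) would have to end up in one Series, which runs into the same single-unit problem. One possible way to hold such a mixed-unit result is to fall back to storing the quantities themselves, like an object-dtype Series, instead of raising. The sketch below illustrates that idea only; `Q` and `combine` are hypothetical, and this is not necessarily how pint-pandas 0.6.0 or 0.6.2 behave.

```python
from typing import NamedTuple

class Q(NamedTuple):
    # Hypothetical toy quantity: a magnitude plus a unit name (not pint's API).
    magnitude: float
    unit: str

def combine(scalars):
    """Hold a sequence of quantities without forcing a common unit.

    If every scalar has the same unit, store bare magnitudes plus that unit
    (a unit-aware layout); otherwise keep the quantities themselves in an
    'object' container instead of raising. To stay short, this sketch does
    not try to convert between different but compatible units.
    """
    units = {q.unit for q in scalars}
    if len(units) == 1:
        return "pint", [q.magnitude for q in scalars], units.pop()
    return "object", list(scalars), None

# The example frame, column by column (magnitudes, unit).
columns = {
    "a": ([1, 2, 3], "meter"),
    "b": ([4, 5, 6], "meter / second"),
    "c": ([7, 8, 9], "kilonewton"),
}

# Reduce each column on its own (the axis=0 direction), then collect the totals.
totals = [Q(sum(values), unit) for values, unit in columns.values()]
kind, data, unit = combine(totals)
print(kind, data)  # kind is "object": each total keeps its own unit
```

A homogeneous input such as `combine([Q(1, "meter"), Q(2, "meter")])` takes the unit-aware branch instead and returns `("pint", [1, 2], "meter")`.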

andrewgsavage commented 2 months ago

This should work with 0.6.2, released yesterday.

kompre commented 2 months ago

> should work with 0.6.2, released yesterday

Indeed it does.
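Since, according to this thread, the regression is specific to 0.6.1 (0.6.0 and 0.6.2 both work), a project that depends on row-wise operations could warn when the affected release is installed. The sketch below uses the standard library's `importlib.metadata.version` and assumes the distribution is named `pint-pandas`; `AFFECTED`, `parse_release` and `is_affected` are hypothetical helpers, and the version parsing is deliberately simplistic (it keeps only the leading numeric parts).

```python
from importlib.metadata import PackageNotFoundError, version

# Per this issue: 0.6.0 and 0.6.2 work, 0.6.1 raises DimensionalityError.
AFFECTED = {(0, 6, 1)}

def parse_release(text):
    """Turn '0.6.1' or '0.6.2.dev3' into (0, 6, 1) / (0, 6, 2) by keeping
    the leading dot-separated numeric parts only."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def is_affected(text):
    return parse_release(text)[:3] in AFFECTED

try:
    installed = version("pint-pandas")  # assumed distribution name
except PackageNotFoundError:
    installed = None

if installed is not None and is_affected(installed):
    print(f"pint-pandas {installed} has the axis=1 regression; upgrade to 0.6.2 or later")
```

For stricter version handling, a dedicated parser such as `packaging.version.Version` would be a better fit than this hand-rolled split.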