hgrecco / pint-pandas

Pandas support for pint

Compatibility with 'uncertainties'? #124

Open Cs137 opened 2 years ago

Cs137 commented 2 years ago

Many thanks for your work on this package, and especially for pint itself! It makes life much easier, and to make it easier still, I would like to request a feature.

In my opinion, using the "pint" dtype for a column containing a uarray would be very handy, but there seems to be an issue with the __init__ method of PintArray. If I understand it correctly, it creates a np.array from the provided data and therefore tries to convert the uarray elements to floats, which causes the following error (I shortened the path and inserted [...]):

File [...]/python3.10/site-packages/pint_pandas/pint_array.py:198, in PintArray.__init__(self, values, dtype, copy)
    193     data_dtype = next(x for x in values if not isinstance(x, float))
    194     warnings.warn(
    195         f"pint-pandas does not support magnitudes of {type(data_dtype)}. Converting magnitudes to float.",
    196         category=RuntimeWarning,
    197     )
--> 198 self._data = np.array(values, float, copy=copy)
    199 self._Q = self.dtype.ureg.Quantity

File [...]/python3.10/site-packages/uncertainties/core.py:2712, in add_operators_to_AffineScalarFunc.<locals>.raise_error(self)
   2711 def raise_error(self):
-> 2712     raise TypeError("can't convert an affine function (%s)"
   2713                     ' to %s; use x.nominal_value'
   2714                     # In case AffineScalarFunc is sub-classed:
   2715                     % (self.__class__, coercion_type))

TypeError: can't convert an affine function (<class 'uncertainties.core.Variable'>) to float; use x.nominal_value
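The failure mode in the traceback can be illustrated without either library. The following is a minimal sketch, assuming only what the traceback shows: the uncertainty-carrying value refuses conversion to float, so any element-wise float coercion fails. `StandInUfloat` and `to_floats` are hypothetical names invented for this illustration; they are not part of uncertainties, pint, or pint-pandas.

```python
class StandInUfloat:
    """Hypothetical stand-in for an uncertainties value (not the real class)."""

    def __init__(self, nominal_value, std_dev):
        self.nominal_value = nominal_value
        self.std_dev = std_dev

    def __float__(self):
        # Mirrors the refusal seen in the traceback: the value will not
        # silently drop its uncertainty when asked for a plain float.
        raise TypeError(
            "can't convert an affine function to float; use x.nominal_value"
        )


def to_floats(values):
    """Coerce every element with float(), as a float array constructor would."""
    return [float(v) for v in values]


values = [StandInUfloat(0.79, 0.08), StandInUfloat(0.34, 0.03)]

# Element-wise coercion hits __float__ and fails.
try:
    to_floats(values)
    coercion_failed = False
except TypeError:
    coercion_failed = True

# Plain floats are only available by explicitly dropping the uncertainty.
nominal = to_floats(v.nominal_value for v in values)
```

The point of the sketch is that the error is raised by the element type during coercion, so it appears regardless of which entry point triggers the float conversion.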

Here are some snippets to reproduce the issue:

import pandas as pd
import pint
import pint_pandas
from uncertainties import ufloat_fromstr, unumpy

data = {'a': {0: 0.01, 1: 0.28, 2: 0.33, 3: 0.78}, 
        'b': {0: '0.79+/-0.08', 1: '0.340+/-0.030', 
              2: '0.52+/-0.05', 3: '0.250+/-0.020'}}

df = pd.DataFrame(data)
df.b = [ufloat_fromstr(x) for x in df.b]

# dtype assignment works fine for the float column
df.a = df.a.astype('pint[g]')

# dtype assignment to the ufloat column raises the TypeError shown above
df.b = df.b.astype('pint[g]')

# creating a uarray before the dtype assignment leads to the same error
n = [x.n for x in df.b]
s = [x.s for x in df.b]
ua = unumpy.uarray(n, s)

df.b = pd.Series(ua, dtype='pint[g]')

# and, as expected, so does using the PintArray constructor directly
df.b = pint_pandas.PintArray(ua, 'g')

Compatibility between PintArray and uncertainties, like that provided by the Measurement class in pint, would in my opinion be a huge improvement to pint-pandas.

andrewgsavage commented 2 years ago

Yes, this would be nice.

@hgrecco has been wanting to separate out Measurement from pint, which might mean you want a separate package for uncertainties with pandas. Probably worth waiting till that's done too - although that's been on the agenda for a while now.

MichaelTiemannOSC commented 1 year ago

xref: https://github.com/hgrecco/pint-pandas/pull/140