hgrecco / pint-pandas

Pandas support for pint

series.astype(float) not returning series of floats #203

Closed rwijtvliet closed 10 months ago

rwijtvliet commented 10 months ago

Start: series with dimensionless values

import pandas as pd
import pint
import pint_pandas

s0 = pd.Series([1.0, 2.5])
s1 = s0.astype('pint[dimensionless]')

s1
# 0    1.0
# 1    2.5
# dtype: pint[]

Action causing issue: conversion back to floats

s2 = s1.astype(float)

Issue 1: changed dtype

Previously, the resulting Series had a float data type. Now, the datatype is slightly different, which causes the comparison with s0 to fail.

s2.dtype
# PandasDtype('float64')

pd.testing.assert_series_equal(s0, s2)  #"dtypes are different"

Issue 2: conversion to string

Another issue is that the __repr__ and __str__ methods of s2 fail:

str(s2)
#AttributeError: 'numpy.ndarray' object has no attribute '_formatter'

Apologies, I currently cannot test this with the latest pandas version; I used pandas 2.0 instead, with pint-pandas 0.5 and pint 0.22.

andrewgsavage commented 10 months ago

Yea that doesn't seem right and could be better.

Is this something you've noticed since pint-pandas 0.5?

andrewgsavage commented 10 months ago

https://github.com/hgrecco/pint-pandas/blob/ef8a1209699d4533299303b800982578e8322242/pint_pandas/pint_array.py#L420

The line

    return pd.array(self.quantity, dtype, copy)

should be something like

    return np.array(self.quantity, dtype, copy)

or

    return pd.array(self.quantity.m, dtype, copy)
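The difference between the two calls can be seen with pandas and NumPy alone, without pint-pandas in the picture. Given an explicit NumPy dtype, pd.array wraps the data in pandas' NumPy-backed extension array (named PandasArray in pandas 2.0, NumpyExtensionArray in later releases), whereas np.array returns a plain ndarray. That appears consistent with the PandasDtype('float64') reported above. This is only a sketch of the mechanism, not the actual patch; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Stand-in for the magnitudes of a dimensionless PintArray (assumption).
values = [1.0, 2.5]

wrapped = pd.array(values, dtype=float)  # NumPy-backed extension array
plain = np.array(values, dtype=float)    # plain numpy.ndarray

print(type(wrapped).__name__, wrapped.dtype)
print(type(plain).__name__, plain.dtype)

# The extension-array wrapper can always be unwrapped explicitly.
unwrapped = wrapped.to_numpy()
```

Only the np.array result is an ndarray, which is what a caller of astype(float) would normally expect to end up with inside the Series.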

rwijtvliet commented 10 months ago

> Yea that doesn't seem right and could be better.
>
> Is this something you've noticed since pint-pandas 0.5?

Thanks for your fast reply. I'm not sure if it's because I upgraded pandas or pint-pandas - if that is useful information let me know and I'll investigate.

My workaround is to use s1.pint.m if the dtype is pint[]
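A minimal sketch of that workaround as a helper, assuming the dimensionless dtype prints as either "pint[]" (as in the output above) or "pint[dimensionless]", depending on the version. The function name to_plain_floats and the set of dtype names are hypothetical, not part of pint-pandas. The .pint accessor is only available once pint_pandas has been imported, which is already the case for any Series that carries a pint dtype.

```python
import pandas as pd

# Assumed string forms of the dimensionless pint dtype (version dependent).
DIMENSIONLESS_NAMES = {"pint[]", "pint[dimensionless]"}


def to_plain_floats(series: pd.Series) -> pd.Series:
    """Hypothetical helper: return a Series of plain magnitudes."""
    if str(series.dtype) in DIMENSIONLESS_NAMES:
        # Workaround from this thread: take the magnitudes directly
        # instead of calling astype(float) on the pint Series.
        return series.pint.m
    # Series without a pint dtype take the ordinary conversion path.
    return series.astype(float)


converted = to_plain_floats(pd.Series([1, 2]))
```

Series with other pint dtypes fall through to astype(float) here; whether units should be dropped silently in that case is a separate decision that this sketch does not make.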

andrewgsavage commented 10 months ago

There's a fix; it would be great to have someone look over it before merging.

MichaelTiemannOSC commented 10 months ago

I think I just ran into this issue in some of my code. Taking a look now...

MichaelTiemannOSC commented 10 months ago

I was working around a related problem, which is that pint arrays don't support cumprod for pint[dimensionless]. Here's an accumulate function that does:

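    # Assumes `import numpy as np` and
    # `from pandas.api.extensions import ExtensionArray` at module level.
    def _accumulate(self, name: str, *, skipna: bool = True, **kwds):
        # A running product only keeps a single unit for dimensionless data.
        if name == "cumprod" and self.dtype != "pint[dimensionless]":
            raise TypeError("cumprod not supported for pint arrays")
        functions = {
            "cummin": np.minimum.accumulate,
            "cummax": np.maximum.accumulate,
            "cumsum": np.cumsum,
            "cumprod": np.cumprod,
        }

        if isinstance(self._data, ExtensionArray):
            try:
                # Prefer the underlying array's own implementation.
                result = self._data._accumulate(name, **kwds)
            except NotImplementedError:
                # Fall back to the NumPy function on the raw magnitudes.
                result = functions[name](self.numpy_data, **kwds)

        return self._from_sequence(result, self.units)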

However, my workaround was to use pint.dequantify() / pint.quantify(), not astype("float"), so it never triggered this issue. I need to round-trip through dequantify/quantify because I don't want to expose uncertainties to certain disaster in a "float" conversion.
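The dispatch-table idea in the accumulate function above can be tried out on bare NumPy magnitudes. The restriction on cumprod presumably exists because the running product of quantities with units would carry a different unit at every position (m, m**2, m**3, ...), which a single-unit array cannot represent; for dimensionless data that problem does not arise. The helper name accumulate and its dimensionless flag are hypothetical stand-ins for the method and its dtype check.

```python
import numpy as np

# Dispatch table with the same entries as the method above.
ACCUMULATORS = {
    "cummin": np.minimum.accumulate,
    "cummax": np.maximum.accumulate,
    "cumsum": np.cumsum,
    "cumprod": np.cumprod,
}


def accumulate(name, magnitudes, dimensionless):
    """Hypothetical stand-alone helper operating on bare magnitudes."""
    if name == "cumprod" and not dimensionless:
        # With units, each element of the result would need its own unit.
        raise TypeError("cumprod not supported for quantities with units")
    return ACCUMULATORS[name](np.asarray(magnitudes))


print(accumulate("cumprod", [1.0, 2.0, 3.0], dimensionless=True))
print(accumulate("cumsum", [1.0, 2.0, 3.0], dimensionless=False))
```

Calling accumulate("cumprod", ..., dimensionless=False) raises TypeError, mirroring the guard at the top of the method.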