Closed remicome closed 1 year ago
PintArray now uses a pandas array to store data rather than a numpy array. Is this causing any issues other than the test errors? You could change your tests to create pandas arrays:
dimensionless_series = pd.Series([1, 1.5, 2], dtype=pd.Float64Dtype())
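To make the distinction concrete, here is a minimal pandas-only sketch of the two ways of building the series and of how the default dtype check reacts to them. It does not involve pint-pandas, and the helper names `is_numpy_backed` and `dtype_check_fails` are made up for illustration. Whether `pd.Float64Dtype()` is the dtype that `PintArray.astype(float)` actually returns in a given pint-pandas version is an assumption to verify locally.

```python
import numpy as np
import pandas as pd

# Default construction: the values live in a NumPy array (dtype float64).
numpy_backed = pd.Series([1, 1.5, 2])

# Explicit pandas extension dtype, as suggested in the reply above (dtype Float64).
pandas_backed = pd.Series([1, 1.5, 2], dtype=pd.Float64Dtype())


def is_numpy_backed(series: pd.Series) -> bool:
    """Hypothetical helper: True when the Series dtype is a plain NumPy dtype."""
    return isinstance(series.dtype, np.dtype)


def dtype_check_fails(left: pd.Series, right: pd.Series) -> bool:
    """Hypothetical helper: True when assert_series_equal rejects the pair."""
    try:
        pd.testing.assert_series_equal(left, right)
    except AssertionError:
        return True
    return False
```

The values are identical in both series; only the backing array differs, and that alone is enough for `assert_series_equal` to fail with its default `check_dtype=True`.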
On Thu, Jun 1, 2023 at 3:25 PM remicome @.***> wrote:
Description
Casting a pandas Series from float to pint quantities, then to float again using PintArray.astype, does not give back the original pandas dtype. The last series has type PandasDtype('float64') instead of float64. This is a source of unexpected test errors involving dtypes comparison.
Expected behavior
If my_pint_array is a PintArray instance, then my_pint_array.astype(float) should have the same dtype as pandas' default float implementation (i.e. float64). The same applies for int.
Minimal reproducible example
pint_pandas.show_versions()
{'numpy': '1.24.3', 'pandas': '2.0.2', 'pint': '0.19.2', 'pint_pandas': '0.4'}

import pandas as pd
import pint_pandas

dimensionless_series = pd.Series([1, 1.5, 2])
quantities = dimensionless_series.astype("pint[m]")
new_dimensionless_series = quantities.astype(float)
pd.testing.assert_series_equal(dimensionless_series, new_dimensionless_series)

Traceback (most recent call last):
  File "", line 1, in
  File ".venv/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal
    assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File ".venv/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal
    raise_assert_detail(obj, msg, left_attr, right_attr)
  File ".venv/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes of Series are different
Attribute "dtype" are different
Notes
- The same code does not raise any error with pint-pandas==0.2.
- Calling astype twice solves the problem:
new_dimensionless_series = quantities.astype(float).astype(float)
new_dimensionless_series.dtype
dtype('float64')
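An alternative to the double astype call, for test code only, is to normalize the result onto a NumPy float64 array before comparing. The sketch below is pandas-only and has not been run against PintArray: `as_numpy_float_series` is a hypothetical helper, and a `Float64Dtype` series stands in for a result that comes back with a non-NumPy dtype.

```python
import numpy as np
import pandas as pd


def as_numpy_float_series(series: pd.Series) -> pd.Series:
    """Hypothetical test helper: rebuild the Series on a NumPy float64 array."""
    values = series.to_numpy(dtype="float64")
    return pd.Series(values, index=series.index, name=series.name)


expected = pd.Series([1, 1.5, 2])
# Stand-in (assumption) for a result returned with a non-NumPy dtype.
result = pd.Series([1, 1.5, 2], dtype=pd.Float64Dtype())

normalized = as_numpy_float_series(result)
# Passes: both sides are now backed by NumPy float64 arrays.
pd.testing.assert_series_equal(expected, normalized)
```

This keeps the dtype handling in one place in the test suite, at the cost of no longer checking which dtype the library returned.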
Yes, you're right: it is probably better for me to use pandas' dtypes. This hasn't caused any other errors on my end, so I think this issue can be closed. Thank you for the input!
For the record, I'm now looking hard at the implications of this issue as it relates to uncertainties (https://github.com/hgrecco/pint-pandas/pull/140). The uncertainties package is very NumPy-centric, and the new PintArray astype behavior pulls strongly in the Pandas direction. A very small aspect of this tension is discussed by Pandas people here: https://github.com/pandas-dev/pandas/issues/48891 and here: https://github.com/pandas-dev/pandas/issues/22791. The larger context here: https://github.com/pandas-dev/pandas/issues/32265
I'm pretty sure that if I can find the correct dividing line, the problems will separate neatly and both Pint and uncertainties will behave. But I'm still trying to find that line. Any guidance in the comments would be helpful!