hgrecco / pint-pandas

Pandas support for pint

PintArray.astype does not use the same conventions as pandas for float and int dtypes #183

Closed: remicome closed this issue 1 year ago

remicome commented 1 year ago

Description

Casting a pandas Series from float to pint quantities, then back to float using PintArray.astype, does not give back the original pandas dtype. The resulting series has dtype PandasDtype('float64') instead of float64. This causes unexpected test failures wherever dtypes are compared.

Expected behavior

If my_pint_array is a PintArray instance, then my_pint_array.astype(float) should have the same dtype as pandas' default float implementation (i.e. float64). The same applies to int.

Minimal reproducible example

>>> import pandas as pd
>>> import pint_pandas
>>> pint_pandas.show_versions()
{'numpy': '1.24.3', 'pandas': '2.0.2', 'pint': '0.19.2', 'pint_pandas': '0.4'}
>>> dimensionless_series = pd.Series([1, 1.5, 2])
>>> quantities = dimensionless_series.astype("pint[m]")
>>> new_dimensionless_series = quantities.astype(float)
>>> pd.testing.assert_series_equal(dimensionless_series, new_dimensionless_series)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".venv/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 931, in assert_series_equal
    assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File ".venv/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 415, in assert_attr_equal
    raise_assert_detail(obj, msg, left_attr, right_attr)
  File ".venv/lib/python3.9/site-packages/pandas/_testing/asserters.py", line 599, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes of Series are different

Attribute "dtype" are different
[left]:  float64
[right]: PandasDtype('float64')
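
A test that only cares about values can sidestep a mismatch like the one above by casting the round-tripped series to NumPy float64 before comparing. The sketch below is pandas-only and rests on an assumption: it uses the nullable Float64 dtype as a stand-in for the extension dtype returned by PintArray.astype (the report shows PandasDtype('float64'), which is not reproduced here), and the helper name dtype_mismatch is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption: "Float64" stands in for the extension dtype seen in the
# report; the real round trip goes through pint_pandas, not shown here.
expected = pd.Series([1, 1.5, 2])
round_tripped = pd.Series([1, 1.5, 2], dtype="Float64")


def dtype_mismatch(left, right):
    """Return True if assert_series_equal rejects the pair (hypothetical helper)."""
    try:
        pd.testing.assert_series_equal(left, right)
    except AssertionError:
        return True
    return False


# The strict comparison fails because the dtypes differ.
strict_fails = dtype_mismatch(expected, round_tripped)

# Casting explicitly to NumPy float64 makes the comparison pass.
normalized = round_tripped.astype(np.float64)
pd.testing.assert_series_equal(expected, normalized)
```

pd.testing.assert_series_equal also accepts check_dtype=False, which may be preferable when the dtype difference is known and irrelevant to the test.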

Notes

  • The same code does not raise any error with pint-pandas==0.2.
  • Calling astype twice solves the problem:

    >>> new_dimensionless_series = quantities.astype(float).astype(float)
    >>> new_dimensionless_series.dtype
    dtype('float64')

andrewgsavage commented 1 year ago

PintArray now uses a pandas array to store data rather than a numpy array. Is this causing any issues other than the test errors? You could change your tests to create pandas arrays:

dimensionless_series = pd.Series([1, 1.5, 2], dtype=pd.Float64Dtype())

remicome commented 1 year ago

Yes you're right — it is probably better for me to use pandas' dtypes. This hasn't caused any other error on my end, so I think this issue can be closed. Thank you for the input!
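
For readers comparing the two approaches, here is a minimal pandas-only sketch of how a series built with pd.Float64Dtype() differs from one built with the default float dtype. Whether the extension dtype matches what PintArray.astype(float) returns depends on the pint-pandas and pandas versions in use, so treat it as an illustration rather than a guaranteed fix.

```python
import pandas as pd

# Default construction gives a NumPy-backed float64 series.
default_series = pd.Series([1, 1.5, 2])

# Passing pd.Float64Dtype() gives a pandas nullable (masked) float series.
nullable_series = pd.Series([1, 1.5, 2], dtype=pd.Float64Dtype())

print(default_series.dtype)   # float64
print(nullable_series.dtype)  # Float64

# The values are identical; only the dtype differs.
same_values = default_series.tolist() == nullable_series.tolist()
```

The capitalization is the only visible difference in the printed dtype names, which is easy to miss when reading a failing assertion.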

MichaelTiemannOSC commented 1 year ago

For the record, I'm now looking hard at the implications of this issue as it relates to uncertainties (https://github.com/hgrecco/pint-pandas/pull/140). The uncertainties package is very NumPy-centric, and the new PintArray astype behavior pulls strongly in the Pandas direction. A very small aspect of this tension is discussed by Pandas people here: https://github.com/pandas-dev/pandas/issues/48891 and here: https://github.com/pandas-dev/pandas/issues/22791. The larger context is here: https://github.com/pandas-dev/pandas/issues/32265

I'm pretty sure that if I can find the correct dividing line, the problems will separate neatly and both Pint and uncertainties will behave. But I'm still trying to find that line. Any guidance in the comments would be helpful!