hgrecco / pint-pandas

Pandas support for pint

Unexpected result when using `MultiIndex` columns #111

Closed rwijtvliet closed 1 year ago

rwijtvliet commented 2 years ago
import pint
import pint_pandas
import pandas as pd
import numpy as np

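# A two-element Series backed by a pint extension array, in joules
s = pd.Series(np.random.rand(2), dtype="pint[J]")

# Flat columns: the pint dtype is preserved on assignment
df_good = pd.DataFrame()
df_good["energy"] = s

# Two-level MultiIndex columns: the same assignment behaves differently
df_bad = pd.DataFrame(columns=[[], []])
df_bad[("toy", "energy")] = s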

When drilling down into df_bad, I'd expect df_bad.toy to be the exact same dataframe as df_good. However, where df_good has a single column of floats, with the unit saved once for the entire column...

>>> df_good

                energy
0   0.4160504672851326
1  0.23998365954019407

>>> df_good.pint.dequantify()

        energy
unit     joule
0     0.416050
1     0.239984

...df_bad.toy has a column of (Quantity) objects, each with their unit included:

>>> df_bad.toy
                      energy
0   0.4160504672851326 joule
1  0.23998365954019407 joule

>>> df_bad.toy.pint.dequantify()

AttributeError: 'numpy.ndarray' object has no attribute 'units'

If this is expected/wanted/by design and I'm just using it wrong, let me know.

andrewgsavage commented 2 years ago

This is unexpected. I don't think there's a workaround.

andrewgsavage commented 1 year ago

This now works as expected:


>>> df_bad.toy
                energy
0   0.9450387373783038
1  0.43155204200882635

>>> df_bad.dtypes
toy  energy    pint[joule]
dtype: object