Open szaiserb opened 1 month ago
Why isn't this supposed to be the desired behaviour? This is the way pandas works. When you perform set_index over a column, not only the values are used as index but also its dtype.
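The dtype carry-over described above can be seen without pint-pandas. The following is a minimal sketch using pandas' built-in categorical dtype as a stand-in; the column names and values are made up for illustration, and it does not claim that a PintArray-backed index behaves the same way.

```python
import pandas as pd

# Hypothetical frame: column "a" uses an extension dtype (category),
# standing in here for a pint[<unit>] column.
df = pd.DataFrame({
    "a": pd.Series(["x", "y"], dtype="category"),
    "b": [10, 20],
})

# drop=False keeps the column so it can be compared with the index.
indexed = df.set_index("a", drop=False)

# The index takes over the column's dtype, not just its values ...
print(indexed.index.dtype == indexed["a"].dtype)

# ... while individual index elements are still plain Python scalars.
print(type(indexed.index[0]))
```

This mirrors the split discussed in the thread: the dtype lives on the index as a whole, while each element stays a plain scalar.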
Seeing the units in the cells means the data is stored as an array of quantities inside the PintArray, as opposed to an array of units or floats.
It looks like one of the PintArray init paths doesn't behave as expected.
When you perform set_index over a column, not only the values are used as index but also its dtype
Using the column dtype for the index on .set_index() is perfect; however, my expectation is to have type(df.index[0]) = float and df.index.dtype = pint[<unit>]. Then df.index behaves largely like df[<column_name>]. Having type(df.index[0]) = pint[<unit>] would only be required for a mixed-type index (which I do not see any use case for).
It looks like this is a bug in pandas: the index doesn't use the formatting function of the data's dtype. See https://github.com/pandas-dev/pandas/blob/3b48b17e52f3f3837b9ba8551c932f44633b5ff8/pandas/core/indexes/base.py#L1411
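This is as expected:
import pandas as pd
import pint_pandas  # registers the pint[...] dtype with pandas
# Assumed setup, inferred from the output below
df = pd.DataFrame({'a': pd.Series([1.0, 2.0], dtype='pint[second]')})
df = df.set_index('a', drop=False)
i = df.index
i.values
<PintArray>
[1.0, 2.0]
Length: 2, dtype: pint[second]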
Bug description
DataFrame.set_index puts units into the DataFrame index cells. I was very surprised when I found this out, and I currently need to work around it. For the actual DataFrame data cells this behavior is clearly not intended (quote from docs):
Minimum example
Output: