Open mgab opened 3 years ago
Hello!
I'd like to contribute by trying to fix this issue. Is that possible?
Here's another example:
import io

import pandas as pd

# Sample data; Fred's test1 score is missing
data = '''name,age,test1,test2,teacher
Adam,15,95.0,80,Ashby
Bob,16,81.0,82,Ashby
Dave,16,89.0,84,Jones
Fred,15,,88,Jones'''
scores = pd.read_csv(io.StringIO(data), dtype_backend='pyarrow')
# Pivot two value columns at once and inspect the resulting dtypes
(scores
 .pivot(columns='teacher', values=['test1', 'test2'])
 .dtypes
)
The dtypes of the pivoted columns are object...
Just wanted to check whether the issue could be closed, but apparently it can't. I can reproduce the behavior with pandas 2.2.3.
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[x] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
When calling DataFrame.pivot with a list of column names as the values argument, numeric columns are cast to a common ancestor datatype of the selected columns. So depending on the dtypes of the columns selected in the values argument:
Expected Behavior
As far as I understand, the dtype of pivoted columns could be maintained. If there are missing values in the pivot table, int columns might be cast to floats to allow NaNs, but otherwise no dtype transformations should occur due to pivot.
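To make the expectation concrete, here is a minimal sketch (not the original reproducible example) that compares one pivot call over a list of value columns with pivoting each value column on its own. The frame, its column names, and the helper pivot_each are made up for illustration. The sketch assumes the default NumPy-backed dtypes and a table without missing cells.

```python
import pandas as pd

# Illustrative data: one integer column and one float column, no missing cells
df = pd.DataFrame(
    {
        "row": ["r1", "r1", "r2", "r2"],
        "col": ["c1", "c2", "c1", "c2"],
        "ints": [1, 2, 3, 4],
        "floats": [0.5, 1.5, 2.5, 3.5],
    }
)

# One call with a list of value columns: the call this issue reports as
# casting the selected columns to a common dtype
combined = df.pivot(index="row", columns="col", values=["ints", "floats"])
print(combined.dtypes)


def pivot_each(frame, index, columns, values):
    """Hypothetical helper: pivot each value column separately, then
    concatenate side by side so each column keeps its own dtype."""
    parts = {
        name: frame.pivot(index=index, columns=columns, values=name)
        for name in values
    }
    # Dict keys become the outer level of the resulting column MultiIndex
    return pd.concat(parts, axis=1)


separate = pivot_each(df, "row", "col", ["ints", "floats"])
print(separate.dtypes)
```

Comparing the two printed dtype listings shows whether the installed pandas version upcasts in the combined call. The per-column pivot is only a possible workaround, not a proposed fix for DataFrame.pivot itself.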
Perhaps the behaviour is expected, as if one were to do df.stack().unstack(). In that case, perhaps it could be included in the notes section of the documentation.
Installed Versions