pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: dataframe.pivot() #38675

Closed jiahe224 closed 3 years ago

jiahe224 commented 3 years ago


Code Sample, a copy-pastable example

import pandas as pd

# Official docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html

# First example that raises an error

df = pd.DataFrame({
    "lev1": [1, 1, 1, 2, 2, 2],
    "lev2": [1, 1, 2, 1, 1, 2],
    "lev3": [1, 2, 1, 2, 1, 2],
    "lev4": [1, 2, 3, 4, 5, 6],
    "values": [0, 1, 2, 3, 4, 5],
})
df.pivot(index=["lev1", "lev2"], columns=["lev3"], values="values")
# ValueError: Length of passed values is 6, index implies 2.

# Second example that raises an error

>>> month_stats_melt.columns
[Out] Index(['stockcode', 'ym', 'variable', 'value'], dtype='object')

>>> month_stats_melt.head()
[Out]
   stockcode       ym           variable     value
0      70001  2020/01  month-start share  0.078306
1      70001  2020/02  month-start share  0.082309
2      70001  2020/03  month-start share  0.083712
3      70001  2020/04  month-start share  0.086247
4      70001  2020/05  month-start share  0.087852

>>> month_stats_melt.pivot(index='stockcode', columns=['ym', 'variable'], values='value')
# KeyError: 'Level ym not found'

Problem description

first bug

ValueError Traceback (most recent call last)

in
      5        "lev4": [1, 2, 3, 4, 5, 6],
      6        "values": [0, 1, 2, 3, 4, 5]})
----> 7 df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")

~\anaconda3\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5921         from pandas.core.reshape.pivot import pivot
   5922
-> 5923         return pivot(self, index=index, columns=columns, values=values)
   5924
   5925     _shared_docs[

~\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot(data, index, columns, values)
    447             )
    448         else:
--> 449             indexed = data._constructor_sliced(data[values].values, index=index)
    450     return indexed.unstack(columns)
    451

~\anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    289                 try:
    290                     if len(index) != len(data):
--> 291                         raise ValueError(
    292                             f"Length of passed values is {len(data)}, "
    293                             f"index implies {len(index)}."

ValueError: Length of passed values is 6, index implies 2.

second bug

ValueError                                Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in _get_level_number(self, level)
   1294         try:
-> 1295             level = self.names.index(level)
   1296         except ValueError:

ValueError: 'ym' is not in list

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
in
----> 1 month_stats_pivot = month_stats_melt.pivot(index='stockcode',columns=['ym', 'variable'],values='value')
      2 month_stats_pivot

~\anaconda3\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5921         from pandas.core.reshape.pivot import pivot
   5922
-> 5923         return pivot(self, index=index, columns=columns, values=values)
   5924
   5925     _shared_docs[

~\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot(data, index, columns, values)
    448         else:
    449             indexed = data._constructor_sliced(data[values].values, index=index)
--> 450     return indexed.unstack(columns)
    451
    452

~\anaconda3\lib\site-packages\pandas\core\series.py in unstack(self, level, fill_value)
   3548         from pandas.core.reshape.reshape import unstack
   3549
-> 3550         return unstack(self, level, fill_value)
   3551
   3552     # ----------------------------------------------------------------------

~\anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in unstack(obj, level, fill_value)
    396             # _unstack_multiple only handles MultiIndexes,
    397             # and isn't needed for a single level
--> 398             return _unstack_multiple(obj, level, fill_value=fill_value)
    399         else:
    400             level = level[0]

~\anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in _unstack_multiple(data, clocs, fill_value)
    318     index = data.index
    319
--> 320     clocs = [index._get_level_number(i) for i in clocs]
    321
    322     rlocs = [i for i in range(index.nlevels) if i not in clocs]

~\anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in (.0)
    318     index = data.index
    319
--> 320     clocs = [index._get_level_number(i) for i in clocs]
    321
    322     rlocs = [i for i in range(index.nlevels) if i not in clocs]

~\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in _get_level_number(self, level)
   1296         except ValueError:
   1297             if not is_integer(level):
-> 1298                 raise KeyError(f"Level {level} not found")
   1299             elif level < 0:
   1300                 level += self.nlevels

KeyError: 'Level ym not found'

Expected Output

# first
df.pivot(index=["lev1", "lev2"], columns=["lev3"],values="values")

lev3         1    2
lev1 lev2
1    1     0.0  1.0
     2     2.0  NaN
2    1     4.0  3.0
     2     NaN  5.0

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.3.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : Chinese (Simplified)_China.936

pandas           : 1.0.5
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 49.2.0.post20200714
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : None
sphinx           : 3.1.2
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.2
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.4.3
pyxlsb           : None
s3fs             : None
scipy            : 1.5.0
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.2.9
numba            : 0.50.1

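The first error is raised inside the Series constructor: six values were passed, but the index it was given had length 2. The traceback does not show how pandas 1.0.5 arrived at that two-element index, so the sketch below only reproduces the same kind of length mismatch directly, without calling `pivot`. The two-label index `labels` is an assumption chosen for illustration, and the wording of the error message differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df = pd.DataFrame({
    "lev1": [1, 1, 1, 2, 2, 2],
    "lev2": [1, 1, 2, 1, 1, 2],
    "lev3": [1, 2, 1, 2, 1, 2],
    "lev4": [1, 2, 3, 4, 5, 6],
    "values": [0, 1, 2, 3, 4, 5],
})

# Hypothetical two-element index: its length (2) does not match
# the six entries of the "values" column.
labels = ["lev1", "lev2"]

try:
    pd.Series(df["values"].values, index=labels)
except ValueError as exc:
    mismatch_error = exc
    print(f"ValueError: {exc}")
else:
    mismatch_error = None
```
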
jreback commented 3 years ago

Stable is 1.1.5. Are you sure you are running the right version?
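One way to check is to print the version from the same interpreter that runs the failing code. The `DataFrame.pivot` documentation notes that `index` and `columns` accept lists of names from version 1.1.0 onward. The sketch below assumes that threshold; `pivot_accepts_lists` is a hypothetical helper name, not part of pandas, and it compares only the major and minor version components.

```python
import pandas as pd


def pivot_accepts_lists(version: str) -> bool:
    """Return True if the given pandas version string is 1.1 or newer.

    Hypothetical helper: only the major and minor components are compared.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 1)


print("pandas version:", pd.__version__)
print("list arguments to pivot expected to work:",
      pivot_accepts_lists(pd.__version__))
```

If the printed version is older than expected, the environment being run is probably not the one that was updated.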

jiahe224 commented 3 years ago

I'm sorry. I had tried to update pandas with conda earlier, but when I checked, I found that the update had not succeeded. After updating properly, the code runs successfully. @jreback

chengmenlixue commented 7 months ago

pivot can be replaced with pivot_table here.
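A minimal sketch of that replacement, using the DataFrame from the first example in this issue. One difference to keep in mind: `pivot_table` aggregates rows that share the same keys (mean by default) instead of raising, so it matches `pivot` only when each key combination is unique, as it is here. The `set_index` plus `unstack` variant is included as an alternative that does not aggregate; the variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "lev1": [1, 1, 1, 2, 2, 2],
    "lev2": [1, 1, 2, 1, 1, 2],
    "lev3": [1, 2, 1, 2, 1, 2],
    "lev4": [1, 2, 3, 4, 5, 6],
    "values": [0, 1, 2, 3, 4, 5],
})

# pivot_table groups by the index/columns keys and aggregates each group
# (mean by default); every group here holds exactly one value.
via_pivot_table = df.pivot_table(
    index=["lev1", "lev2"], columns="lev3", values="values"
)

# Reshape without aggregation: move the keys into the index,
# then unstack the "lev3" level into columns.
via_unstack = df.set_index(["lev1", "lev2", "lev3"])["values"].unstack("lev3")

print(via_pivot_table)
print(via_unstack)
```
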