pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: DataFrame.stack does not work when columns includes tuple level #59697

Open or-jether opened 1 week ago

or-jether commented 1 week ago

Reproducible Example

import pandas as pd

df = pd.DataFrame(
    index=[1, 2],
    columns=pd.MultiIndex.from_tuples(
        [((1, 2), 'a'), ((3, 4), 'b')], names=['level_0', 'level_1']
    ),
    data=[[1, 2], [3, 4]],
)
df.stack()  # raises ValueError: Names should be list-like for a MultiIndex

Issue Description

Calling DataFrame.stack on a DataFrame whose column MultiIndex contains tuples as the values of one level (while stacking the other, non-tuple level) raises an exception. This seems to happen only if the level has a name.
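
One way to check the observation about named levels is to build the same frame with and without level names and attempt the stack each time. The sketch below does that; the helper `try_stack` is hypothetical (it is not part of pandas), and no particular outcome is claimed because the behaviour may differ between pandas releases.

```python
import pandas as pd


def try_stack(names):
    """Build the frame from the report with the given level names and try stack()."""
    columns = pd.MultiIndex.from_tuples(
        [((1, 2), "a"), ((3, 4), "b")], names=names
    )
    df = pd.DataFrame([[1, 2], [3, 4]], index=[1, 2], columns=columns)
    try:
        df.stack()
        return "ok"
    except Exception as exc:  # report whatever this pandas version raises
        return f"{type(exc).__name__}: {exc}"


# Compare the named case from the report with an unnamed MultiIndex.
results = {
    "named": try_stack(["level_0", "level_1"]),
    "unnamed": try_stack(None),
}
for label, outcome in results.items():
    print(label, "->", outcome)
```

If the two outcomes differ on a given installation, that would support the description above for that version.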

Expected Behavior

We should get something like the result of df.T.unstack().T:

     (1, 2)  (3, 4)
1 a     1.0     NaN
  b     NaN     2.0
2 a     3.0     NaN
  b     NaN     4.0

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 37ea63d540fd27274cad6585082c91b1283f963d
python           : 3.10.12.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.22631
machine          : AMD64
processor        : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_United States.1252

pandas           : 2.0.1
numpy            : 1.24.3
pytz             : 2023.3.post1
dateutil         : 2.8.2
setuptools       : 57.5.0
pip              : 22.3.1
Cython           : 3.0.8
pytest           : 7.3.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 3.1.2
lxml.etree       : 5.1.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.1.2
IPython          : 8.15.0
pandas_datareader: None
bs4              : 4.12.2
bottleneck       : 1.3.7
brotli           : None
fastparquet      : None
fsspec           : 2023.9.0
gcsfs            : 2023.9.0
matplotlib       : 3.7.2
numba            : 0.59.0
numexpr          : 2.8.4
odfpy            : None
openpyxl         : 3.1.2
pandas_gbq       : 0.19.2
pyarrow          : 12.0.1
pyreadstat       : None
pyxlsb           : 1.0.10
s3fs             : None
scipy            : 1.10.1
snappy           : 0.7.1
sqlalchemy       : None
tables           : 3.8.0
tabulate         : None
xarray           : 2023.8.0
xlrd             : 2.0.1
zstandard        : None
tzdata           : 2023.4
qtpy             : None
pyqt5            : None

rhshadrach commented 1 week ago

I cannot reproduce this on main. @or-jether, can you try the most recent version of pandas (2.2.2) with future_stack=True?
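
A sketch of that check, reusing the frame from the report, might look like the following. It assumes pandas 2.1 or later, where the `future_stack` keyword was introduced; on the 2.0.1 install listed above the keyword does not exist, so the call would fail with a TypeError, which the `try`/`except` simply reports. The variable names `stacked` and `outcome` are illustrative only.

```python
import pandas as pd

# Same frame as in the reproducible example.
columns = pd.MultiIndex.from_tuples(
    [((1, 2), "a"), ((3, 4), "b")], names=["level_0", "level_1"]
)
df = pd.DataFrame([[1, 2], [3, 4]], index=[1, 2], columns=columns)

print("pandas", pd.__version__)
try:
    # Opt in to the new stack implementation (keyword available from pandas 2.1).
    stacked = df.stack(future_stack=True)
    outcome = "ok"
    print(stacked)
except Exception as exc:  # e.g. TypeError on versions without future_stack
    stacked = None
    outcome = f"{type(exc).__name__}: {exc}"
    print(outcome)
```

Posting the printed version together with the outcome would show whether the error is specific to the older implementation.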