pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: DataFrame.xs multi-index drop_level=False has no effect when level= is left at default #59098

Open gwerbin opened 5 months ago

gwerbin commented 5 months ago

Pandas version checks

Reproducible Example

```python
import pandas as pd

df = pd.DataFrame(dict(i=[1, 2, 3], j=[1, 1, 2], x=[10, 100, 1000])).set_index(["i", "j"])

key = (1, 1)

# Returns DataFrame as expected:
result1 = df.xs(key, drop_level=False, level=list(range(len(key))))

# Returns Series, but DataFrame was expected:
result2 = df.xs(key, drop_level=False)
```

Issue Description

The `drop_level=False` option in `DataFrame.xs` apparently has no effect when `level=` is left at its default value. The documentation states that leaving `level=` unset should be equivalent to something like `level=list(range(len(key)))` when `key` is a non-string sequence.
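Until this is fixed, one workaround consistent with the documented default is to spell out the levels explicitly, as `result1` does above. A minimal sketch, assuming pandas 2.2.x as in the installed versions below; `xs_keep_levels` is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def xs_keep_levels(df: pd.DataFrame, key: tuple) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): cross-section on the first
    len(key) index levels without dropping them."""
    # Passing level= explicitly is the call shape that honors drop_level=False
    # in this report (result1); the default level= path returns a Series.
    return df.xs(key, level=list(range(len(key))), drop_level=False)


df = pd.DataFrame(dict(i=[1, 2, 3], j=[1, 1, 2], x=[10, 100, 1000])).set_index(["i", "j"])
result = xs_keep_levels(df, (1, 1))

print(type(result).__name__)      # DataFrame
print(list(result.index.names))   # ['i', 'j']
```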

Expected Behavior

I expected drop_level=False to have the same effect regardless of whether level= was specified.
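To make the expected behavior concrete, a small check can compare each result against the original index: with `drop_level=False`, the row cross-section should be a DataFrame whose index has the same levels as `df.index`. A sketch for illustration only; `keeps_all_levels` is a hypothetical name:

```python
import pandas as pd


def keeps_all_levels(df: pd.DataFrame, result) -> bool:
    """Hypothetical check: True if an xs(..., drop_level=False) result is a
    DataFrame that still carries every index level of the original frame."""
    return (
        isinstance(result, pd.DataFrame)
        and result.index.nlevels == df.index.nlevels
        and list(result.index.names) == list(df.index.names)
    )


df = pd.DataFrame(dict(i=[1, 2, 3], j=[1, 1, 2], x=[10, 100, 1000])).set_index(["i", "j"])
key = (1, 1)

explicit = df.xs(key, drop_level=False, level=list(range(len(key))))
default = df.xs(key, drop_level=False)

print(keeps_all_levels(df, explicit))  # True
# Expected True as well; per this report, pandas 2.2.2 returns a Series here.
print(keeps_all_levels(df, default))
```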

Installed Versions

Output from pd.show_versions()

```none
INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.9.final.0
python-bits : 64
OS : Darwin
OS-release : 23.4.0
Version : Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.25.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : 1.4.0
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.12.2
gcsfs : None
matplotlib : 3.8.4
numba : 0.59.1
numexpr : 2.10.0
odfpy : None
openpyxl : 3.1.4
pandas_gbq : None
pyarrow : 15.0.2
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2023.12.2
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2024.6.0
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
```

ssf0409 commented 4 months ago

take

rhshadrach commented 3 weeks ago

Thanks for the report! Confirmed on main - when drop_level=False we should be returning a DataFrame in all cases (as otherwise the index levels are indeed dropped). Further investigations and PRs to fix are welcome!
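Until a fix lands, selecting with a list containing the full key is another way to keep the levels: `.loc` with a list of tuples is a row selection, so it returns a DataFrame with the complete MultiIndex. This is ordinary `.loc` behavior, not a fix for `xs`; the sketch only checks that it matches the explicit-level `xs` call from the report.

```python
import pandas as pd

df = pd.DataFrame(dict(i=[1, 2, 3], j=[1, 1, 2], x=[10, 100, 1000])).set_index(["i", "j"])
key = (1, 1)

# A list of full keys is treated as a row selection, so the result stays a
# DataFrame and keeps both index levels ("i" and "j").
via_loc = df.loc[[key]]

# The explicit-level xs call that already behaves as expected (result1).
via_xs = df.xs(key, drop_level=False, level=list(range(len(key))))

print(via_loc.equals(via_xs))  # True
```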

ssf0409 commented 2 weeks ago

take