pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: support slice(None) in list of labels with multiindexing #47791

Open Ynjxsjmh opened 2 years ago

Ynjxsjmh commented 2 years ago

Reproducible Example

import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

out = df.loc[:, [(slice(None), "one")]]

Issue Description

According to the "Cross-section" section of the MultiIndex / advanced indexing docs, it's OK to pass slice(None) inside a tuple:

df.loc[(slice(None), "one"), :]

According to the "Advanced indexing with hierarchical index" section of the same docs, it's also OK to pass a list of tuples:

Passing a list of labels or tuples works similar to reindexing:

df.loc[[("bar", "two"), ("qux", "one")]]

However, an error is raised when slice(None) is used inside a list of tuples, as in df.loc[:, [(slice(None), "one")]]

Expected Behavior

The output should behave like df.loc[:, (slice(None), "one")]
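To make the expected result concrete, here is a small sketch that reuses the frame from the reproducible example. It only compares the tuple form (which already works) with the same selection written out as explicit tuples; the names expected and explicit are mine and not part of the original report.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)

# Tuple form (no enclosing list): slice(None) matches every "first" label.
expected = df.loc[:, (slice(None), "one")]

# The same selection written out as an explicit list of exact tuples.
explicit = df.loc[
    :, [("bar", "one"), ("baz", "one"), ("foo", "one"), ("qux", "one")]
]

# Both selections pick the same four columns with the same values.
same_columns = list(expected.columns) == list(explicit.columns)
same_values = np.array_equal(expected.to_numpy(), explicit.to_numpy())
```

The request is for the list form df.loc[:, [(slice(None), "one")]] to return this same frame.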

Installed Versions

Pandas version: 1.4.2

phofl commented 2 years ago

Hi, thanks for your report.

I am not sure whether this is desirable. My interpretation was that a list of tuples refers to exact values, and that the tuples themselves do not contain lists or slices. Can you provide an example where this is necessary and cannot be achieved with another method?

Ynjxsjmh commented 2 years ago

@phofl Thanks for your reply. I think your interpretation is reasonable, but there is a use case where we could write

out = df.loc[:, [(slice(None), "one"), ("qux", "two")]]

instead of writing all first levels manually

out = df.loc[:, [("baz", "one"), ("bar", "one"), ..., ("qux", "two")]]

phofl commented 2 years ago

Thx. Makes sense.

I think this is non-trivial to do, especially without performance penalties.

Ynjxsjmh commented 2 years ago

@phofl My intuition is that this only needs an extra check for slice(None) in each tuple when indexing with a list of labels. Do you mean that the time cost of that check would be too high?
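As a rough user-level sketch of that extra check (not how pandas implements .loc, and not a proposed patch), the hypothetical helper below inspects each tuple: keys containing a slice are expanded with MultiIndex.get_locs, and exact tuples are looked up with get_indexer. The helper name loc_columns_with_slices is made up for illustration, and the sketch assumes unique, lexsorted columns as in the example frame.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 8), columns=idx)


def loc_columns_with_slices(frame, keys):
    """Hypothetical helper: select columns from a list of tuples that may hold slices."""
    cols = frame.columns
    positions = []
    for key in keys:
        if any(isinstance(part, slice) for part in key):
            # The key contains a slice: expand it to every matching position.
            positions.append(cols.get_locs(key))
        else:
            # Exact tuple: look up its single position; -1 means "not found".
            pos = cols.get_indexer([key])
            if (pos == -1).any():
                raise KeyError(key)
            positions.append(pos)
    # Select by integer position, in the order the keys were given.
    return frame.iloc[:, np.concatenate(positions)]


out = loc_columns_with_slices(df, [(slice(None), "one"), ("qux", "two")])
```

The per-key Python loop is the kind of overhead the performance concern is about: every tuple has to be inspected before any lookup happens.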

phofl commented 2 years ago

Currently, we pass the list of tuples into get_indexer, which does not support slices. The list is not validated at all, so I think this might be tricky.
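For readers unfamiliar with the two lookups, the sketch below contrasts them on the index from the reproducible example, using only the public methods: get_indexer maps a list of exact tuples to integer positions, while get_locs resolves a single key that may contain slices. The variable names are illustrative, and this says nothing about how the internal .loc code path is organised.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)

# get_indexer: one integer position per exact tuple (missing labels map to -1).
exact = idx.get_indexer([("bar", "two"), ("qux", "one")])

# get_locs: a single key, possibly containing slices, expanded to all matches.
sliced = idx.get_locs((slice(None), "one"))
```

Here exact holds the positions of the two requested tuples and sliced holds the positions of every column whose second level is "one".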