Open arengel opened 1 year ago
I love this feature request, it fits a use case I run into all the time with "internet of things" style time series data. In those cases, the primary key of the dataset is something like ("device_id", "sensor_type", "timestamp") and setting a multiindex with those fields is useful. But I often want to select a specific subset of sensors specified by (device_id, sensor_type) pairs while keeping the full timeseries for each one.
I've never found a good way of doing that with a multiindex, though maybe (hopefully!) I'm simply not aware of the right syntax. I instead resort to either concatenating keys together or constructing boolean series, which a) feel a little verbose and cumbersome and b) sacrifice performance due to their inability to use existing indices.
One option is with a list comprehension:
indices = [ind for ind in dfmi.index if ind[:2] in (('A0','B0'),('A3','B1'))]
In [114]: dfmi.loc[indices]
Out[114]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 9 8 11 10
D1 13 12 15 14
C2 D0 17 16 19 18
D1 21 20 23 22
C3 D0 25 24 27 26
D1 29 28 31 30
A3 B1 C0 D0 225 224 227 226
D1 229 228 231 230
C1 D0 233 232 235 234
D1 237 236 239 238
C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 249 248 251 250
D1 253 252 255 254
Alternatively you can use Index.get_locs :
ind = dfmi.index
ind = [ind.get_locs(tup) for tup in (('A0','B0'),('A3','B1'))]
ind = np.concatenate(ind)
dfmi.iloc[ind]
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 9 8 11 10
D1 13 12 15 14
C2 D0 17 16 19 18
D1 21 20 23 22
C3 D0 25 24 27 26
D1 29 28 31 30
A3 B1 C0 D0 225 224 227 226
D1 229 228 231 230
C1 D0 233 232 235 234
D1 237 236 239 238
C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 249 248 251 250
D1 253 252 255 254
You can abstract the process with select_rows from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
dfmi.select_rows(('A0','B0'), ('A3','B1'))
Out[40]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 9 8 11 10
D1 13 12 15 14
C2 D0 17 16 19 18
D1 21 20 23 22
C3 D0 25 24 27 26
D1 29 28 31 30
A3 B1 C0 D0 225 224 227 226
D1 229 228 231 230
C1 D0 233 232 235 234
D1 237 236 239 238
C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 249 248 251 250
D1 253 252 255 254
Feature Type
[X] Adding new functionality to pandas
[ ] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas
Problem Description
I want to be able to select parts of a DataFrame by specifying a list of labels that do not cover all levels of a MultiIndex. This comes up when some subset of the levels of a MultiIndex comprise a composite key and need to be selected / slice together. As far as I can tell, the current slicing and selection tools either require you to specify all levels in the list of keys, or in slicing, treat the levels independently.
And to be clear, I don't mean to be proposing a new way of doing something that already exists. I offer this in case this problem does not already have a convenient solution, if such a solution exists, I'd love to know about it!
Taking the example from Advanced Indexing, the setup is the same giving
Let's say the first two levels are my composite key and I want to select
[("A0", "B0"), ("A3", "B1")]
, the result I want is this:Approaches I have tried, result in the selecting all the combinations of
["A0", "A3"]
and["B0", "B1"]
, such as below:Feature Description
Not being well versed in pandas internals, I won't try and propose how this could be implemented. For how it might be accessed, I could imagine two potential options.
df.loc
so the above example would look something like thisdf.xs()
so that thekey
argument could accept a list of keys resulting in something like thisAlternative Solutions
My current approach to this problem is to either concatenate the composite key together and select it conventionally, or to turn my list of keys to select into a DataFrame, reset the index, and select using merge like this:
Additional Context
No response