Open rben01 opened 5 years ago
Hmm OK makes sense. Investigation into the issue and PRs are always welcome!
Depending on how you read the API docs, passing an index is either not explicitly supported or this is the "expected" behavior (the closest acceptable input to `by` for an Index is "list", but maybe this should say "list-like").
This occurs because
print(list(df.index))
[(0, 0), (1, 1), (0, 2), (1, 0), (0, 1), (1, 2), (0, 0), (1, 1), (0, 2), (1, 0)]
So I think this is either a docs issue or an enhancement request. I'm open to carving out special handling of passing a MultiIndex into groupby.
cc @jbrockmendel for any thoughts.
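For reference, here is a minimal sketch of the difference being discussed. The index tuples match the list printed above, but the level names (`a`, `b`) and the column `x` are assumptions made up for illustration, since the original code sample is not shown here. The exact shape of the `df.groupby(df.index)` result may also differ between pandas versions, so the script prints it rather than asserting it.

```python
import pandas as pd

# Hypothetical data: the tuples reproduce the index printed above; the level
# names ("a", "b") and the column "x" are assumptions for illustration only.
tuples = [(i % 2, i % 3) for i in range(10)]
index = pd.MultiIndex.from_tuples(tuples, names=["a", "b"])
df = pd.DataFrame({"x": range(10)}, index=index)

# Grouping on the index object itself: the keys are handled as plain tuples.
by_index = df.groupby(df.index).sum()

# Grouping on the levels (by position or by name) keeps the MultiIndex
# structure and the level names in the result.
by_level = df.groupby(level=list(range(df.index.nlevels))).sum()
by_names = df.groupby(list(df.index.names)).sum()

# Inspect the resulting index types to compare the two approaches.
print(type(by_index.index).__name__, by_index.index.nlevels)
print(type(by_level.index).__name__, list(by_level.index.names))
```

Until any special handling exists, grouping with `level=` or with the level names looks like the safer way to keep the MultiIndex in the output.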
Code Sample, a copy-pastable example if possible
Problem description
When you group a DataFrame whose index is a MultiIndex on its index, the resulting aggregations will be a DataFrame with a single-level index containing the tuples from the original MultiIndex. This is inferior to the behavior you obtain when passing the level names to `df.groupby`, which returns a DataFrame with the same MultiIndex levels and names.
Expected Output
When `df` has a MultiIndex, `df.groupby(df.index)` should be identical to `df.groupby(level=list(range(df.index.nlevels)))` (or `df.groupby(df.index.names)` in the event that all of `df`'s index levels are named).
Output of `pd.show_versions()`