Open shoyer opened 9 years ago
I typically do the same as you, but most often it's just a single group. In that case it's just group = next(iter(gr)), which isn't bad. We could overload __getitem__ so that gr[:5] is pretty much this, but I don't know if the use case warrants that extra complexity.
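For reference, iterating a groupby yields (name, group) pairs, so the one-liner above hands back a tuple rather than the bare sub-frame. A minimal sketch, assuming a small made-up frame (the data and column names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 3]})
gr = df.groupby("a")

# Each item produced by the iterator is a (group name, sub-frame) pair,
# so unpack it to get the frame on its own.
name, group = next(iter(gr))
```

Unpacking into two names avoids carrying the tuple around when only the sub-frame is wanted.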
This can be shortened by using the toolz library:
from toolz import take
# take() yields (name, group) pairs, so keep only the group frames
pd.concat([x for _, x in take(3, group)])
Example of getting the last group here:
In [12]: df = pd.DataFrame({'a':['1','2','2','4','5','2'], 'b':np.random.randn(6)})
In [13]: g = df.groupby('a')
In [14]: g.groups
Out[14]: {'1': [0], '2': [1, 2, 5], '4': [3], '5': [4]}
In [15]: import itertools
In [16]: list(itertools.islice(g,len(g)-1,len(g)))
Out[16]:
[('5',    a         b
  4  5 -0.644857)]
Do we want a convenience function here?
Just putting some random ideas:
- g.names attribute returning the group names in the correct order? (g.groups.keys() is not ordered), so you can do g.get_group(g.names[-1]) or g.get_group(g.names[0])
- get_igroup method that lets you retrieve the group by order instead of name (or as an argument to get_group): g.get_igroup(-1)
- split-like functionality on the groupby object, returning a list of all groups: g.split()[-1] (but this has to create all the groups just to get the last one)
- __getitem__ (like for a DataFrame, a slice also slices the rows and not the columns) to slice the groups instead of the columns (currently this gives a TypeError: unhashable type)
- g.slice_groups(0, 1) or g.slice_groups(-1, None)
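To make the slicing idea concrete, here is a rough sketch of it written as a free function. Note that slice_groups is hypothetical and not part of pandas; this version also walks every group once to collect the names, so it shares the cost noted for the split idea.

```python
import pandas as pd

def slice_groups(g, start=None, stop=None):
    """Hypothetical helper: return the (name, group) pairs at positions start:stop."""
    # Collect the group names in iteration order; this visits every group once.
    names = [name for name, _ in g]
    # Ordinary list slicing gives negative indices for free.
    return [(name, g.get_group(name)) for name in names[start:stop]]

df = pd.DataFrame({"a": ["1", "2", "2", "4", "5", "2"], "b": range(6)})
g = df.groupby("a")

last = slice_groups(g, -1, None)   # only the last group
first_two = slice_groups(g, 0, 2)  # the first two groups
```

Returning (name, group) pairs mirrors what iterating the groupby gives, so the result can be fed to pd.concat the same way as the islice version.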
Getting groups by index would be useful for my application.
I have implemented a simple function with the name get_igroup(), as suggested by @jorisvandenbossche. The implementation is based on how get_group() does it via _get_indices():
import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(8), index=index)
print(s)

def get_igroup(g, i):
    """Get a groupby group by position.

    g : pandas groupby object
    i : int
    """
    # Dict views have no .sort() in Python 3, so build a sorted list instead.
    keys = sorted(g.indices.keys())
    indices = g.indices.get(keys[i])
    # Relies on the private _selected_obj attribute of the groupby object.
    return g._selected_obj.take(indices)

print('\n==================================')
print("Testing with `.groupby('first')` ")
g = s.groupby('first')
for i in [0, 1, -1]:
    print('\n------------------------------- \nget_group for index=%d \n' % i)
    print(get_igroup(g, i).head())

print('\n==================================')
print("Testing with `.groupby(['first', 'second'])` ")
g = s.groupby(['first', 'second'])
for i in [0, 1, -1]:
    print('\n------------------------------- \nget_group for index=%d \n' % i)
    print(get_igroup(g, i).head())
Output of the script above:
@jreback If desired I could implement this or something similar via a PR. I am not sure about how much effort testing this would be, though.
I'm -1 on this as I don't think we make any guarantees about the ordering of groups within a groupby
@WillAyd: Since I sort the group keys, the resulting order of the index retrieval should be deterministic even though the keys are unordered at first. The sorting also works correctly for tuple keys, as you can see in my example. But I understand that this might have some edge cases which lead to inconsistencies.
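One case where the two notions of order diverge can be sketched as follows, assuming a groupby created with sort=False (the frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": ["b", "a", "b"], "v": [1, 2, 3]})

# With sort=False, groups are produced in order of first appearance.
g = df.groupby("a", sort=False)
iteration_order = [name for name, _ in g]

# Sorting the keys, as get_igroup() does, gives a different order here.
sorted_order = sorted(g.indices.keys())
```

Here position 0 means "b" when iterating but "a" after sorting, so a positional lookup would have to state which order it follows. Keys of mixed types are another possible edge case, since Python 3 refuses to sort, for example, integers together with strings.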
For visualization/testing purposes, I'm often interested in looking at the first example group(s) from a groupby operation.
Is there a convenient shortcut for this? The best I could come up with is
pd.concat([x for _, x in itertools.islice(group, 3)])
which seemed awkward to me. Note that this is a different use case from .first()/.head(), which returns the first example from each group -- here I want full examples of the first few groups.
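One possible alternative, sketched here under the assumption that the installed pandas provides GroupBy.ngroup(), is to number the groups and filter the original frame. The frame below is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "2", "4", "5", "2"], "b": range(6)})
g = df.groupby("a")

# ngroup() labels each row with the position of its group in iteration order,
# so a boolean mask keeps every row belonging to the first three groups.
first_three = df[g.ngroup() < 3]
```

Unlike the pd.concat version, the rows come back in their original order rather than arranged group by group.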