Closed mkeller-upb closed 2 years ago
You have None in your groups, and those are dropped (see here). If you add df = df.fillna('foo') right after you unpickle, your script will work fine.
The way to 'solve' this problem is to fill the groups with a string, group, and perform your operation. Then, if you really-really want a nan in an index (which in general is allowed, but makes indexing almost impossible), you can set those strings back to nan.
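A minimal sketch of the fill-group-restore workaround described above. The DataFrame, its column names, and the placeholder 'foo' are made up for illustration; the placeholder is assumed not to occur as a real key in the data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows have a missing (None) grouping key.
df = pd.DataFrame({
    "key": ["a", "b", None, "a", None],
    "value": [1, 2, 3, 4, 5],
})

# By default, rows whose group key is missing are silently dropped.
default_sums = df.groupby("key")["value"].sum()

# Step 1: fill the missing keys with a placeholder string.
SENTINEL = "foo"  # assumption: never appears as a genuine key
filled = df.assign(key=df["key"].fillna(SENTINEL))

# Step 2: group and perform the operation; the placeholder forms its own group.
sums = filled.groupby("key")["value"].sum()

# Step 3 (optional): put nan back into the index of the result.
sums_with_nan = sums.rename(index={SENTINEL: np.nan})

print(default_sums)
print(sums_with_nan)
```

Only the grouping column is filled here, rather than the whole frame as with df.fillna('foo'), so missing values in other columns stay untouched.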
That's good. So it's not a bug and everything is much easier. Two remaining points:
a) I recommend adding a hint about this behavior to pandas.DataFrame.groupby.__doc__.
b) And mention at all three documentation places (your tutorial link, DataFrame, fillna), that
Thanks a lot for your quick answer, jreback.
ok...will convert this issue to a doc updating one then...thanks for the comments
I'm adding something to this, just to bring it up the list. So what exactly has to be done? Does there need to be a change to the docs themselves, or to the docstring as well?
As described in https://github.com/pandas-dev/pandas/pull/47337#pullrequestreview-1005333109=, there is now dropna=False, which will keep the NA groups, so closing.
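For reference, a small sketch of what dropna=False does, using made-up data. It assumes a pandas version whose groupby accepts the dropna keyword (1.1 or later).

```python
import pandas as pd

# Hypothetical data: two rows have a missing (None) grouping key.
df = pd.DataFrame({
    "key": ["a", "b", None, "a", None],
    "value": [1, 2, 3, 4, 5],
})

# Default behavior (dropna=True): rows with a missing key are excluded.
dropped = df.groupby("key")["value"].sum()

# With dropna=False the missing keys form their own NA group.
kept = df.groupby("key", dropna=False)["value"].sum()

print(dropped)
print(kept)
```

With this keyword the fill-and-restore workaround is no longer needed when the only goal is to keep the NA group.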
Add more explicit docs / work-around for dealing with groupby and NA groups
(see comments)
Changelog: 07.Nov.2013: Add line to example below to preprocess table content.
I expect the following behavior: A DataFrame.groupby splits the dataframe/table into subtables according to the grouping condition. A column name as a grouping condition will give me a subtable for each individual value in that column. Similarly, grouping with multiple columns (a list of column names) gives me a group for each occurring combination of values in these columns (or, put differently, the unique "values" of multiple grouping columns are tuples).

So if I'm wrong with my expectations, and I couldn't read a different meaning or expected behavior from the documentation (e.g. pandas.DataFrame.groupby.__doc__), then there is a lack of clarification.

Otherwise I found a bug and I need a fix: some existing combinations are not provided with a group or split subtable -- I checked it with drop_duplicates. And, finally, grouped.__iter__ ignores more/other combinations than grouped.groups.keys() -- here, I would also expect both to follow the same implementation...

I tracked it into the depths of pandas to pandas.core.Grouper._get_group_keys, or better _KeyMapper.get_key; self.levels looks good, but the list-comprehension-getmethod-zip-action goes wrong, or possibly pandas.core.Grouper.group_info provides a too small ngroups value, or something else.

pandas.__version__: 0.12.0-1062-g3c57949 (from 6.11.2013)
numpy.__version__: 1.7.2
MacOSX 10.9

Test Example: