Open jreback opened 7 years ago
Can I take this up?
that would be great @mayukh18 !
@jreback I was thinking about putting a passthru check for specifically the how == 'add'
case a level above and after the actual cython implementation. Does this seem like a good idea?
no, you simply need to add this on to _groupby_function
near the top. Needs testing for all ops which support skipna
(pretty much all numeric ops)
Adding this in _groupby_function
will not effect any change for ops like mean
, median
etc as they are separately implemented. So, do those need the skipna
too? Whereas, first
and last
ops don't seem like requiring skipna
but they will be effected by modifying _groupby_function
. I am naive so pardon me if I am asking silly stuff.
@mayukh18 yes the implementation is a bit muddled.
methods which are defined with an actual function (e.g. median/mean
), this is very easy, just add to the validation function where numeric_only
is now; that will pass it thru.
we can still use _groupby_function
, what I would do is this.
add the default for skipna
(so do this on all but first/last)
sum = _groupby_function('sum', 'add', np.sum, skipna=True)
then add skipna=None
in the signature for _groupby_function
, which will add the arg if its not None to kwargs
(so it gets passed).
This is a surprisingly old issue, but this functionality would be really nice and consistent with the non-groupby methods.
Indeed, I believe this is not just a "nice-to-have" any more, but a necessary step before this FutureWarning
can be resolved by users:
FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum().
As long as DataFrame.sum()
and DataFrame.groupby().sum()
(and other agg functions) have inconsistent APIs, dropping the level
kwarg from non-grouped classes isn't really a good step, IMO.
take
take
Edit[rhshadach]: The following methods do not have a skipna argument in groupby, but do have such an argument on the Series/DataFrame variant.
https://github.com/pandas-dev/pandas/issues/15674
ideally write [21] as
df.groupby('l').v.sum(skipna=False)