pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License

[ENH] Method for adding functionality to GroupBy #587

Open zbarry opened 5 years ago

zbarry commented 5 years ago

It would be nice to be able to add functionality to the pandas GroupBy objects: GroupBy, DataFrameGroupBy, and SeriesGroupBy. There's no convenient accessor interface for this, but maybe there's a way to reliably monkeypatch them. This would let us create nifty aggregation / apply functions and avoid the .groupby(...).apply() route for tasks we encounter routinely. It could also open up opportunities to speed up such operations, since .groupby().apply() can often be slow for large numbers of groups.
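A minimal sketch of what the monkeypatching route might look like, assuming that setting an attribute directly on the DataFrameGroupBy class is acceptable. The names register_groupby_method and value_range are hypothetical and are not part of pandas or pyjanitor.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


def register_groupby_method(func):
    """Hypothetical helper: attach `func` as a method on DataFrameGroupBy."""
    setattr(DataFrameGroupBy, func.__name__, func)
    return func


@register_groupby_method
def value_range(grouped, column):
    """Per-group max minus min, built from built-in aggregations, not apply()."""
    selected = grouped[column]  # SeriesGroupBy for the chosen column
    return selected.max() - selected.min()


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": [1, 4, 10, 12]})

# The patched method is now available on any DataFrameGroupBy object.
result = df.groupby("key").value_range("value")
```

Because the method is set on the class itself, it is global state shared by every caller in the process, which is part of why the reliability of this approach is an open question.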

zbarry commented 5 years ago

@Zsailer - what do you think about such a capability in PF?

samukweku commented 2 years ago

@zbarry @ericmjl @pyjanitor-devs/core-devs How can we make this possible? Is this even possible?

samukweku commented 1 year ago

One way to do this is with a summarise function that has a by parameter; all the magic would happen inside that function. This is inspired by the update to the summarise feature coming in dplyr 1.1, and by the use of by in rdatatable and pydatatable.

crude API example

df.summarise(col_name = func or arg name, by = func or kwargs)
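A rough sketch of how such a function might be built on top of pandas named aggregation. The signature below, with keywords of the form output_name=(column, func), is an assumption made for illustration and is not an existing pyjanitor API. It is written as a plain function rather than a DataFrame method.

```python
import pandas as pd


def summarise(df, by=None, **named_aggs):
    """Hypothetical sketch: each keyword is output_name=(column, func)."""
    if by is None:
        # No grouping: collapse the whole frame to a single row.
        return pd.DataFrame(
            {name: [df[col].agg(func)] for name, (col, func) in named_aggs.items()}
        )
    # Grouped case: delegate to pandas named aggregation.
    return df.groupby(by).agg(**named_aggs).reset_index()


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": [1, 4, 10, 12]})

grouped = summarise(df, by="key", total=("value", "sum"), largest=("value", "max"))
overall = summarise(df, total=("value", "sum"))
```

Here the by argument only accepts what DataFrame.groupby accepts (a column name or list of names); supporting a function or keyword arguments for by, as in the crude example, would need extra handling.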

We could even make it so that you can filter within a groupby effectively (maybe?).
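One possible reading of filtering within a groupby is keeping rows based on a per-group statistic. A small sketch, assuming that interpretation, uses transform to broadcast the group value back to each row and then applies an ordinary boolean mask:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": [1, 4, 10, 12]})

# Broadcast each group's mean back onto its rows, then compare row-wise.
group_mean = df.groupby("key")["value"].transform("mean")
filtered = df[df["value"] > group_mean]
```

This keeps only the rows whose value exceeds the mean of their own group, without going through .groupby().apply().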