pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.25k stars 17.79k forks

ENH: Please provide `register_groupBy_accessor` #35403

Open talegari opened 4 years ago

talegari commented 4 years ago

Thanks for providing the option to create custom accessors via pandas.api.extensions. This is a preferable way to extend pandas DataFrames compared with inheritance/subclassing. Currently, it supports DataFrame, Series, and Index.

Please extend this support to GroupBy objects.
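For readers unfamiliar with the existing mechanism, here is a minimal sketch of a DataFrame accessor registered through `pandas.api.extensions.register_dataframe_accessor`. The accessor name `tidy_demo`, the class `TidyDemoAccessor`, and its `mutate` method are hypothetical names chosen for illustration, not part of pandas or of any package mentioned in this thread.

```python
import pandas as pd


# "tidy_demo" and TidyDemoAccessor are hypothetical, for illustration only.
@pd.api.extensions.register_dataframe_accessor("tidy_demo")
class TidyDemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def mutate(self, **new_columns):
        """Return a copy with extra columns; callables receive the frame."""
        return self._obj.assign(**new_columns)


df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})
out = df.tidy_demo.mutate(x2=lambda d: d["x"] * 2)
```

The same decorator pattern exists for Series and Index, but there is no equivalent for the object returned by `df.groupby(...)`, which is what this request is about.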

TomAugspurger commented 4 years ago

Can you provide some motivation for this? Why doesn't passing callables to agg / transform / apply suffice?
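As a point of reference, the alternative raised here looks roughly like the sketch below: plain functions are passed to `agg` and `transform` on a grouped column. The function names `value_range` and `demean` and the sample data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 3.0, 5.0]})
grouped = df.groupby("g")


def value_range(s):
    # Reduce each group's values to a single number.
    return s.max() - s.min()


def demean(s):
    # Return a result with the same length as the group.
    return s - s.mean()


ranges = grouped["x"].agg(value_range)     # one row per group
centered = grouped["x"].transform(demean)  # aligned with the original rows
```

This covers many cases, but the caller has to know that the object is grouped and pick the matching method, which is the point the reply below addresses.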

talegari commented 4 years ago

Tom,

Thanks for picking this up.

I started by writing dplyr-like operations on pandas dataframes, with the primary intent of letting the user not worry about indexes (see is_tidy_frame). For example, mutate will work on both grouped and ungrouped dataframes via the right accessor.

Having a groupby accessor would remove the need for two things:

  1. Checking whether the input is a tidy frame.
  2. The pointless if/else branching in the code. Here is the current code:
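def mutate(df, func_list, check=True):
    # is_tidy_frame, is_grouped_frame, group_vars and group_by are helpers
    # that are not shown here.
    if check:
        assert is_tidy_frame(df)

    if is_grouped_frame(df):
        # Grouped input: apply each function per group, then restore grouping.
        groupvars = group_vars(df)
        for func in func_list:
            df = df.apply(func)
        df = df.reset_index(drop=True).pipe(group_by, groupvars)
    else:
        # Ungrouped input: apply each function to the whole frame.
        for func in func_list:
            df = df.pipe(func)

    return df
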
pwwang commented 2 years ago

This is definitely helpful for developers who want to wrap pandas APIs.

I needed some workarounds to implement datar, which reimagines pandas APIs and aligns them with the tidyverse's.
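For illustration only, one possible workaround in the absence of an official registration function is to attach a property to the groupby class directly. This is a sketch under assumptions, not how datar or any other package is known to do it: it imports `DataFrameGroupBy` from `pandas.core.groupby`, which is not part of the public extension API, and the names `tidy_demo`, `TidyGroupByAccessor`, and `demean` are hypothetical.

```python
import pandas as pd
# Assumption: importing from pandas.core is not a supported extension point
# and may change between pandas versions.
from pandas.core.groupby import DataFrameGroupBy


class TidyGroupByAccessor:
    """Hypothetical accessor holding a reference to the grouped object."""

    def __init__(self, grouped):
        self._grouped = grouped

    def demean(self, column):
        # Subtract each group's mean from the given column.
        return self._grouped[column].transform(lambda s: s - s.mean())


# Monkey-patch: expose the accessor as a property on every DataFrameGroupBy.
DataFrameGroupBy.tidy_demo = property(lambda self: TidyGroupByAccessor(self))

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 3.0, 5.0]})
centered = df.groupby("g").tidy_demo.demean("x")
```

Unlike the registered DataFrame, Series, and Index accessors, this patch gets no name-collision warning or caching from pandas, which is part of why a supported `register_groupby_accessor` would be useful.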