I do not find 'group_by' functionality in tidypandas for data analysis.

Hi @ron2795,

Grouped operations are already supported in tidypandas.

All the major verbs take a `by` argument, to which you can pass a subset of columns. This is equivalent to applying the verb to each group defined by `by` and combining the per-group results into the final output.
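To make that equivalence concrete, here is a minimal sketch in plain pandas of what "apply per group and combine" means for a filter. It is not how tidypandas is implemented: the toy data is made up, `filter_per_group` is a hypothetical helper, and the row order of the real tidypandas output may differ.

```python
import pandas as pd

# Toy data standing in for the penguins dataset (values are made up).
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "bill_length_mm": [50.0, 40.0, 38.0, 30.0],
})

def above_mean(frame):
    """Keep rows whose bill_length_mm exceeds the mean of the given frame."""
    return frame[frame["bill_length_mm"] > frame["bill_length_mm"].mean()]

def filter_per_group(frame, by):
    """Hypothetical helper: split on `by`, filter each piece, then combine."""
    pieces = [above_mean(piece) for _, piece in frame.groupby(by)]
    return pd.concat(pieces).sort_index()

whole = above_mean(df)                 # compared against one overall mean
grouped = filter_per_group(df, "sex")  # compared against each group's own mean
```

With this toy data, the ungrouped filter keeps both male rows, while the grouped filter keeps the longer-billed row within each sex.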

For example, if you want to filter the penguins dataset so that bill_length_mm is greater than the average bill_length_mm, you can do it for the whole population as well as within each distinct sex group.

import numpy as np
from palmerpenguins import load_penguins
from tidypandas import tidyframe

penguins_tidy = tidyframe(load_penguins())

## filter such that bill_length_mm is greater than the average of bill_length_mm for the whole population
penguins_tidy.filter(lambda x: x['bill_length_mm'] > np.mean(x['bill_length_mm']))

## filter such that bill_length_mm is greater than the average of bill_length_mm in each group (defined by 'sex')
penguins_tidy.filter(lambda x: x['bill_length_mm'] > np.mean(x['bill_length_mm']), by = 'sex')

Another example, using mutate:

## mean shift `bill_length_mm` using mean of whole population
penguins_tidy.mutate({'bill_length_mm' : (lambda x: x - np.mean(x), 'bill_length_mm')})

## mean shift `bill_length_mm` per group
penguins_tidy.mutate(
    {'bill_length_mm' : (lambda x: x - np.mean(x), 'bill_length_mm')},
    by = 'sex'
)
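For comparison, the same two mean shifts can be sketched in plain pandas. This is only an illustration of what the `by` argument saves you from writing by hand, using made-up toy data rather than the penguins dataset.

```python
import pandas as pd

# Toy data standing in for the penguins dataset (values are made up).
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "bill_length_mm": [50.0, 40.0, 38.0, 30.0],
})

# Mean shift using the mean of the whole population.
shifted_whole = df.assign(
    bill_length_mm=df["bill_length_mm"] - df["bill_length_mm"].mean()
)

# Mean shift per group: transform("mean") returns each row's group mean,
# aligned to the original index, so the subtraction is row-wise.
group_mean = df.groupby("sex")["bill_length_mm"].transform("mean")
shifted_grouped = df.assign(bill_length_mm=df["bill_length_mm"] - group_mean)
```

After the grouped shift, the values within each sex group average to zero, which is not generally true of the whole-population shift.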
