ron2795 closed this issue 2 years ago
Hi @ron2795,
`group_by` functionality is already available in tidypandas.
All the major verbs take a `by` argument, to which you can pass a subset of columns. This is equivalent to applying the verb to each group specified in `by` and combining the results into the final output.
For example, if you want to filter the penguins dataset so that `bill_length_mm` is greater than the average `bill_length_mm`, you can do it for the whole population as well as for each distinct `sex` group.
import numpy as np
from palmerpenguins import load_penguins
from tidypandas.tidy_frame import tidyframe
penguins_tidy = tidyframe(load_penguins())
## keep rows where bill_length_mm is greater than the average bill_length_mm of the whole population
penguins_tidy.filter(lambda x: x['bill_length_mm'] > np.mean(x['bill_length_mm']))
## keep rows where bill_length_mm is greater than the average bill_length_mm within each group (defined by 'sex')
penguins_tidy.filter(lambda x: x['bill_length_mm'] > np.mean(x['bill_length_mm']), by = 'sex')
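For readers more familiar with plain pandas, the difference between the two filters can be sketched roughly as below. This is only an illustration of the "apply per group, then combine" idea, not how tidypandas implements `by`; the toy DataFrame is hypothetical stand-in data, and row order or handling of missing values may differ from what tidypandas returns.

```python
import pandas as pd

# Hypothetical toy data standing in for the penguins dataset.
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "bill_length_mm": [40.0, 50.0, 30.0, 36.0],
})

# Whole population: compare each row with the overall mean (39.0 here).
overall = df[df["bill_length_mm"] > df["bill_length_mm"].mean()]

# Per group: compare each row with the mean of its own `sex` group
# (45.0 for male, 33.0 for female in this toy data).
group_mean = df.groupby("sex")["bill_length_mm"].transform("mean")
per_group = df[df["bill_length_mm"] > group_mean]

print(overall)
print(per_group)
```

In this toy data the overall filter keeps both male rows, while the per-group filter keeps one row from each sex, which is the kind of difference the `by` argument is meant to produce.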
Another example, this time for `mutate`:
## mean shift `bill_length_mm` using the mean of the whole population
penguins_tidy.mutate({'bill_length_mm': (lambda x: x - np.mean(x), 'bill_length_mm')})
## mean shift `bill_length_mm` per group (defined by 'sex')
penguins_tidy.mutate({'bill_length_mm': (lambda x: x - np.mean(x), 'bill_length_mm')},
                     by = 'sex'
                     )
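As a rough plain-pandas analogue of the two `mutate` calls above, the sketch below mean-shifts a column once with the overall mean and once with the per-group mean. The toy DataFrame and the variable names are hypothetical, and this is not a description of tidypandas internals.

```python
import pandas as pd

# Hypothetical toy data standing in for the penguins dataset.
df = pd.DataFrame({
    "sex": ["male", "male", "female", "female"],
    "bill_length_mm": [40.0, 50.0, 30.0, 36.0],
})

# Mean shift using the whole-population mean (39.0 here).
shifted_all = df.assign(
    bill_length_mm=df["bill_length_mm"] - df["bill_length_mm"].mean()
)

# Mean shift within each `sex` group: subtract the group's own mean.
shifted_by_sex = df.assign(
    bill_length_mm=df["bill_length_mm"]
    - df.groupby("sex")["bill_length_mm"].transform("mean")
)

print(shifted_all)
print(shifted_by_sex)
```

After the per-group shift, each group's values are centered on zero, whereas the whole-population shift only centers the column as a whole.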
I hope it will be added soon. Also, thank you for making this package; it's quite simple and handy to use.