Closed bls-lehoai closed 1 year ago
Hi,
That is already possible within the aggregation, for example:
import vaex
df = vaex.example()
# Option 1:
df_grouped = df.groupby('id').agg({'count_percentage': vaex.agg.count() / df.shape[0] * 100 })
print(df_grouped)
# Option 2 (probably what you are doing?)
df_grouped = df.groupby('id').agg({'count': vaex.agg.count()})
df_grouped['count_percentage'] = df_grouped['count'] / df.shape[0] * 100
print(df_grouped)
I think this is enough. Adding a dedicated aggregator to do the above would be possible, but I feel it would bloat the API, since it does not really add new functionality (it is a linear combination of existing functionality).
If you just want a shortcut because you use this a lot in your project, you can probably write an extension yourself, following this part of the tutorial
Also, arithmetic combinations of existing aggregators are allowed, for example:
# following the example above
df.groupby('id').agg({'mean_over_std': vaex.agg.mean('x') / vaex.agg.std('y') })
Is this what you mean? It is possible I've misunderstood you completely.
@JovanVeljanoski Thank you so much!
Hi there, I need to calculate the percentage of each group after a "group by" (group count / total count * 100%). It would be very convenient if there were a "vaex.agg.cpercent". Right now I have to count each group and compute the percentages myself.
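To make the requested quantity concrete, here is a minimal plain-Python sketch of group count / total count * 100, independent of vaex. The function name `group_percentages` and the sample labels are made up for illustration:

```python
from collections import Counter


def group_percentages(labels):
    """Illustrative only: percentage of items that fall in each group."""
    counts = Counter(labels)   # group -> number of items in that group
    total = len(labels)        # total number of items
    return {group: count / total * 100 for group, count in counts.items()}


shares = group_percentages(['a', 'a', 'b', 'c'])
print(shares)  # {'a': 50.0, 'b': 25.0, 'c': 25.0}
```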