machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License
1.15k stars 49 forks

allow referring to previously created columns in summarize #20

Open machow opened 5 years ago

machow commented 5 years ago

e.g.

from siuba.data import mtcars
from siuba import *

mtcars >> summarize(avg_mpg = _.mpg.mean(), avg_kpg = _.avg_mpg * 1.6)
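
For illustration, the requested behavior can be sketched in plain pandas, independent of siuba's internals. `summarize_sequential` below is a hypothetical helper (not part of siuba): it evaluates each keyword in order and lets later expressions read the results of earlier ones. The small data frame is a made-up stand-in for mtcars.

```python
import pandas as pd


def summarize_sequential(df, **exprs):
    """Hypothetical helper, not siuba API: evaluate summaries in keyword order.

    Each expression receives the original data and a dict of the
    summaries computed so far, so later ones can build on earlier ones.
    """
    done = {}
    for name, expr in exprs.items():
        done[name] = expr(df, done)
    # One row holding every summary value
    return pd.DataFrame([done])


# Small stand-in for mtcars (values made up for the example)
cars = pd.DataFrame({"mpg": [20.0, 30.0, 40.0]})

out = summarize_sequential(
    cars,
    avg_mpg=lambda d, prev: d["mpg"].mean(),
    avg_kpg=lambda d, prev: prev["avg_mpg"] * 1.6,  # reuses avg_mpg
)
print(out)
```

This relies on keyword arguments keeping their call order, which Python guarantees since 3.7.
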

machow commented 4 years ago

Just a thought: this is very doable in the experimental fast_summarize, because it can compose operations that mix aggregated results (e.g. 1 value per group) with the original data (see ADR-003).
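
As a rough pandas analogue of that kind of mixing (a sketch only, not how fast_summarize is implemented), `groupby(...).transform` broadcasts a per-group aggregate back onto the original rows so it can be combined with row-level data. The data here is made up.

```python
import pandas as pd

# Made-up data standing in for mtcars
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6],
    "mpg": [30.0, 34.0, 20.0, 22.0],
})

# transform("mean") returns one value per original row (that row's group mean),
# so it aligns with the row-level mpg column and the two can be added.
group_mean = cars.groupby("cyl")["mpg"].transform("mean")
wat = group_mean + cars["mpg"]
print(wat.tolist())
```
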

e.g.

mtcars >> group_by(_.cyl) >> summarize(wat = _.mpg.mean() + _.mpg)

Would just be...