machow / siuba

Python library for using dplyr-like syntax with pandas and SQL
https://siuba.org
MIT License

Add metavariable(s) for purrr-like formulas #438

Closed (machow closed this issue 1 year ago)

machow commented 2 years ago

dplyr functions like across allow you to apply an operation to each of a set of selected columns, using syntax of this form:

Importantly, the column application step allows you to refer to both...

This means that siuba needs to differentiate...
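A minimal pure-Python sketch of that distinction, assuming a table is just a dict mapping column names to lists of floats. The helper across_sketch and the toy values are hypothetical and are not how siuba implements across. The per-column function receives the current column and the full table as two separate arguments, which is the kind of distinction a single placeholder would otherwise have to blur:

```python
from statistics import mean
from typing import Callable, Dict, List

Table = Dict[str, List[float]]


def across_sketch(
    data: Table,
    cols: List[str],
    fn: Callable[[List[float], Table], float],
) -> Dict[str, float]:
    """Hypothetical across-like helper: apply fn to each selected column.

    fn gets two distinct inputs: the current column and the whole table.
    """
    return {name: fn(data[name], data) for name in cols}


# Toy values for illustration only.
data: Table = {
    "mpg": [21.0, 22.8, 18.7],
    "hp": [110.0, 93.0, 175.0],
    "wt": [2.62, 2.32, 3.44],
}

# For each selected column: its own mean minus the mean of the table's mpg column.
result = across_sketch(data, ["mpg", "hp"], lambda col, tbl: mean(col) - mean(tbl["mpg"]))
print(result)
```

Here the lambda's two parameters play the roles that a "current column" metavariable and the data placeholder would play in a formula-style API.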

Looking broadly at siuba's pipe and call behavior, there are many places where this could come up, e.g. essentially anywhere a lambda might appear inside a call():
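One way to avoid passing lambdas around is a metavariable that records method calls instead of running them. The sketch below is hypothetical: ColumnVar and Column are illustrative names, not siuba's Fx or its internals. Unlike an opaque lambda, the recorded calls can be inspected before they are replayed on each column, which is the sort of thing a backend that translates expressions (for example, to SQL) would need.

```python
import statistics
from typing import Any, Tuple


class Column(list):
    """Minimal list subclass with a mean() method (illustrative only)."""

    def mean(self) -> float:
        return statistics.mean(self)


class ColumnVar:
    """Hypothetical metavariable standing in for "the current column".

    Calling a method on it records the call instead of running it, so an
    expression such as ColumnVar().mean() can be built once and evaluated
    later against each column.
    """

    def __init__(self, calls: Tuple[Tuple[str, tuple, dict], ...] = ()) -> None:
        self._calls = calls

    def __getattr__(self, name: str):
        # Only reached for attributes not found normally, e.g. "mean".
        def record(*args: Any, **kwargs: Any) -> "ColumnVar":
            return ColumnVar(self._calls + ((name, args, kwargs),))

        return record

    def evaluate(self, column: Any) -> Any:
        # Replay each recorded method call against a concrete column.
        result = column
        for name, args, kwargs in self._calls:
            result = getattr(result, name)(*args, **kwargs)
        return result


col = ColumnVar()
expr = col.mean()  # recorded, not executed
print(expr._calls)  # → (('mean', (), {}),)

columns = {"mpg": Column([21.0, 22.8, 18.7]), "hp": Column([110.0, 93.0, 175.0])}
means = {name: expr.evaluate(values) for name, values in columns.items()}
```

The design choice is that the expression is data: the same recorded call list can be evaluated against in-memory columns, as here, or walked by some other consumer.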

cf: https://github.com/machow/siuba/pull/413

machow commented 1 year ago

This could use a section of its own in the guide, but it is now handled by the Fx object. See the across docs for details:

https://siuba.org/guide/programming-across.html#basic-use

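from siuba import _, across, Fx, group_by, mutate, summarize, filter, arrange
from siuba.data import mtcars

# Select columns whose names start with "m" or end with "p", then take the
# mean of each one. Inside across, Fx refers to the current column, while _
# refers to the data (here, its column names).
mtcars >> summarize(across(_[_.startswith("m"), _.endswith("p")], Fx.mean()))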
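For readers without siuba installed, here is a rough pure-Python picture of what that call computes, assuming the selection keeps every column whose name matches either condition. The column values are toy data, not the full mtcars table, and select_names is a hypothetical helper rather than part of siuba.

```python
from statistics import mean
from typing import Dict, List

# Toy columns whose names mirror some of mtcars' columns.
data: Dict[str, List[float]] = {
    "mpg": [21.0, 22.8, 18.7],
    "cyl": [6.0, 4.0, 8.0],
    "disp": [160.0, 108.0, 360.0],
    "hp": [110.0, 93.0, 175.0],
}


def select_names(names: List[str]) -> List[str]:
    # Keep a column if its name starts with "m" OR ends with "p",
    # mirroring _[_.startswith("m"), _.endswith("p")].
    return [n for n in names if n.startswith("m") or n.endswith("p")]


# Apply the per-column operation (the role Fx.mean() plays) to each selected column.
summary = {name: mean(data[name]) for name in select_names(list(data))}
print(summary)
```

With these names, "cyl" is dropped and "mpg", "disp", and "hp" are each reduced to a single mean.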