machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License

add a pipe function that doesn't pass data as first positional arg #246

Open machow opened 4 years ago

machow commented 4 years ago

An example came up on Twitter where a user wanted to pipe data to seaborn. However, seaborn's plotting functions do not take data as their first argument.

Here is an example of seaborn from its docs:

import seaborn as sns
sns.set(style="darkgrid")

# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")

# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri)

Proposal: a new pipe function (for now) called pipe_raw, which does not pass data as the first argument and instead evaluates any uses of _ in its arguments against the piped data.

from siuba import _, filter

# proposed function
from siuba import pipe_raw

(fmri
  >> filter(_.region == "parietal")
  >> pipe_raw(sns.lineplot,
              x="timepoint", y="signal",
              hue="region", style="event",
              data=_
     )
)

Alternatively, the existing pipe function could skip passing data as the first argument whenever _ appears among its arguments, as in the pipe_raw example above. However, I lean toward keeping the two separate for now, until it's clearer what people would want from a single, comprehensive pipe function.
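To make the two options concrete, here is a minimal, self-contained sketch of how they might behave. It is not siuba's implementation: `Placeholder`, `PLACEHOLDER`, `Pipeable`, and `describe` are hypothetical names invented for illustration, and the sketch only handles a bare placeholder, whereas siuba's `_` builds lazy expressions such as `_.region == "parietal"`.

```python
class Placeholder:
    """Hypothetical stand-in for siuba's `_`; marks where the piped data goes."""


PLACEHOLDER = Placeholder()


class Pipeable:
    """Wraps a one-argument function so that `data >> Pipeable(f)` calls f(data)."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Python falls back to this when the left operand does not handle `>>`.
        return self.func(data)


def _fill(value, data):
    # Swap the placeholder for the piped data; leave every other value alone.
    return data if value is PLACEHOLDER else value


def pipe_raw(func, *args, **kwargs):
    """Proposed behavior: never pass data first, only substitute placeholders."""
    def run(data):
        new_args = [_fill(a, data) for a in args]
        new_kwargs = {k: _fill(v, data) for k, v in kwargs.items()}
        return func(*new_args, **new_kwargs)
    return Pipeable(run)


def pipe(func, *args, **kwargs):
    """Alternative behavior: pass data first unless a placeholder is present."""
    if any(v is PLACEHOLDER for v in (*args, *kwargs.values())):
        return pipe_raw(func, *args, **kwargs)
    return Pipeable(lambda data: func(data, *args, **kwargs))


def describe(x, y, data):
    """Stand-in for a plotting function that takes data as a keyword argument."""
    return f"{x} vs {y} over {len(data)} rows"


rows = [{"timepoint": 0, "signal": 0.1}, {"timepoint": 1, "signal": 0.3}]

# pipe_raw: data goes only where the placeholder is.
result_raw = rows >> pipe_raw(describe, x="timepoint", y="signal", data=PLACEHOLDER)

# Single pipe: same call, the placeholder switches off "data first".
result_auto = rows >> pipe(describe, "timepoint", "signal", data=PLACEHOLDER)

# Single pipe without a placeholder: data is passed as the first argument.
result_first = rows >> pipe(len)
```

The trade-off shows up in `pipe`: one function covers both cases, but its calling convention silently changes depending on whether a placeholder appears among the arguments, which is one reason to keep a separate `pipe_raw` until usage patterns are clearer.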

grst commented 3 years ago

It seems this can already be achieved using a lambda function:

(
    fmri
    >> filter(_.region == "parietal")
    >> pipe(
        lambda _: sns.lineplot(
            x="timepoint", y="signal", hue="region", style="event", data=_
        )
    )
)

But it would of course be nice to be able to use the magic _ directly.

machow commented 1 year ago

This should be resolved by the call() function. See https://siuba.org/guide/programming-pipes.html#call-external-functions for details. I think the last remaining piece is to make call importable from siuba rather than from siuba.siu.