moodymudskipper / nakedpipe

Pipe Into a Sequence of Calls Without Repeating the Pipe Symbol.

shortcuts for data manipulation #18

Closed moodymudskipper closed 4 years ago

moodymudskipper commented 4 years ago

Maybe it's a bit crazy, but...

Why do :

cars %.% {
  transform(time = dist/speed)
}

when we could do :

cars %.% {
  time = dist/speed
}

Why do :

cars %.% {
  subset(speed > 7)
}

when we could do :

cars %.% {
  speed > 7
}

I think I could get used to that.

= is forbidden for now, and piping to a > expression doesn't make sense, so those would be unambiguous.

We could have more sophisticated conditions like speed > 7 & dist > 3; in this case the call would be piped to &, which doesn't make sense either, so it is also unambiguous.
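To make the "unambiguous" argument concrete, here is a minimal sketch of how a step could be classified by looking only at its top-level call. The helper `classify_step()` and the labels it returns are hypothetical names chosen for illustration; this is not nakedpipe's actual implementation.

```r
# Hypothetical helper (not part of nakedpipe): classify one step of a
# `{ ... }` block by the function at the head of its call.
classify_step <- function(expr) {
  if (!is.call(expr)) return("regular")
  fun <- expr[[1]]
  # Calls such as pkg::f(x) have a call, not a name, at their head
  if (!is.name(fun)) return("regular")
  op <- as.character(fun)
  if (op == "=") {
    "transform"   # e.g. time = dist / speed
  } else if (op %in% c(">", ">=", "<", "<=", "==", "!=", "&", "|", "!")) {
    "subset"      # e.g. speed > 7 & dist > 3
  } else {
    "regular"     # e.g. head(2), piped as usual
  }
}

# Inside braces, `=` is parsed as a call to the `=` operator
steps <- as.list(quote({
  speed > 7 & dist > 3
  time = dist / speed
  head(2)
}))[-1]  # drop the leading `{` symbol

kinds <- vapply(steps, classify_step, character(1))
kinds
```

The sketch relies on the fact that a regular piped step is a call to an ordinary function, so a comparison, a logical operator, or `=` at the head of the call can be given a special meaning without clashing with it.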

daranzolin commented 4 years ago

Wow, independently of nakedpipe, I was just thinking that there could hypothetically be a catch-all function that detects whether an expression is boolean, creative, column names, etc. and evaluates it accordingly.

Here the reader can easily infer speed > 7 is a boolean filter, while time = dist/speed is clearly a transformation/mutation. To me, this is unambiguous and would be a joy to type:

cars %.% {
  speed > 7
  time = dist/speed
}

This syntax is probably scandalous to many, but I (for one) would embrace it.
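For comparison, the two-step block above can be spelled out in base R, under the assumption stated earlier in the thread that a bare condition stands for subset() and a bare `=` stands for transform(). The variable name `shorthand_equivalent` is only illustrative.

```r
# Assumed meaning of the shorthand: filter first, then add the column
shorthand_equivalent <- transform(
  subset(cars, speed > 7),  # speed > 7
  time = dist / speed       # time = dist/speed
)

head(shorthand_equivalent, 3)
```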

moodymudskipper commented 4 years ago

Well since you're my only vocal client, it counts a lot! :)

I don't think it's hard to implement, I'll need to update the translate functions too, but I think it's ok too.

I was thinking of a summarizing functionality too, like :

mtcars %.% {
  avg_mpg = mean(mpg) ~ cyl + disp
}

This would group by cyl and disp, keep them, and add an mpg column with the mean (named mpg here because it's the only input, else would be named ). It's unambiguous because a data.frame column cannot be a formula. But I'm not sure it's as useful.

We could also rename with :=

mtcars %.% {
  new_mpg := mpg
}

These would all fail if input is not a data.frame.
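As a point of reference, both proposals have close base R counterparts. The sketch below is not how nakedpipe implements them; it only shows the intended results with aggregate() and a plain names() replacement, and the variable names `by_group` and `renamed` are illustrative.

```r
# Grouped mean: the formula interface of aggregate() keeps the grouping
# columns and names the result column after the left-hand side (mpg)
by_group <- aggregate(mpg ~ cyl + disp, data = mtcars, FUN = mean)
names(by_group)

# Rename mpg to new_mpg without any extra package
renamed <- mtcars
names(renamed)[names(renamed) == "mpg"] <- "new_mpg"
names(renamed)[1:3]
```

Both calls expect a data.frame as input, which matches the restriction noted above for the shorthands.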

moodymudskipper commented 4 years ago

It works :), you can try it already.

library(nakedpipe)

cars %.% {
  speed > 22
  time = dist/speed
  TIME := time
}
#>    speed dist     TIME
#> 45    23   54 2.347826
#> 46    24   70 2.916667
#> 47    24   92 3.833333
#> 48    24   93 3.875000
#> 49    24  120 5.000000
#> 50    25   85 3.400000

mtcars %.% {
  head(10)
  avg_mpg = mean(mpg) ~ vs + am
}
#>   vs am avg_mpg
#> 1  0  0   16.50
#> 2  1  0   21.18
#> 3  0  1   21.00
#> 4  1  1   22.80

mtcars %.% {
  head(10)
  mean(mpg) ~ vs + am
}
#>   vs am mean(mpg)
#> 1  0  0     16.50
#> 2  1  0     21.18
#> 3  0  1     21.00
#> 4  1  1     22.80

I have not implemented the translations to magrittr yet.

moodymudskipper commented 4 years ago

I'm not sure supporting := for renaming is a good idea. In data.table and the tidyverse it's used for transformations that = wouldn't allow; we might need it for the same reason, and renaming might not be so crucial to abbreviate. Let's keep it this way for now, but neither document it nor support it in the translation and debugging features. We can decide what to do after more use.

It's not impossible that we might find another syntax for renaming, like :

new_name = ~old_name
new_name = ({old_name})
? new_name = old_name
new_name = ?old_name

But rename() is not that bad so anything too sophisticated is overkill.

In tb I do {new_name} := old_name

moodymudskipper commented 4 years ago

Support for aggregation as it is now is also temporary and might be removed or changed, but we can aggregate with data.table or tb syntax now:

library(nakedpipe)
mtcars %.% {
  head(10)
  .dt[,.(avg_mpg = mean(mpg)), .(vs, am)]
}
#> Loading required namespace: data.table
#>   vs am avg_mpg
#> 1  0  1   21.00
#> 2  1  1   22.80
#> 3  1  0   21.18
#> 4  0  0   16.50

library(tb) # for s()
mtcars %.% {
  head(10)
  .tb[avg_mpg = mean(mpg), .by = s(vs, am)]
}
#>   vs am avg_mpg
#> 1  0  1   21.00
#> 2  1  1   22.80
#> 3  1  0   16.50
#> 4  0  0   21.18

moodymudskipper commented 4 years ago

Basic shorthands are implemented; more sophisticated possibilities will be explored here: https://github.com/moodymudskipper/nakedpipe/issues/21

github-actions[bot] commented 2 years ago

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.