Closed garrettgman closed 7 years ago
Adding a `pick` (or perhaps more intelligibly `pswitch`) window function would also be helpful. Another feature that would make these window functions more useful is the ability to specify a range of columns in `mutate`, just as with `select`. In my chemistry-related data, people love to store compositional information by physical element, meaning that to get total weight I need to sum 70-100 columns.
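To make the column-range use case concrete, here is a minimal base R sketch. The data frame and its element columns (`Al`, `Fe`, `Zn`) are made up for illustration; real data would have many more columns between the first and last element.

```r
# Hypothetical composition data: one column per element (illustrative names).
comp <- data.frame(
  sample = c("s1", "s2"),
  Al = c(1.5, 2.0),
  Fe = c(3.0, NA),
  Zn = c(0.5, 1.0)
)

# Resolve the column range Al:Zn to positions, as select() would.
first <- match("Al", names(comp))
last  <- match("Zn", names(comp))

# Row-wise total over the whole range, skipping missing values.
comp$total <- unname(rowSums(comp[, first:last], na.rm = TRUE))
comp$total
```

This works today, but only by stepping outside `mutate`, which is the inconvenience described above.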
| Vector | Summary | Cumulative | Parallel | Matrix |
|---|---|---|---|---|
| + | sum | cumsum | | rowSums |
| * | prod | cumprod | | |
| | min | cummin | pmin | |
| | max | cummax | pmax | |
| | mean | cummean | | rowMeans |
| & | all | cumall | | |
| \| | any | cumany | | |
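As a quick illustration of how the columns of this table differ, here is a small base R example using only existing functions (the vectors `x` and `y` are arbitrary):

```r
x <- c(3, 1, 2)
y <- c(2, 5, 0)

x + y                  # vector: element-wise          -> 5 6 2
sum(x, y)              # summary: collapses to scalar  -> 13
cumsum(x)              # cumulative: running total     -> 3 4 6
rowSums(cbind(x, y))   # matrix: one value per row     -> 5 6 2

min(x, y)              # summary                       -> 0
cummin(x)              # cumulative                    -> 3 1 1
pmin(x, y)             # parallel: element-wise min    -> 2 1 0
```

The empty Parallel cells are the gap: base R has `pmin` and `pmax`, but no `psum`, `pprod`, `pall`, or `pany`.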
Could be a job for purrr:
# A function operator to parallelise summary functions:
parallelise <- function(.f, ...) {
  f <- partial(.f, ...)
  function(.x, .type = NULL) {
    res <- pmap(.x, f)
    as_vector(res, .type)
  }
}
Example with `NA` detection:
# Creating summary function for is.na()
any_na <- partial(some, .p = is.na) %>% lift_ld()
# Parallelising any_na()
p_any_na <- parallelise(any_na)
Which gives:
any_na(1:3, 3, 5, NA)
#> TRUE
# tribble() (not tibble()) accepts the row-wise ~x, ~y layout
df_na <- tribble(
  ~x, ~y,
  NA, 1,
  2, NA,
  3, 3
)
p_any_na(df_na)
#> [1] TRUE TRUE FALSE
Or should the parallelised functions take dots instead of lists?
parallelise_d <- function(.f, ...) {
  f <- partial(.f, ...)
  function(..., .type = NULL) {
    res <- pmap(list(...), f)
    as_vector(res, .type)
  }
}
psum <- parallelise_d(sum, na.rm = TRUE)
nas %>% mutate(d = psum(a, b, c))
#> Source: local data frame [4 x 4]
#>
#> a b c d
#> (dbl) (dbl) (dbl) (dbl)
#> 1 NA 1 1 2
#> 2 2 NA 2 4
#> 3 3 3 NA 6
#> 4 NA NA NA 0
The list versions would work with the `cols()` helper mentioned in #1367.
It may need handling for empty inputs:
parallelise <- function(.f, ..., .empty = NULL) {
  f <- partial(.f, ...)
  force(.empty)
  function(.x, .type = NULL) {
    if (length(.x) == 0) {
      .empty
    } else {
      res <- pmap(.x, f)
      as_vector(res, .type)
    }
  }
}
p_any_na <- parallelise(any_na, .empty = FALSE)
psum <- parallelise(sum, .empty = 0)
And then scalar recycling etc.
@lionel- I was thinking these might need to be individually written in C++ for performance (but a standard matrix vectoriser would still be nice)
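A rough sketch of what a generic matrix vectoriser might look like in plain R. `vectorise_rows` is a hypothetical name, not an existing function, and this says nothing about how a C++ version would be written. It leans on `cbind()` to recycle length-1 inputs, which also covers the simplest case of scalar recycling.

```r
# Hypothetical function operator (an assumption, not an existing API):
# turn a row-wise matrix function such as rowSums() or rowMeans()
# into a parallel function that takes vectors as separate arguments.
vectorise_rows <- function(row_f, ...) {
  extra <- list(...)              # fixed arguments, e.g. na.rm = TRUE
  function(...) {
    m <- cbind(...)               # cbind() recycles length-1 inputs
    do.call(row_f, c(list(m), extra))
  }
}

psum_m  <- vectorise_rows(rowSums, na.rm = TRUE)
pmean_m <- vectorise_rows(rowMeans, na.rm = TRUE)

psum_m(c(NA, 2, 3), c(1, NA, 3), 1)   # third argument is recycled
pmean_m(c(1, 2), c(3, NA))
```

Note that `cbind()` only warns when lengths do not divide evenly, so a real implementation would probably want stricter recycling checks.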
Moved to vctrs
`sum` and `+` leave a gap when working with `NA`s:
Something that works like this would be useful. I'm sure there's a better way to implement it.
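A minimal base R sketch of that gap and of one possible parallel sum. `psum_sketch` is a hypothetical name used only for illustration; it is not an existing dplyr or vctrs function, and treating `NA` as 0 is just one way to get the `na.rm` behaviour.

```r
a <- c(NA, 2, 3)
b <- c(1, NA, 3)

a + b                     # element-wise, but NA propagates: NA NA 6
sum(a, b, na.rm = TRUE)   # drops NAs, but collapses to one value: 9

# Hypothetical helper (an assumption, not an existing API):
# add the inputs element-wise, optionally treating NA as 0.
psum_sketch <- function(..., na.rm = FALSE) {
  cols <- list(...)
  if (na.rm) {
    cols <- lapply(cols, function(v) replace(v, is.na(v), 0))
  }
  Reduce(`+`, cols)
}

psum_sketch(a, b, na.rm = TRUE)   # 1 2 6
```

Because it is built on `+`, length-1 inputs are recycled by R's usual arithmetic rules, and a row that is entirely `NA` sums to 0 when `na.rm = TRUE`.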
Other useful functions would be `psum`, `pprod`, `pmean`, `psd`, `pall`, `pany`.