tidyverse / funs

Collection of low-level functions for working with vctrs

Complete matrix and parallel functions #5

Open hadley opened 8 years ago

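hadley commented 8 years ago

| Vector | Summary | Cumulative | Parallel | Matrix |
|---|---|---|---|---|
| + | sum | cumsum | | rowSums |
| * | prod | cumprod | | |
| & | all | cumall | | |
| \| | any | cumany | | |
| smallest() | min | cummin | pmin | |
| greatest() | max | cummax | pmax | |
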
smallest <- function(x, y) if (x <= y) x else y
greatest <- function(x, y) if (x >= y) x else y

cf http://adv-r.had.co.nz/Functionals.html#function-family

It may be possible to avoid the matrix/row family by automatically vectorising over data frames and rows of matrices. OTOH that may be unappealing since it would mean the function sometimes summarised and sometimes transformed.
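As a rough illustration of how the columns of the table relate (a sketch in base R, not code from this package), the binary smallest() can be lifted to each variant with Reduce(), mapply(), and apply(). On NA-free input the results should match min, cummin, and pmin:

```r
# Binary building block from the table (scalar inputs only).
smallest <- function(x, y) if (x <= y) x else y

x <- c(3, 1, 2)
y <- c(2, 5, 0)

# Summary: fold the binary function over one vector.
Reduce(smallest, x)                     # same as min(x)

# Cumulative: keep the intermediate results of the fold.
Reduce(smallest, x, accumulate = TRUE)  # same as cummin(x)

# Parallel: apply the binary function element-wise across vectors.
mapply(smallest, x, y)                  # same as pmin(x, y)

# Matrix: summarise each row of a matrix.
m <- rbind(x, y)
apply(m, 1, function(row) Reduce(smallest, row))
```

The matrix case is where the summarise-versus-transform ambiguity mentioned above shows up: the same reducer returns one value per row rather than one value overall.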

hadley commented 8 years ago

Particularly important for standardised NA handling
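For context, a small base R illustration of the inconsistency (base functions only, nothing from this package): the summary and parallel functions accept na.rm, while the cumulative ones have no such argument and propagate NA from the first missing value onward.

```r
x <- c(1, NA, 2)

# Summary functions can drop missing values.
sum(x, na.rm = TRUE)
#> [1] 3

# Parallel functions can too.
pmin(x, c(0, 5, NA), na.rm = TRUE)
#> [1] 0 5 2

# Cumulative functions have no na.rm argument; NA is carried forward.
cumsum(x)
#> [1]  1 NA NA
```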

jonocarroll commented 8 years ago

By "complete" do you mean this might have scope for introducing psum, pprod, pmean? The critical inclusion being the option of na.rm.

I know it can be handled somewhat by an ifelse or just comparison operators, but I'd also like to see a which.pmin/which.pmax for extracting out a binary selector (also handling NA properly).
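A minimal sketch of what psum and pmean with na.rm could look like, assuming it is acceptable to build them on base R's rowSums() and rowMeans(). The names psum and pmean are the hypothetical ones from the comment above; neither exists in base R, and this is not a proposed implementation for the package.

```r
# Hypothetical parallel sum: bind the inputs as columns, then sum each row.
psum <- function(..., na.rm = FALSE) {
  rowSums(cbind(...), na.rm = na.rm)
}

# Hypothetical parallel mean, built the same way.
pmean <- function(..., na.rm = FALSE) {
  rowMeans(cbind(...), na.rm = na.rm)
}

a <- c(1, NA, 3)
b <- c(10, 20, NA)

psum(a, b)
#> [1] 11 NA NA
psum(a, b, na.rm = TRUE)
#> [1] 11 20  3
pmean(a, b, na.rm = TRUE)
#> [1]  5.5 20.0  3.0
```

Two caveats with this approach: cbind() recycles shorter inputs, and with na.rm = TRUE a position that is NA in every input sums to 0, whereas pmin() and pmax() return NA in that case. A standardised NA policy would have to pick one behaviour.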

hadley commented 8 years ago

Yes, that's the whole point. I doubt I will implement any variation of which(), though.

jonocarroll commented 8 years ago

Ah, and only now do I see https://github.com/hadley/dplyr/issues/968 -- yep, that would be very handy. Just got stung by it again.

hadley commented 7 years ago

Also needs to include ==, >, etc.

DavisVaughan commented 5 years ago

Also think about rolling variants of these, and how the cumulative + rolling variants could be unified under the idea of "window functions", where cumulative is just sliding with an unbounded start location. https://dbplyr.tidyverse.org/articles/translation-function.html#window-functions

(slider does all of this now with specialized variants for each summary function)
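The unification can be sketched in base R without slider. The helper name slide_window and its before argument are made up for this illustration and say nothing about how slider is implemented: each window ends at the current element and reaches back before elements, so before = Inf gives the cumulative variant.

```r
# Hypothetical helper: apply a summary function f over a trailing window.
# `before` is how many elements before the current one are included.
slide_window <- function(x, f, before = Inf) {
  vapply(seq_along(x), function(i) {
    start <- max(1, i - before)  # an unbounded start clamps to 1
    f(x[start:i])
  }, numeric(1))
}

x <- c(1, 2, 3, 4, 5)

# Unbounded start: matches cumsum(x) and cummax(x).
slide_window(x, sum, before = Inf)
#> [1]  1  3  6 10 15
slide_window(x, max, before = Inf)
#> [1] 1 2 3 4 5

# Bounded start: a rolling sum over the current and previous element.
slide_window(x, sum, before = 1)
#> [1] 1 3 5 7 9
```

This version recomputes every window from scratch, so it is quadratic for cumulative windows; it only shows that the two families share one interface.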

romainfrancois commented 3 years ago

oh interesting, so:

library(slider)

incremental_lgl <- function(x, .f, ...) {
  slide_lgl(x, .f, ..., .before = Inf)
}
incremental_lgl(c(TRUE, FALSE, TRUE), all)
#> [1]  TRUE FALSE FALSE
incremental_lgl(c(TRUE, FALSE, TRUE), any)
#> [1] TRUE TRUE TRUE

incremental_any <- function(x, ...) {
  slide_any(x, ..., before = Inf)
}
incremental_all <- function(x, ...) {
  slide_all(x, ..., before = Inf)
}
incremental_any(c(TRUE, FALSE, TRUE))
#> [1] TRUE TRUE TRUE
incremental_all(c(TRUE, FALSE, TRUE))
#> [1]  TRUE FALSE FALSE

Created on 2021-05-05 by the reprex package (v2.0.0)

romainfrancois commented 3 years ago

library(rlang)

p <- function(.f) {
  .f <- as_function(.f)
  function(...) {
    vctrs::vec_c(!!!purrr::pmap(list2(...), .f))
  }
}
p(min)(c(1, 2), c(2, 1))
#> [1] 1 1
p(max)(c(1, 2), c(2, 1))
#> [1] 2 2
p(sum)(c(1, 2), c(2, 1))
#> [1] 3 3
p(prod)(c(1, 2), c(2, 1))
#> [1] 2 2

p(any)(c(TRUE, FALSE), c(TRUE, FALSE))
#> [1]  TRUE FALSE
p(all)(c(TRUE, FALSE), c(TRUE, FALSE))
#> [1]  TRUE FALSE

Created on 2021-05-05 by the reprex package (v2.0.0)

romainfrancois commented 3 years ago

The "matrix" variant is just map() where iteration is governed by vec_slice() (here via vec_chop(), which slices the input one observation at a time):

library(rlang)
library(vctrs)

dice <- function(x, .f) {
  .f <- as_function(.f)
  vec_c(!!!lapply(vec_chop(x), .f))
}

(m <- matrix(1:10, nrow = 2, byrow = TRUE))
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3    4    5
#> [2,]    6    7    8    9   10
dice(m, min)
#> [1] 1 6
dice(m, max)
#> [1]  5 10
dice(m, prod)
#> [1]   120 30240
dice(m, sum)
#> [1] 15 40

(m <- cbind(c(TRUE, FALSE, TRUE), c(FALSE, TRUE, TRUE)))
#>       [,1]  [,2]
#> [1,]  TRUE FALSE
#> [2,] FALSE  TRUE
#> [3,]  TRUE  TRUE
dice(m, any)
#> [1] TRUE TRUE TRUE
dice(m, all)
#> [1] FALSE FALSE  TRUE

Created on 2021-05-05 by the reprex package (v2.0.0)

DavisVaughan commented 3 years ago

A specialized incremental_any() is probably still a decent bit faster than slide_any(x, ..., before = Inf), since the latter does much more than cumulative windows.
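To make the point concrete, here is one way a specialized cumulative any/all could be written as a single vectorised pass in base R. This is a sketch under the assumption that x is a logical vector with no missing values; the _base suffix is only there to avoid clashing with the wrappers defined earlier in the thread, and no timings are claimed.

```r
# Hypothetical single-pass versions; assume logical input without NA.
# A running count of TRUE values is positive once any TRUE has been seen.
incremental_any_base <- function(x) cumsum(x) > 0

# A running count of FALSE values stays at zero while all values are TRUE.
incremental_all_base <- function(x) cumsum(!x) == 0

incremental_any_base(c(FALSE, TRUE, FALSE))
#> [1] FALSE  TRUE  TRUE
incremental_all_base(c(TRUE, FALSE, TRUE))
#> [1]  TRUE FALSE FALSE
```

With missing values this shortcut diverges from any() and all(): cumsum() returns NA for every position after the first NA, while any(c(TRUE, NA)) is TRUE. A real implementation would need an explicit NA policy.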