r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

C implementations of `vec_case_when()` and `vec_case_match()` #1622

Open DavisVaughan opened 2 years ago

DavisVaughan commented 2 years ago

And possibly `vec_if_else()`, because it would be nice for, say, ggplot2 to be able to use this.
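The sketch below shows the semantics a vectorised case-when needs, in pure R:

- the first matching condition wins;
- `NA` conditions fall through to the default;
- all values share one common output type.

A two-branch if-else is the single-condition special case. `case_when_sketch()` is a hypothetical name, not an existing vctrs function. It relies on vctrs' exported `vec_ptype_common()`, `vec_cast()`, `vec_recycle()`, `vec_init()`, `vec_slice()` and `vec_assign()`. It illustrates behaviour only; it is not the proposed C implementation.

```r
library(vctrs)

# Hypothetical sketch, not the vctrs API: each condition is a logical vector
# the same size as the output, and the first TRUE condition wins.
case_when_sketch <- function(conditions, values, default = NULL) {
  stopifnot(length(conditions) >= 1, length(conditions) == length(values))
  size <- vec_size(conditions[[1]])

  # One common output type across all values and the default (NULL is ignored)
  ptype <- do.call(vec_ptype_common, unname(c(values, list(default))))

  # Unmatched positions stay missing unless a default is supplied
  out <- if (is.null(default)) {
    vec_init(ptype, size)
  } else {
    vec_recycle(vec_cast(default, ptype), size)
  }

  unmatched <- rep(TRUE, size)
  for (i in seq_along(conditions)) {
    condition <- conditions[[i]]
    stopifnot(is.logical(condition), vec_size(condition) == size)
    # `%in% TRUE` treats NA conditions as not matching
    loc <- which(unmatched & condition %in% TRUE)
    value <- vec_recycle(vec_cast(values[[i]], ptype), size)
    out <- vec_assign(out, loc, vec_slice(value, loc))
    unmatched[loc] <- FALSE
  }
  out
}

x <- c(1, 5, 10, NA)
case_when_sketch(
  conditions = list(x < 3, x < 8),
  values = list("small", "medium"),
  default = "large"
)
# returns c("small", "medium", "large", "large")
```

Because the output type comes from `vec_ptype_common()`, mixing integer and double values yields double. Mixing character and double values is an error rather than a silent coercion.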

Consider whether we can figure out some kind of 1:1 interface that doesn't always require a list of values and haystacks, to nicely replace `plyr::mapvalues()` (https://github.com/tidyverse/dplyr/issues/7027). The list approach is very powerful and general because it allows for 1:m and m:1 replacements, but it is not always needed.
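The list interface pairs each replacement value with a haystack of one or more candidate values. This is what makes m:1 replacements possible.

Below is a pure-R sketch of those semantics. `case_match_sketch()` is a hypothetical name, not the vctrs API. It is built on vctrs' `vec_in()`, `vec_ptype_common()`, `vec_cast()`, `vec_recycle()`, `vec_init()`, `vec_slice()` and `vec_assign()`.

```r
library(vctrs)

# Hypothetical sketch of list-based matching; not the vctrs API.
# Each element of `haystacks` is a set of values to look for in `x`;
# the corresponding element of `values` is what matching elements become.
case_match_sketch <- function(x, haystacks, values, default = NULL) {
  stopifnot(length(haystacks) == length(values))
  size <- vec_size(x)

  # Common output type of all replacement values and the default
  ptype <- do.call(vec_ptype_common, unname(c(values, list(default))))
  out <- if (is.null(default)) {
    vec_init(ptype, size)
  } else {
    vec_recycle(vec_cast(default, ptype), size)
  }

  unmatched <- rep(TRUE, size)
  for (i in seq_along(haystacks)) {
    # Positions of `x` found in this haystack that no earlier case has claimed
    loc <- which(unmatched & vec_in(x, haystacks[[i]]))
    value <- vec_recycle(vec_cast(values[[i]], ptype), size)
    out <- vec_assign(out, loc, vec_slice(value, loc))
    unmatched[loc] <- FALSE
  }
  out
}

# m:1 replacement: both "a" and "b" map to "ab"
case_match_sketch(
  c("a", "b", "c", "d"),
  haystacks = list(c("a", "b"), "c"),
  values = list("ab", "C"),
  default = "other"
)
# returns c("ab", "ab", "C", "other")
```

A 1:1 interface would be the special case where every haystack has exactly one element. That is the case where requiring a list feels heavier than necessary.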

olivroy commented 6 months ago

Is it still planned? I saw this was the proposed solution for replacing splicing in `dplyr::recode()`. Using `recode()` is slowing down my code because of lifecycle overhead, so I wondered whether I could rely on a faster vctrs implementation at some point in the future.

Cf. https://github.com/tidyverse/dplyr/issues/6623#issuecomment-1362887413

Recreating the formulas programmatically can be a bit expensive, as the benchmarks below show:

```r
# manually created
a_formula <- c("xx" ~ "x", "y" ~ "yy")

dplyr::case_match(
  c("x", "y", "z"),
  "zz" ~ "a",
  !!!a_formula,
  .default = NA_character_
)
#> [1] NA   "yy" NA

# What I have: a named vector of old = new pairs
a_list <- c("xx" = "x", "y" = "yy")
dplyr::recode(
  c("x", "y", "z"),
  !!!a_list,
  .default = NA_character_
)
#> [1] NA   "yy" NA

# programmatically recreated
a_formula_from_list <- purrr::map2(
  names(a_list),
  unname(a_list),
  rlang::new_formula
)
dplyr::case_match(
  c("x", "y", "z"),
  "zz" ~ "a",
  !!!a_formula_from_list,
  .default = NA_character_
)
#> [1] NA   "yy" NA

bench::mark(
  recode = dplyr::recode(
    c("x", "y", "z"),
    !!!a_list,
    .default = NA_character_
  ),
  casematch_program = {
    a_formula_from_list <- purrr::map2(
      names(a_list),
      unname(a_list),
      rlang::new_formula
    )
    dplyr::case_match(
      c("x", "y", "z"),
      "zz" ~ "a",
      !!!a_formula_from_list,
      .default = NA_character_
    )
  },
  casematch_regular = dplyr::case_match(
    c("x", "y", "z"),
    "zz" ~ "a",
    !!!a_formula,
    .default = NA_character_
  )
)
#> # A tibble: 3 × 6
#>   expression            min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 recode             802µs    888µs      988.        0B     8.43
#> 2 casematch_program  366µs    385µs     2355.     1.3KB    10.4 
#> 3 casematch_regular  295µs    311µs     2860.    1.05KB    10.4
# programmatically recreating the values can become expensive
```

Created on 2024-05-07 with reprex v2.1.0
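In this case the replacements already live in a named vector, so a 1:1 recode can skip formula construction entirely by matching against the names. Here is a minimal sketch of that approach.

- It assumes vctrs' `vec_match()`, `vec_slice()`, `vec_cast()`, `vec_recycle()` and `vec_assign()`.
- `recode_named_sketch()` is a hypothetical helper, not part of vctrs or dplyr.
- No timings are claimed for it.

```r
library(vctrs)

# Hypothetical helper, not part of vctrs or dplyr.
# `mapping` is a named vector like `a_list` in the reprex: names are the
# old values and elements are the replacements (a 1:1 mapping).
recode_named_sketch <- function(x, mapping, default = NULL) {
  from <- names(mapping)
  to <- unname(mapping)

  # Position of each element of `x` in `from`; NA when there is no match
  loc <- vec_match(x, from)

  # Slicing with NA locations yields missing values of the type of `to`
  out <- vec_slice(to, loc)

  if (!is.null(default)) {
    unmatched <- which(is.na(loc))
    default <- vec_recycle(vec_cast(default, to), vec_size(x))
    out <- vec_assign(out, unmatched, vec_slice(default, unmatched))
  }
  out
}

a_list <- c("xx" = "x", "y" = "yy")
recode_named_sketch(c("x", "y", "z"), a_list, default = NA_character_)
# returns c(NA, "yy", NA)
```

Because `names()` are always character, this sketch only suits character `x`. A general 1:1 interface would instead take separate vectors of old and new values of any type.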

DavisVaughan commented 5 months ago

Unfortunately it is taking us longer than expected to find time for a vctrs release, but this is definitely still something I want to add. I think a lot of people would like a low-level, type-stable `vec_if_else()` that doesn't need dplyr (particularly ggplot2).
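"Type-stable" here means the output type is determined by the types of the branches, never by the data. The pure-R sketch below shows what that implies, using only exported vctrs functions (`vec_ptype_common()`, `vec_cast()`, `vec_recycle()`, `vec_init()`, `vec_slice()`, `vec_assign()`). `if_else_sketch()` and its `missing` argument are hypothetical and do not describe the eventual `vec_if_else()` signature.

```r
library(vctrs)

# Hypothetical sketch, not an existing vctrs function.
# The output type is the common type of `true`, `false`, and `missing`,
# so integer + double gives double, while character + double is an error
# instead of a silent coercion.
if_else_sketch <- function(condition, true, false, missing = NULL) {
  stopifnot(is.logical(condition))
  size <- vec_size(condition)
  ptype <- vec_ptype_common(true, false, missing)

  # Start from an all-missing output of the common type
  out <- vec_init(ptype, size)

  # Cast and recycle a branch, then copy it into the selected positions
  fill <- function(out, loc, value) {
    value <- vec_recycle(vec_cast(value, ptype), size)
    vec_assign(out, loc, vec_slice(value, loc))
  }

  out <- fill(out, which(condition), true)
  out <- fill(out, which(!condition), false)  # which() drops NA conditions
  if (!is.null(missing)) {
    out <- fill(out, which(is.na(condition)), missing)
  }
  out
}

if_else_sketch(c(TRUE, FALSE, NA), 1L, 0.5)
# returns c(1, 0.5, NA)

if_else_sketch(c(TRUE, FALSE, NA), "yes", "no", missing = "unknown")
# returns c("yes", "no", "unknown")
```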