Open DavisVaughan opened 2 years ago
Is it still planned? I saw this was the proposed solution for replacing splicing in dplyr::recode(). Using recode() is slowing down code because of lifecycle, so I wondered if I could rely on a faster vctrs implementation sometime in the future.
Cf. https://github.com/tidyverse/dplyr/issues/6623#issuecomment-1362887413
The benchmarks show that recreating the formulas can be a bit expensive:
# manually created
a_formula <- c("xx" ~ "x", "y" ~ "yy")
dplyr::case_match(
  c("x", "y", "z"),
  "zz" ~ "a",
  !!!a_formula,
  .default = NA_character_
)
#> [1] NA "yy" NA
# What I have
a_list <- c("xx" = "x", "y" = "yy")
dplyr::recode(
  c("x", "y", "z"),
  !!!a_list,
  .default = NA_character_
)
#> [1] NA "yy" NA
# programmatically recreated
a_formula_from_list <- purrr::map2(
  names(a_list),
  unname(a_list),
  rlang::new_formula
)
dplyr::case_match(
  c("x", "y", "z"),
  "zz" ~ "a",
  !!!a_formula_from_list,
  .default = NA_character_
)
#> [1] NA "yy" NA
bench::mark(
  recode = dplyr::recode(
    c("x", "y", "z"),
    !!!a_list,
    .default = NA_character_
  ),
  casematch_program = {
    a_formula_from_list <- purrr::map2(
      names(a_list),
      unname(a_list),
      rlang::new_formula
    )
    dplyr::case_match(
      c("x", "y", "z"),
      "zz" ~ "a",
      !!!a_formula_from_list,
      .default = NA_character_
    )
  },
  casematch_regular = dplyr::case_match(
    c("x", "y", "z"),
    "zz" ~ "a",
    !!!a_formula,
    .default = NA_character_
  )
)
#> # A tibble: 3 × 6
#>   expression             min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 recode               802µs    888µs      988.        0B     8.43
#> 2 casematch_program    366µs    385µs     2355.     1.3KB    10.4
#> 3 casematch_regular    295µs    311µs     2860.    1.05KB    10.4
# programmatically recreating the values can become expensive
Created on 2024-05-07 with reprex v2.1.0
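In the meantime, for the simple case above where the lookup is a named character vector, indexing by name in base R may be enough and avoids both recode() and the formula reconstruction. This is only a sketch of a possible workaround, not something proposed in this thread; it assumes the input is a character vector and that the names of a_list are unique.

```r
# Same inputs as in the reprex above
a_list <- c("xx" = "x", "y" = "yy")
x <- c("x", "y", "z")

# Index the named vector by the input values: elements of x that are
# not a name in a_list come back as NA, which plays the role of
# .default = NA_character_
out <- unname(a_list[x])

# out now holds NA, "yy", NA, matching the recode() result above
out
```

Unlike case_match(), this does no type checking and only handles one old value per new value.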
Unfortunately it is taking us longer than expected to get some time for a vctrs release, but this is definitely still something I want to add, as I think a lot of people would like a low-level, type-stable vec_if_else() that doesn't need dplyr (particularly ggplot2).
And possibly vec_if_else(), because it would be nice for, say, ggplot2 to be able to use this.

Consider if we can figure out some kind of 1:1 interface that doesn't always require a list for values and haystacks, to nicely replace plyr::mapvalues(): https://github.com/tidyverse/dplyr/issues/7027 (the list approach is very powerful and general because it allows for 1:m and m:1 replacements, but is not always needed)
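To make the 1:1 idea concrete, here is a minimal base R sketch built on match(). The function name map_values_1to1() and its from/to/default arguments are hypothetical and only for illustration; this is not the vctrs API, and it does not have the type stability a vctrs implementation would provide.

```r
# Hypothetical helper: replace each value of `x` found in `from` with the
# value at the same position in `to`; everything else becomes `default`.
map_values_1to1 <- function(x, from, to, default = NA) {
  # A 1:1 mapping needs exactly one new value per old value
  stopifnot(length(from) == length(to))

  # Position of each element of `x` in `from` (NA when there is no match)
  idx <- match(x, from)

  # Look up the replacements, then fill the unmatched slots
  out <- to[idx]
  out[is.na(idx)] <- default
  out
}

result <- map_values_1to1(
  c("x", "y", "z"),
  from = c("xx", "y"),
  to = c("x", "yy"),
  default = NA_character_
)
# result holds NA, "yy", NA
```

Here from and to are plain parallel vectors, so no list is needed. A 1:m or m:1 mapping would still need the list-based interface described above.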