Open romainfrancois opened 3 years ago
I can simplify to vec_slice(x, vec_in(x, y)) <- NA, but I believe it's interesting to allow y to be a predicate. Or maybe it should be another argument whose default is derived from y?
na_if <- function(x, y, .fn = function(x) vec_in(x, y)){ ... }
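A minimal sketch of how the elided body could look, assuming .fn must return one logical value per element of x. The name na_if_sketch is hypothetical and the body is a guess at the intent, not the funs implementation.

```r
library(vctrs)

# Hypothetical sketch: `.fn` defaults to "is this element one of the values in y?"
na_if_sketch <- function(x, y, .fn = function(x) vec_in(x, y)) {
  selected <- .fn(x)
  # Assumption: the predicate yields a logical vector the same size as x
  vec_assert(selected, ptype = logical(), size = vec_size(x))
  vec_slice(x, selected) <- NA
  x
}

by_value <- na_if_sketch(c(1, 2, 3, 2), 2)
by_predicate <- na_if_sketch(c(1, 5, 10), .fn = function(x) x > 4)
```

Because y only appears inside the default for .fn, it is never evaluated when a custom predicate is supplied, which is why the second call can omit it.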
Probably needs some type checking, i.e.
funs::na_if(1:10, TRUE)
#> [1] NA 2 3 4 5 6 7 8 9 10
Created on 2021-05-06 by the reprex package (v2.0.0)
Is this expected ? cc @lionel- @DavisVaughan ?
vctrs::vec_in(1:4, TRUE)
#> [1] TRUE FALSE FALSE FALSE
Is this expected ?
I think so, because we use the common type which is integer in that case. Maybe we should directionally coerce instead though? This would make this case an error.
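To make the common-type explanation concrete, the following sketch inspects the pieces separately. It only uses existing vctrs functions; the variable names are illustrative.

```r
library(vctrs)

# The common type of an integer vector and a logical vector is integer
common <- vec_ptype_common(1:4, TRUE)

# Under that common type, TRUE becomes 1L
casted <- vec_cast(TRUE, integer())

# So the lookup is effectively vec_in(1:4, 1L)
found <- vec_in(1:4, TRUE)
```

This is why only the first element matches: 1L is the only value equal to TRUE once both sides are integers.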
Is it not a missed opportunity that this is only for setting NA? Perhaps we can have:
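#' @export
patch_if <- function(x, y, replacement) {
  # A formula such as ~ .x > 3 is converted to a function first
  if (is_formula(y)) {
    y <- as_function(y)
  }
  if (is_function(y)) {
    # A predicate must return one logical value per element of x
    selected <- vec_assert(y(x), ptype = logical(), size = vec_size(x))
  } else {
    # Otherwise y is a set of values to look up in x
    selected <- vec_in(x, y, needles_arg = "x", haystack_arg = "y")
  }
  vec_slice(x, selected) <- replacement
  x
}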
Hmmm, interesting idea.
I could see this being two functions:
replace_values(x, what, with = NA)
replace_at(x, i, with = NA)
Where:
what is a vector of values with the same type as x
i is a valid subscript into x, or a predicate function generating a subscript into x. So it could be a subscript itself, or a function taking x and generating one.
I have always found the y argument of na_if() a bit confusing. It is hard to explain why, but it has something to do with the pairing of "if" in the function name with the fact that you supply values to replace with NA. To me, "if" implied that there needed to be some kind of logical predicate involved.
Yeah, agreed that na_if is confusing. It was meant to be a direct translation of nullif() from SQL, but hardly anyone knows what that is so it doesn't help.
replace_at() defined in this way would behave differently than the dplyr and purrr functions with the same suffix. If we use the existing naming scheme, replace_at() would take names or locations, and replace_if() would take a vectorised predicate or a logical vector.
These functions could also be named replace(), set_at(), set_if().
Another variant to consider:
x |> set_across(starts_with("foo"), NA)
In case it is useful, this is how I've named the functions in naniar for replacing values with NA
replace_at() defined in this way would behave differently than the dplyr and purrr functions with the same suffix.
To be clear, I think it might be a good idea to gather all these index semantics in a single function. This would be consistent with the move to across() in dplyr. In general the overloading of [ is an important part of the vector interface in R and I no longer think it's important to make explicit the kind of selection used at the call site (which is often clear from the code anyway).
I'm just worried about reusing _at in a different way than in purrr and the superseded dplyr functions. Maybe we don't need a suffix, e.g.
replace <- function(x, set, value) { ... } # set: Set of values
set <- function(x, where, value) { ... } # where: Locations, names, logicals, predicate
set(x, is.na, "foo")
set(x, x == "foo", NA)
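The split between "set of values" and "where" can be sketched in base R, leaning on the overloading of [ for locations, names and logicals. The names replace_sketch and set_sketch are hypothetical, and treating a function as a predicate applied to x is an assumption about the intended semantics.

```r
# Hypothetical sketch: replace elements whose *value* is in `set`
replace_sketch <- function(x, set, value) {
  x[x %in% set] <- value
  x
}

# Hypothetical sketch: replace elements selected by `where`, which may be
# locations, names, a logical vector, or a predicate applied to x
set_sketch <- function(x, where, value) {
  if (is.function(where)) {
    where <- where(x)
  }
  x[where] <- value
  x
}

x <- c(a = 10, b = 20, c = 30)

by_value     <- replace_sketch(x, 20, NA)             # the value 20
no_match     <- replace_sketch(x, 2, NA)              # no element equals 2
by_location  <- set_sketch(x, 2, NA)                  # the second element
by_name      <- set_sketch(x, "b", NA)                # the element named "b"
by_logical   <- set_sketch(x, x > 15 & x < 25, NA)    # a logical vector
by_predicate <- set_sketch(x, function(v) v == 20, NA)
```

The 2 in replace_sketch(x, 2, NA) is a value and matches nothing, while the 2 in set_sketch(x, 2, NA) is a location, which is the distinction the two signatures are meant to keep apart.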
Further playing with the idea of "replacing many things" here https://github.com/tidyverse/funs/pull/66
library(magrittr)
library(funs, warn.conflicts = FALSE)
alphabet <- c(letters[1:10], NA)
alphabet %>%
  patch(
    when(c("a", "e", "i", "o", "u"), "vowel"),
    when(NA, "missing"),
    when(default, "consonant")
  )
#> [1] "vowel" "b" "c" "d" "vowel" "f" "g" "h" "vowel"
#> [10] "j" NA
x <- 1:10
x %>%
  patch(
    when(~ .x < 3, 3),
    when(~ . > 7, 7),
    when(c(4, 5, 6), NA)
  )
#> [1] 3 3 3 NA NA NA 7 7 7 7
x %>%
  patch(
    when(x < 3, 3),
    when(x > 7, 7),
    when(c(4, 5, 6), NA)
  )
#> [1] 3 3 3 NA NA NA 7 7 7 7
Created on 2021-05-07 by the reprex package (v2.0.0)
na_if() then is:
library(funs, warn.conflicts = FALSE)
x <- 1:10
na_if <- function(x, what) {
  patch(x, when(what, NA))
}
na_if(x, x == 2)
#> [1] 1 NA 3 4 5 6 7 8 9 10
This feels highly related to https://twitter.com/antoine_fabri/status/1392127389195452416, which I have wanted a better solution to for a while now. The key here is that with is allowed to be vectorized with the same length as x, not the same length as which(where), which is why base::replace() wouldn't work.
library(dplyr)
library(vctrs)
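replace_if <- function(x, where, with) {
  x_size <- vec_size(x)
  # `where` must be a logical vector with one value per element of x
  vec_assert(where, ptype = logical(), size = x_size, arg = "where")
  # Recycle `with` to the size of x and cast it to the type of x
  with <- vec_recycle(with, x_size, x_arg = "with")
  with <- vec_cast(with, x, x_arg = "with", to_arg = "x")
  # Keep only the replacements that line up with the selected elements
  with <- vec_slice(with, where)
  vec_assign(x, where, with, x_arg = "x", value_arg = "with")
}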
band_instruments %>%
  mutate(
    name = replace_if(name, plays == "guitar", paste0(name, "!")),
    plays2 = replace_if(plays, plays == "bass", NA)
  )
#> # A tibble: 3 x 3
#>   name   plays  plays2
#>   <chr>  <chr>  <chr>
#> 1 John!  guitar guitar
#> 2 Paul   bass   <NA>
#> 3 Keith! guitar guitar
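To see why base::replace() falls short here, this small base R sketch passes it a replacement that is as long as x. The data mirrors the band_instruments example above; the variable names are illustrative.

```r
name  <- c("John", "Paul", "Keith")
where <- c(TRUE, FALSE, TRUE)
with  <- paste0(name, "!")   # same length as name, not as which(where)

# replace() consumes `with` from the start, so the third element receives
# "Paul!" instead of "Keith!" (R also warns about the length mismatch)
misaligned <- suppressWarnings(replace(name, where, with))

# Slicing `with` by `where` first restores the alignment, which is the
# step replace_if() performs internally
aligned <- replace(name, where, with[where])
```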
Still from #66 and its proposed patch(when()) syntax, allowing to replace multiple things:
library(dplyr, warn.conflicts = FALSE)
library(funs, warn.conflicts = FALSE)
band_instruments %>%
  mutate(
    name = patch(name,
      when(plays == "guitar", paste0(name, "!")),
      when(plays == "bass", paste0(name, "@"))
    )
  )
#> # A tibble: 3 x 2
#>   name   plays
#>   <chr>  <chr>
#> 1 John!  guitar
#> 2 Paul@  bass
#> 3 Keith! guitar
Created on 2021-05-17 by the reprex package (v2.0.0)
Using when() here, or something else, gives us the patch(...) form so that we can replace multiple things, and when(what=, with=) instead of a formula as in case_when() allows the use of a formula for what= and (maybe, but not yet) with=.
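A rough base R sketch of the patch(when(what, with)) shape may help when reasoning about the design. Everything here is an assumption rather than the code in the PR: when_sketch and patch_sketch are hypothetical names, earlier clauses are assumed to take precedence, formulas and default are not handled, and a logical what of the same length as x is treated as a selection rather than as a set of values.

```r
# Hypothetical stand-in for when(): just pairs a selector with a replacement
when_sketch <- function(what, with) {
  list(what = what, with = with)
}

# Hypothetical stand-in for patch(): applies each clause in order
patch_sketch <- function(x, ...) {
  n <- length(x)
  out <- x
  done <- rep(FALSE, n)  # elements already claimed by an earlier clause

  for (clause in list(...)) {
    what <- clause$what
    if (is.function(what)) {
      selected <- what(x)        # predicate applied to the original x
    } else if (is.logical(what) && length(what) == n) {
      selected <- what           # precomputed logical vector
    } else {
      selected <- x %in% what    # set of values to match
    }
    stopifnot(is.logical(selected), length(selected) == n)
    selected <- !is.na(selected) & selected & !done

    with <- clause$with
    stopifnot(length(with) %in% c(1L, n))
    with <- rep_len(with, n)     # recycle to the size of x, then slice
    out[selected] <- with[selected]
    done <- done | selected
  }
  out
}

x <- 1:10
patched <- patch_sketch(
  x,
  when_sketch(function(v) v < 3, 3),
  when_sketch(x > 7, 7),
  when_sketch(c(4, 5, 6), NA)
)

name <- c("John", "Paul", "Keith")
plays <- c("guitar", "bass", "guitar")
renamed <- patch_sketch(
  name,
  when_sketch(plays == "guitar", paste0(name, "!")),
  when_sketch(plays == "bass", paste0(name, "@"))
)
```

Keeping what and with as separate arguments, rather than the two sides of a formula, is what leaves room for what to be a function, a logical vector, or a set of values in the same call.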
closes #43
The example from https://github.com/tidyverse/dplyr/issues/5711 then can be:
Created on 2021-05-06 by the reprex package (v2.0.0)