tidyverse / funs

Collection of low-level functions for working with vctrs

+ na_if() #65

Open romainfrancois opened 3 years ago

romainfrancois commented 3 years ago

closes #43

The example from https://github.com/tidyverse/dplyr/issues/5711 can then be written as:

library(dplyr, warn.conflicts = FALSE)
library(funs)
na_if <- funs::na_if

test <- tibble(Staff.Confirmed = c(0, 1, -999), Residents.Confirmed = c(12, -192, 0))
print(test)
#> # A tibble: 3 x 2
#>   Staff.Confirmed Residents.Confirmed
#>             <dbl>               <dbl>
#> 1               0                  12
#> 2               1                -192
#> 3            -999                   0

out <-  test %>% 
  mutate(staff_conf_naif = na_if(Staff.Confirmed, ~Staff.Confirmed < 0),
         staff_conf_ifelse = ifelse(Staff.Confirmed < 0, NA, Staff.Confirmed),

         res_conf_naif = na_if(Residents.Confirmed, ~ Residents.Confirmed < 0),
         res_conf_ifelse = ifelse(Residents.Confirmed < 0, NA, Residents.Confirmed)) %>% 
  select(Staff.Confirmed, staff_conf_naif, staff_conf_ifelse,
         Residents.Confirmed, res_conf_naif, res_conf_ifelse)
print(out)
#> # A tibble: 3 x 6
#>   Staff.Confirmed staff_conf_naif staff_conf_ifelse Residents.Confirmed
#>             <dbl>           <dbl>             <dbl>               <dbl>
#> 1               0               0                 0                  12
#> 2               1               1                 1                -192
#> 3            -999              NA                NA                   0
#> # … with 2 more variables: res_conf_naif <dbl>, res_conf_ifelse <dbl>

Created on 2021-05-06 by the reprex package (v2.0.0)

romainfrancois commented 3 years ago

I can simplify to vec_slice(x, vec_in(x, y)) <- NA, but I think it's worth allowing y to be a predicate. Or maybe the predicate should be a separate argument whose default is derived from y?

na_if <- function(x, y, .fn = function(x) vec_in(x, y)){ ... }
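
A minimal sketch of how that signature could behave, assuming vctrs is attached. The name na_if_sketch() and its body are illustrative only, not the code in this PR. Because default arguments are evaluated lazily, y is only needed when .fn is left at its default.

```r
library(vctrs)

# Hypothetical helper: `.fn` defaults to "is x one of the values in y?"
na_if_sketch <- function(x, y, .fn = function(x) vec_in(x, y)) {
  selected <- .fn(x)
  # The predicate must return one logical per element of x
  vec_assert(selected, ptype = logical(), size = vec_size(x))
  vec_slice(x, selected) <- NA
  x
}

# Value matching through the default .fn
by_value <- na_if_sketch(c(0, 1, -999), -999)

# Predicate supplied directly; y is never evaluated here
by_predicate <- na_if_sketch(c(0, 1, -999), .fn = function(x) x < 0)
```

Both calls give c(0, 1, NA).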

romainfrancois commented 3 years ago

Probably needs some type checking, i.e.

funs::na_if(1:10, TRUE)
#>  [1] NA  2  3  4  5  6  7  8  9 10

Created on 2021-05-06 by the reprex package (v2.0.0)

romainfrancois commented 3 years ago

Is this expected ? cc @lionel- @DavisVaughan ?

vctrs::vec_in(1:4, TRUE)
#> [1]  TRUE FALSE FALSE FALSE

Created on 2021-05-06 by the reprex package (v2.0.0)

lionel- commented 3 years ago

Is this expected ?

I think so, because we use the common type which is integer in that case. Maybe we should directionally coerce instead though? This would make this case an error.
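
The common-type explanation can be checked step by step with vctrs. This is only an illustration of the behaviour described above, not a proposal for how the check should be implemented.

```r
library(vctrs)

# The common type of an integer vector and a logical vector is integer
common <- vec_ptype2(1:4, TRUE)

# Under that common type, TRUE becomes 1L ...
haystack <- vec_cast(TRUE, common)

# ... so matching against TRUE is the same as matching against 1L
matches <- vec_in(1:4, haystack)
```

Here matches is TRUE for the first element only, which is the result reported for vec_in(1:4, TRUE).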

romainfrancois commented 3 years ago

Isn't it a missed opportunity that this is only for setting NA? Perhaps we can have:

#' @export
patch_if <- function(x, y, replacement) {
  if (is_formula(y)) {
    y <- as_function(y)
  }

  if (is_function(y)) {
    selected <- vec_assert(y(x), ptype = logical(), size = vec_size(x))
  } else {
    # x is the needles and y the haystack, so label them accordingly in errors
    selected <- vec_in(x, y, needles_arg = "x", haystack_arg = "y")
  }
  vec_slice(x, selected) <- replacement

  x
}

hadley commented 3 years ago

Hmmm, interesting idea.

DavisVaughan commented 3 years ago

I could see this being two functions:

Where:

I have always found the y argument of na_if() a bit confusing. It is hard to explain why, but it has something to do with the pairing of "if" in the function name with the fact that you supply values to replace with NA. To me, "if" implied that there needed to be some kind of logical predicate involved.

hadley commented 3 years ago

Yeah, agreed that na_if is confusing. It was meant to be a direct translation of nullif() from SQL, but hardly anyone knows what that is so it doesn't help.

lionel- commented 3 years ago

replace_at() defined in this way would behave differently than the dplyr and purrr functions with the same suffix. If we use the existing naming scheme, replace_at() would take names or locations, and replace_if() would take a vectorised predicate or a logical vector.

These functions could also be named replace(), set_at(), set_if().

lionel- commented 3 years ago

Another variant to consider:

x |> set_across(starts_with("foo"), NA)

njtierney commented 3 years ago

In case it is useful, this is how I've named the functions in naniar for replacing values with NA

http://naniar.njtierney.com/articles/replace-with-na.html#notes-on-alternative-ways-to-handle-replacing-with-nas

lionel- commented 3 years ago

replace_at() defined in this way would behave differently than the dplyr and purrr functions with the same suffix.

To be clear, I think it might be a good idea to gather all these index semantics in a single function. This would be consistent with the move to across() in dplyr. In general the overloading of [ is an important part of the vector interface in R and I no longer think it's important to make explicit the kind of selection used at the call site (which is often clear from the code anyway).

I'm just worried about reusing _at in a different way than in purrr and the superseded dplyr functions. Maybe we don't need a suffix, e.g.

replace <- function(x, set, value) { ... }  # set: Set of values
set <- function(x, where, value) { ... }    # where: Locations, names, logicals, predicate

set(x, is.na, "foo")
set(x, x == "foo", NA)
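
A rough base-R sketch of the two signatures above, relying on the overloading of [ for the index semantics. The names replace_values() and set_where() are hypothetical, chosen here to avoid masking base::replace(), and unlike a vctrs-based version this does no type checking.

```r
# Replace every element of x that appears in the set of values `set`
replace_values <- function(x, set, value) {
  x[x %in% set] <- value
  x
}

# Replace the elements of x selected by `where`, which may be
# locations, names, a logical vector, or a predicate function
set_where <- function(x, where, value) {
  if (is.function(where)) {
    where <- where(x)
  }
  x[where] <- value
  x
}

x <- c(a = "foo", b = "bar", c = "foo")

by_set       <- replace_values(x, "foo", "qux")
by_name      <- set_where(x, "b", "baz")
by_location  <- set_where(x, 1L, "z")
by_logical   <- set_where(x, x == "foo", NA)
by_predicate <- set_where(c("foo", NA), is.na, "missing")
```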

romainfrancois commented 3 years ago

Further playing with the idea of "replacing many things" here https://github.com/tidyverse/funs/pull/66

library(magrittr)
library(funs, warn.conflicts = FALSE)

alphabet <- c(letters[1:10], NA)
alphabet %>% 
  patch(
    when(c("a", "e", "i", "o", "u"), "vowel"),
    when(NA                        , "missing"), 
    when(default                   , "consonant")
  )
#>  [1] "vowel" "b"     "c"     "d"     "vowel" "f"     "g"     "h"     "vowel"
#> [10] "j"     NA

x <- 1:10
x %>% 
  patch(
    when(~.x < 3   , 3), 
    when(~. > 7    , 7), 
    when(c(4, 5, 6), NA)
  )
#>  [1]  3  3  3 NA NA NA  7  7  7  7

x %>% 
  patch(
    when(x < 3     , 3), 
    when(x > 7     , 7), 
    when(c(4, 5, 6), NA)
  )
#>  [1]  3  3  3 NA NA NA  7  7  7  7

Created on 2021-05-07 by the reprex package (v2.0.0)

romainfrancois commented 3 years ago

na_if() then is:

library(funs, warn.conflicts = FALSE)

x <- 1:10
na_if  <- function(x, what) {
  patch(x, when(what, NA))
}
na_if(x, x == 2)
#>  [1]  1 NA  3  4  5  6  7  8  9 10

Created on 2021-05-07 by the reprex package (v2.0.0)

DavisVaughan commented 3 years ago

This feels highly related to https://twitter.com/antoine_fabri/status/1392127389195452416, which I have wanted a better solution to for a while now. The key here is that with is allowed to be vectorized with the same length as x, not the same length as which(where), which is why base::replace() wouldn't work.

library(dplyr)
library(vctrs)

replace_if <- function(x, where, with) {
  x_size <- vec_size(x)

  vec_assert(where, ptype = logical(), size = x_size, arg = "where")

  with <- vec_recycle(with, x_size, x_arg = "with")
  with <- vec_cast(with, x, x_arg = "with", to_arg = "x")

  with <- vec_slice(with, where)

  vec_assign(x, where, with, x_arg = "x", value_arg = "with")
}

band_instruments %>%
  mutate(
    name = replace_if(name, plays == "guitar", paste0(name, "!")),
    plays2 = replace_if(plays, plays == "bass", NA)
  )
#> # A tibble: 3 x 3
#>   name   plays  plays2
#>   <chr>  <chr>  <chr> 
#> 1 John!  guitar guitar
#> 2 Paul   bass   <NA>  
#> 3 Keith! guitar guitar
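
The difference from base::replace() mentioned above can be seen in a small base-R sketch. The vectors mirror band_instruments and are typed out by hand so the example has no dependencies.

```r
name  <- c("John", "Paul", "Keith")
plays <- c("guitar", "bass", "guitar")

where <- plays == "guitar"
with  <- paste0(name, "!")   # same length as name, not as which(where)

# base::replace() consumes `with` in order, so the second selected
# element receives "Paul!" (and R warns about the length mismatch)
bad <- suppressWarnings(replace(name, where, with))
# bad is c("John!", "Paul", "Paul!")

# Slicing `with` by `where` first gives the intended result
good <- replace(name, where, with[where])
# good is c("John!", "Paul", "Keith!")
```

The replace_if() above does that slicing internally, after recycling and casting with.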

romainfrancois commented 3 years ago

Still from #66 and its proposed patch(when()) syntax, which allows replacing multiple things:

library(dplyr, warn.conflicts = FALSE)
library(funs, warn.conflicts = FALSE)

band_instruments %>%
  mutate(
    name = patch(name, 
      when(plays == "guitar", paste0(name, "!")), 
      when(plays == "bass", paste0(name, "@"))
    )
  )
#> # A tibble: 3 x 2
#>   name   plays 
#>   <chr>  <chr> 
#> 1 John!  guitar
#> 2 Paul@  bass  
#> 3 Keith! guitar

Created on 2021-05-17 by the reprex package (v2.0.0)

Using when() here (or something else) is what gives us patch(...), so that we can replace multiple things in one call. Having when(what=, with=) instead of a formula as in case_when() also allows a formula to be used for what= and (maybe, but not yet) for with=.
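
To make the idea concrete, here is a toy base-R sketch of a patch()/when() pair. It is not the implementation in #66: the names patch_sketch() and when_sketch() are hypothetical, formulas are not supported (plain functions stand in for them), and it assumes the first matching clause wins, as in case_when().

```r
# A clause is just a pair: what to match, and what to replace it with
when_sketch <- function(what, with) {
  list(what = what, with = with)
}

patch_sketch <- function(x, ...) {
  out  <- x
  done <- rep(FALSE, length(x))   # elements already replaced
  for (clause in list(...)) {
    what <- clause$what
    if (is.function(what)) {
      hit <- what(x)                       # predicate
    } else if (is.logical(what) && length(what) == length(x)) {
      hit <- what                          # logical mask
    } else {
      hit <- x %in% what                   # set of values
    }
    hit  <- !is.na(hit) & hit & !done
    with <- rep_len(clause$with, length(x))  # length 1 or length(x)
    out[hit] <- with[hit]
    done <- done | hit
  }
  out
}

clamped <- patch_sketch(
  1:10,
  when_sketch(function(v) v < 3, 3),
  when_sketch(function(v) v > 7, 7),
  when_sketch(c(4, 5, 6), NA)
)

name  <- c("John", "Paul", "Keith")
plays <- c("guitar", "bass", "guitar")
tagged <- patch_sketch(
  name,
  when_sketch(plays == "guitar", paste0(name, "!")),
  when_sketch(plays == "bass", paste0(name, "@"))
)
```

Note that deciding between "logical mask" and "set of values" by length is exactly the kind of ambiguity the type-checking comments earlier in the thread are about.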