Closed l-d-s closed 6 years ago
Can you provide an example of what you expect? I am curious, and it does not look straightforward to me. Thank you.
Sometimes one wants to replace NAs in a single variable.
df %>%
mutate(y = replace_na(x, 0))
Maybe such a command would properly belong in dplyr.
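For reference, the single-variable form can be written by passing replace_na() a vector rather than a data frame. This is a minimal sketch, assuming a tidyr version in which replace_na() has a vector method; the toy tibble is made up for illustration and is not from the thread.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not from the thread
toy <- tibble(x = c(1, NA, 3))

# replace_na() applied to a single column inside mutate();
# the original column x is left untouched
out <- toy %>%
  mutate(y = replace_na(x, 0))
```

Here the second argument is a single replacement value rather than a named list.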
Alternatively, using NSE one might have
df %>%
replace_na(x = 0, y = "unknown")
or similar instead of
df %>%
replace_na(list(x = 0, y = "unknown"))
though maybe there are tradeoffs here I'm not aware of.
For a single variable, maybe dplyr::coalesce()?
dplyr::coalesce(c(1, 2, NA), 0)
#> [1] 1 2 0
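Inside a pipeline, the same idea might look like the sketch below. The tibble and its column names are hypothetical; the only assumption is that each replacement value has a type compatible with its column, since coalesce() combines its arguments into one common type.

```r
library(dplyr)

# Hypothetical toy data, not from the thread
toy <- tibble(x = c(1, 2, NA), label = c("a", NA, "c"))

# coalesce() keeps the first non-missing value at each position,
# so a length-one second argument acts as the NA replacement
out <- toy %>%
  mutate(
    x = coalesce(x, 0),
    label = coalesce(label, "unknown")
  )
```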
We could come up with a new replace function that understands non-list arguments using tidyeval:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# tibble() replaces the deprecated data_frame()
df <- tibble(
  A = c(1L, NA_integer_, 3L, NA_integer_, 5L),
  b = c("a", "b", NA_character_, "c", NA_character_),
  c = c(NA, 3, 3.5, NA, 87)
)
tidyr::replace_na has a ... argument already available. It could be used for your use case.
replace_na_new <- function(data, replace = list(), ...) {
  stopifnot(rlang::is_list(replace))
  replace <- rlang::modify(replace, ...)
  for (var in names(replace)) {
    data[[var]][rlang::are_na(data[[var]])] <- replace[[var]]
  }
  data
}
I used rlang::modify for testing, but I don't think it is a good choice.
Here are some examples:
# just replace for one column
df %>%
replace_na_new(A = 99999L)
#> # A tibble: 5 x 3
#> A b c
#> <int> <chr> <dbl>
#> 1 1 a NA
#> 2 99999 b 3.0
#> 3 3 <NA> 3.5
#> 4 99999 c NA
#> 5 5 <NA> 87.0
# replace two columns
df %>%
replace_na_new(A = 99999L, b = "unknown")
#> # A tibble: 5 x 3
#> A b c
#> <int> <chr> <dbl>
#> 1 1 a NA
#> 2 99999 b 3.0
#> 3 3 unknown 3.5
#> 4 99999 c NA
#> 5 5 unknown 87.0
# replace three columns
df %>%
replace_na_new(replace = list(A = 99999L), b = "unknown", c = 0)
#> # A tibble: 5 x 3
#> A b c
#> <int> <chr> <dbl>
#> 1 1 a 0.0
#> 2 99999 b 3.0
#> 3 3 unknown 3.5
#> 4 99999 c 0.0
#> 5 5 unknown 87.0
The principle here is that if an argument is given more than once, the later value replaces the earlier one.
df %>%
replace_na_new(replace = list(A = 99999L), b = "unknown", A = 0L)
#> # A tibble: 5 x 3
#> A b c
#> <int> <chr> <dbl>
#> 1 1 a NA
#> 2 0 b 3.0
#> 3 3 unknown 3.5
#> 4 0 c NA
#> 5 5 unknown 87.0
We should probably choose another approach with more checking. I used rlang::modify for simplicity in this example.
Moreover, it no longer works with df %>% replace_na(df), so it is certainly not the definitive solution.
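As a rough idea of what "more checking" could look like, here is a sketch that swaps rlang::modify for base R's utils::modifyList and validates the inputs first. The name replace_na_checked and the specific checks are assumptions made for illustration, not anything proposed by the tidyr team.

```r
# Hypothetical variant with input checks; base R only
replace_na_checked <- function(data, replace = list(), ...) {
  stopifnot(is.list(replace))
  dots <- list(...)

  # Every argument passed through ... must be named
  if (length(dots) > 0 && (is.null(names(dots)) || any(names(dots) == ""))) {
    stop("All arguments in `...` must be named.", call. = FALSE)
  }

  # Later values override earlier ones, as in the example above
  replace <- utils::modifyList(replace, dots)

  # Refuse replacement values for columns that do not exist
  unknown <- setdiff(names(replace), names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }

  for (var in names(replace)) {
    value <- replace[[var]]
    if (length(value) != 1) {
      stop("Replacement for `", var, "` must have length 1.", call. = FALSE)
    }
    data[[var]][is.na(data[[var]])] <- value
  }
  data
}

# Usage on hypothetical toy data
toy <- data.frame(A = c(1L, NA, 3L), b = c("a", NA, "c"), stringsAsFactors = FALSE)
out <- replace_na_checked(toy, replace = list(A = 99L), b = "unknown", A = 0L)
```

This sketch does not check that a replacement value matches the column type, so a mismatched value would still silently coerce the column.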
You could also include this type of function in your script or package as a thin wrapper on top of tidyr::replace_na, collecting the arguments with rlang. For example, using only non-list arguments:
replace_na_new <- function(data, ...) {
  replace_dots <- rlang::dots_list(...)
  tidyr::replace_na(data, replace = replace_dots)
}
df %>%
replace_na_new(A = 99999L, b = "unknown")
#> # A tibble: 5 x 3
#> A b c
#> <int> <chr> <dbl>
#> 1 1 a NA
#> 2 99999 b 3.0
#> 3 3 unknown 3.5
#> 4 99999 c NA
#> 5 5 unknown 87.0
I could try working on a PR for a more durable solution based on this idea, but I don't know the team's plan for this one, and rlang is not easy to dig into.
We will see what the tidyverse team has to say about this idea.
I'm not sure what will become of the vctrs package/repo, but its issues are currently a parking place for many to do's, one of which is basically this issue:
@jennybc Great. Makes sense that it be dealt with in vctrs then. Thanks!
Suggestion: replace_na should be column-wise, have a column-wise mode, or exist in a column-wise version somewhere in the tidyverse.
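One way to get column-wise behaviour with existing tools is to combine the vector form of replace_na() with dplyr::across(). This is only a sketch, assuming a dplyr version that provides across() (1.0.0 or later) and a tidyr version with the vector method; the toy data is made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not from the thread
toy <- tibble(
  A = c(1L, NA, 3L),
  b = c("a", NA, "c"),
  c = c(NA, 3, 3.5)
)

# Apply replace_na() to each selected column in turn:
# numeric columns get 0, character columns get "unknown"
out <- toy %>%
  mutate(
    across(where(is.numeric), ~ replace_na(.x, 0)),
    across(where(is.character), ~ replace_na(.x, "unknown"))
  )
```

Selecting by type with where() avoids having to name every column, at the cost of using one replacement value per type instead of per column.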