njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

`replace_with_na_if` destroys factors; returns integers instead #278

Closed LukasWallrich closed 1 year ago

LukasWallrich commented 3 years ago

When I try to use `replace_with_na_if` to clean up some factors, the resulting columns are turned into integers. See below for a simple example. It would be wonderful if that could be fixed. Obviously, I could convert the factors to characters and back, but that would become rather fiddly in my actual use case.

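library(magrittr)  # provides the %>% pipe used below

#Works
naniar::replace_with_na(iris, list(Species="setosa"))

#Fails: Species comes back as integer instead of factor
iris %>% naniar::replace_with_na_if(is.factor, ~.x %in% "setosa")

LukasWallrich commented 3 years ago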

The problem arises from purrr::reduce(c) in na_set ... I don't quite understand why that step is needed, but the c() function strips the levels from the factor in this case.
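
To illustrate the general mechanism (this is a sketch, not naniar's actual `na_set` code): the example below assumes the intermediate list mixes single factor elements with bare `NA` values, and uses base `Reduce()` as a stand-in for `purrr::reduce()`. Recombining such a list with `c()` loses the factor class, whereas assigning `NA` directly into the factor keeps it.

```r
f <- factor(c("setosa", "versicolor", "virginica", "setosa"))

# Element-wise route (assumed shape of the problem, not naniar's source):
# replace matching elements with a bare NA, then recombine pairwise with c().
pieces <- lapply(seq_along(f), function(i) if (f[i] %in% "setosa") NA else f[i])
combined <- Reduce(c, pieces)
is.factor(combined)  # FALSE: the levels are gone, only integer codes remain

# Vectorised route: assign NA in place; class and levels are untouched.
direct <- f
direct[direct %in% "setosa"] <- NA
is.factor(direct)    # TRUE
levels(direct)       # same three levels as before
```

Note that the exact behaviour of `c()` on factors depends on the R version, so results for lists made up only of factors may differ from this mixed case.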

LukasWallrich commented 3 years ago

One more note: removing the reduce step would make the function much faster - on a 10,000 row sample dataset, after I converted factors to characters, running time drops from 8.6s with reduce to 1.4s without reduce. So apart from the factor error, it might be worth rethinking that step?
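
A rough way to compare the two approaches on your own machine is sketched below. The helper names `slow_na()` and `fast_na()` are made up for this illustration and are not naniar functions; base `Reduce()` stands in for `purrr::reduce()`, and the timings will not match the numbers quoted above.

```r
set.seed(1)
x <- sample(c("a", "b", ""), 10000, replace = TRUE)

# Hypothetical element-wise version: build a list, then combine pairwise with c().
slow_na <- function(v) {
  Reduce(c, lapply(v, function(el) if (el %in% "") NA_character_ else el))
}

# Hypothetical vectorised version: one logical index, one assignment.
fast_na <- function(v) {
  v[v %in% ""] <- NA
  v
}

system.time(r_slow <- slow_na(x))
system.time(r_fast <- fast_na(x))

# Both routes should give the same character vector.
identical(r_slow, r_fast)
```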

antondutoit commented 3 years ago

This is a problem with quite a few tidyverse functions. You've no doubt figured out a solution by now, but for anyone coming here in future here's a different workaround:

# is.POSIXt() comes from lubridate; mutate_if(), na_if() and %>% come from dplyr
df <- df %>% mutate_if(purrr::negate(lubridate::is.POSIXt), ~ dplyr::na_if(., ""))

This one is for replacing empty cells in non-POSIXt columns, but can be adapted easily enough.
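
One possible adaptation, written with `across()` from more recent dplyr versions, is sketched below. It assumes that only character columns need cleaning, which is narrower than the "non-POSIXt" rule above; the data frame `df` here is a made-up example.

```r
library(dplyr)

# Made-up example data: one character column containing an empty string.
df <- tibble::tibble(
  id   = 1:3,
  name = c("a", "", "c"),
  grp  = factor(c("x", "y", "x"))
)

# Replace "" with NA in character columns only; other column types are left alone.
cleaned <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))

cleaned$name           # "a" NA "c"
is.factor(cleaned$grp) # TRUE: the factor column is untouched
```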

njtierney commented 1 year ago

This has been resolved due to internal changes in naniar/tidyverse code.

#Works
naniar::replace_with_na(iris, list(Species="setosa"))
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#> 1            5.1         3.5          1.4         0.2       <NA>
#> 2            4.9         3.0          1.4         0.2       <NA>
#> 3            4.7         3.2          1.3         0.2       <NA>
#> 4            4.6         3.1          1.5         0.2       <NA>
#> 5            5.0         3.6          1.4         0.2       <NA>
#> 6            5.4         3.9          1.7         0.4       <NA>
#> 7            4.6         3.4          1.4         0.3       <NA>
#> 8            5.0         3.4          1.5         0.2       <NA>
#> 9            4.4         2.9          1.4         0.2       <NA>
#> 10           4.9         3.1          1.5         0.1       <NA>
#> 11           5.4         3.7          1.5         0.2       <NA>
#> 12           4.8         3.4          1.6         0.2       <NA>
#> 13           4.8         3.0          1.4         0.1       <NA>
#> 14           4.3         3.0          1.1         0.1       <NA>
#> 15           5.8         4.0          1.2         0.2       <NA>
#> 16           5.7         4.4          1.5         0.4       <NA>
#> 17           5.4         3.9          1.3         0.4       <NA>
#> 18           5.1         3.5          1.4         0.3       <NA>
#> 19           5.7         3.8          1.7         0.3       <NA>
#> 20           5.1         3.8          1.5         0.3       <NA>
#> 21           5.4         3.4          1.7         0.2       <NA>
#> 22           5.1         3.7          1.5         0.4       <NA>
#> 23           4.6         3.6          1.0         0.2       <NA>
#> 24           5.1         3.3          1.7         0.5       <NA>
#> 25           4.8         3.4          1.9         0.2       <NA>
#> 26           5.0         3.0          1.6         0.2       <NA>
#> 27           5.0         3.4          1.6         0.4       <NA>
#> 28           5.2         3.5          1.5         0.2       <NA>
#> 29           5.2         3.4          1.4         0.2       <NA>
#> 30           4.7         3.2          1.6         0.2       <NA>
#> 31           4.8         3.1          1.6         0.2       <NA>
#> 32           5.4         3.4          1.5         0.4       <NA>
#> 33           5.2         4.1          1.5         0.1       <NA>
#> 34           5.5         4.2          1.4         0.2       <NA>
#> 35           4.9         3.1          1.5         0.2       <NA>
#> 36           5.0         3.2          1.2         0.2       <NA>
#> 37           5.5         3.5          1.3         0.2       <NA>
#> 38           4.9         3.6          1.4         0.1       <NA>
#> 39           4.4         3.0          1.3         0.2       <NA>
#> 40           5.1         3.4          1.5         0.2       <NA>
#> 41           5.0         3.5          1.3         0.3       <NA>
#> 42           4.5         2.3          1.3         0.3       <NA>
#> 43           4.4         3.2          1.3         0.2       <NA>
#> 44           5.0         3.5          1.6         0.6       <NA>
#> 45           5.1         3.8          1.9         0.4       <NA>
#> 46           4.8         3.0          1.4         0.3       <NA>
#> 47           5.1         3.8          1.6         0.2       <NA>
#> 48           4.6         3.2          1.4         0.2       <NA>
#> 49           5.3         3.7          1.5         0.2       <NA>
#> 50           5.0         3.3          1.4         0.2       <NA>
#> 51           7.0         3.2          4.7         1.4 versicolor
#> 52           6.4         3.2          4.5         1.5 versicolor
#> 53           6.9         3.1          4.9         1.5 versicolor
#> 54           5.5         2.3          4.0         1.3 versicolor
#> 55           6.5         2.8          4.6         1.5 versicolor
#> 56           5.7         2.8          4.5         1.3 versicolor
#> 57           6.3         3.3          4.7         1.6 versicolor
#> 58           4.9         2.4          3.3         1.0 versicolor
#> 59           6.6         2.9          4.6         1.3 versicolor
#> 60           5.2         2.7          3.9         1.4 versicolor
#> 61           5.0         2.0          3.5         1.0 versicolor
#> 62           5.9         3.0          4.2         1.5 versicolor
#> 63           6.0         2.2          4.0         1.0 versicolor
#> 64           6.1         2.9          4.7         1.4 versicolor
#> 65           5.6         2.9          3.6         1.3 versicolor
#> 66           6.7         3.1          4.4         1.4 versicolor
#> 67           5.6         3.0          4.5         1.5 versicolor
#> 68           5.8         2.7          4.1         1.0 versicolor
#> 69           6.2         2.2          4.5         1.5 versicolor
#> 70           5.6         2.5          3.9         1.1 versicolor
#> 71           5.9         3.2          4.8         1.8 versicolor
#> 72           6.1         2.8          4.0         1.3 versicolor
#> 73           6.3         2.5          4.9         1.5 versicolor
#> 74           6.1         2.8          4.7         1.2 versicolor
#> 75           6.4         2.9          4.3         1.3 versicolor
#> 76           6.6         3.0          4.4         1.4 versicolor
#> 77           6.8         2.8          4.8         1.4 versicolor
#> 78           6.7         3.0          5.0         1.7 versicolor
#> 79           6.0         2.9          4.5         1.5 versicolor
#> 80           5.7         2.6          3.5         1.0 versicolor
#> 81           5.5         2.4          3.8         1.1 versicolor
#> 82           5.5         2.4          3.7         1.0 versicolor
#> 83           5.8         2.7          3.9         1.2 versicolor
#> 84           6.0         2.7          5.1         1.6 versicolor
#> 85           5.4         3.0          4.5         1.5 versicolor
#> 86           6.0         3.4          4.5         1.6 versicolor
#> 87           6.7         3.1          4.7         1.5 versicolor
#> 88           6.3         2.3          4.4         1.3 versicolor
#> 89           5.6         3.0          4.1         1.3 versicolor
#> 90           5.5         2.5          4.0         1.3 versicolor
#> 91           5.5         2.6          4.4         1.2 versicolor
#> 92           6.1         3.0          4.6         1.4 versicolor
#> 93           5.8         2.6          4.0         1.2 versicolor
#> 94           5.0         2.3          3.3         1.0 versicolor
#> 95           5.6         2.7          4.2         1.3 versicolor
#> 96           5.7         3.0          4.2         1.2 versicolor
#> 97           5.7         2.9          4.2         1.3 versicolor
#> 98           6.2         2.9          4.3         1.3 versicolor
#> 99           5.1         2.5          3.0         1.1 versicolor
#> 100          5.7         2.8          4.1         1.3 versicolor
#> 101          6.3         3.3          6.0         2.5  virginica
#> 102          5.8         2.7          5.1         1.9  virginica
#> 103          7.1         3.0          5.9         2.1  virginica
#> 104          6.3         2.9          5.6         1.8  virginica
#> 105          6.5         3.0          5.8         2.2  virginica
#> 106          7.6         3.0          6.6         2.1  virginica
#> 107          4.9         2.5          4.5         1.7  virginica
#> 108          7.3         2.9          6.3         1.8  virginica
#> 109          6.7         2.5          5.8         1.8  virginica
#> 110          7.2         3.6          6.1         2.5  virginica
#> 111          6.5         3.2          5.1         2.0  virginica
#> 112          6.4         2.7          5.3         1.9  virginica
#> 113          6.8         3.0          5.5         2.1  virginica
#> 114          5.7         2.5          5.0         2.0  virginica
#> 115          5.8         2.8          5.1         2.4  virginica
#> 116          6.4         3.2          5.3         2.3  virginica
#> 117          6.5         3.0          5.5         1.8  virginica
#> 118          7.7         3.8          6.7         2.2  virginica
#> 119          7.7         2.6          6.9         2.3  virginica
#> 120          6.0         2.2          5.0         1.5  virginica
#> 121          6.9         3.2          5.7         2.3  virginica
#> 122          5.6         2.8          4.9         2.0  virginica
#> 123          7.7         2.8          6.7         2.0  virginica
#> 124          6.3         2.7          4.9         1.8  virginica
#> 125          6.7         3.3          5.7         2.1  virginica
#> 126          7.2         3.2          6.0         1.8  virginica
#> 127          6.2         2.8          4.8         1.8  virginica
#> 128          6.1         3.0          4.9         1.8  virginica
#> 129          6.4         2.8          5.6         2.1  virginica
#> 130          7.2         3.0          5.8         1.6  virginica
#> 131          7.4         2.8          6.1         1.9  virginica
#> 132          7.9         3.8          6.4         2.0  virginica
#> 133          6.4         2.8          5.6         2.2  virginica
#> 134          6.3         2.8          5.1         1.5  virginica
#> 135          6.1         2.6          5.6         1.4  virginica
#> 136          7.7         3.0          6.1         2.3  virginica
#> 137          6.3         3.4          5.6         2.4  virginica
#> 138          6.4         3.1          5.5         1.8  virginica
#> 139          6.0         3.0          4.8         1.8  virginica
#> 140          6.9         3.1          5.4         2.1  virginica
#> 141          6.7         3.1          5.6         2.4  virginica
#> 142          6.9         3.1          5.1         2.3  virginica
#> 143          5.8         2.7          5.1         1.9  virginica
#> 144          6.8         3.2          5.9         2.3  virginica
#> 145          6.7         3.3          5.7         2.5  virginica
#> 146          6.7         3.0          5.2         2.3  virginica
#> 147          6.3         2.5          5.0         1.9  virginica
#> 148          6.5         3.0          5.2         2.0  virginica
#> 149          6.2         3.4          5.4         2.3  virginica
#> 150          5.9         3.0          5.1         1.8  virginica

#Previously failed; Species is now kept as a factor (<fct>)
iris |> naniar::replace_with_na_if(is.factor, ~.x %in% "setosa") |> 
  tibble::as_tibble()
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 <NA>   
#>  2          4.9         3            1.4         0.2 <NA>   
#>  3          4.7         3.2          1.3         0.2 <NA>   
#>  4          4.6         3.1          1.5         0.2 <NA>   
#>  5          5           3.6          1.4         0.2 <NA>   
#>  6          5.4         3.9          1.7         0.4 <NA>   
#>  7          4.6         3.4          1.4         0.3 <NA>   
#>  8          5           3.4          1.5         0.2 <NA>   
#>  9          4.4         2.9          1.4         0.2 <NA>   
#> 10          4.9         3.1          1.5         0.1 <NA>   
#> # … with 140 more rows

Created on 2023-04-10 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15)
#>  os       macOS Ventura 13.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2023-04-10
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.6.0      2023-01-09 [1] CRAN (R 4.2.0)
#>  colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.2.0)
#>  digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.0)
#>  dplyr         1.1.1      2023-03-22 [1] CRAN (R 4.2.0)
#>  evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.0)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.6.1      2023-02-06 [1] CRAN (R 4.2.0)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2       3.4.1      2023-02-10 [1] CRAN (R 4.2.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  gtable        0.3.1      2022-09-01 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.0)
#>  knitr         1.42       2023-01-25 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  naniar        1.0.0.9000 2023-04-10 [1] local
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         1.0.1      2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2     2022-11-11 [1] CRAN (R 4.2.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.0      2023-03-14 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.0)
#>  scales        1.2.1      2022-08-20 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  styler        1.9.0      2023-01-15 [1] CRAN (R 4.2.0)
#>  tibble        3.2.1      2023-03-20 [1] CRAN (R 4.2.0)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.0)
#>  utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.1      2023-03-22 [1] CRAN (R 4.2.0)
#>  visdat        0.6.0      2023-02-02 [1] local
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.37       2023-01-31 [1] CRAN (R 4.2.0)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```