njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

`replace_with_na_all` turns factors into integers. #287

Closed themichjam closed 1 year ago

themichjam commented 3 years ago

When I use `replace_with_na_all` to clean up factor columns, the resulting columns are turned into integers. See below for a simple example with `iris`. It would be wonderful if that could be fixed. Obviously, I could convert the factors to characters and back, but that defeats the purpose of this package a little bit.

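# Fails: Species (a factor) comes back as integer
library(dplyr)  # provides %>%
na_strings <- "setosa"  # example value, as used later in this thread
iris %>% naniar::replace_with_na_all(condition = ~.x %in% na_strings)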
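
For readers wondering how a factor can end up as integers at all: a factor is stored as integer codes plus a `levels` attribute, so any operation that drops the attributes leaves only the codes. The base R sketch below shows this with `ifelse()`. It illustrates the general mechanism only and is not a claim about what naniar does internally; the vector `f` is made up for the example.

```r
# A small factor: stored internally as integer codes 1, 2, 3 plus levels.
f <- factor(c("a", "b", "c"))

# ifelse() builds its result from the logical test vector, so the factor
# attributes of `f` are dropped and only the integer codes survive.
coerced <- ifelse(f == "a", NA, f)
class(coerced)
#> [1] "integer"

# Subassignment keeps the factor intact.
kept <- f
kept[kept == "a"] <- NA
class(kept)
#> [1] "factor"
```

Comparing `sapply(data, class)` before and after a cleaning step is a quick way to spot this kind of silent coercion.
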
njtierney commented 3 years ago

Hi there!

Thanks for reporting this - you're right, this is a pain and not what we want naniar to do!

I'll take a look at this when I'm doing the next release, which should be by the end of August.

Cheers!

themichjam commented 3 years ago

That would be amazing, thank you!

themichjam commented 2 years ago

I tried to create a pull request for this, but thought posting here would also help others looking! The code below uses the NHANES dataset as an example; the latter part seems to do the job of `replace_with_na_all` without turning factors into integers. I was thinking this could help with the update?

install.packages("NHANES")
library(reprex)
library(NHANES)
library(dplyr)

# make a selection
nhanes_long <- NHANES %>%
  select(
    Age, AgeDecade, Education, Poverty, Work, LittleInterest, Depressed,
    BMI, Pulse, BPSysAve, BPDiaAve, DaysPhysHlthBad, PhysActiveDays
  )

# select 500 random indices
rand_ind <- sample(seq_len(nrow(nhanes_long)), 500)
nhanes <- nhanes_long[rand_ind, ]

summary(nhanes_long)

# convert unwanted levels to NA
# list every string that should be treated as NA
na_strings <- c("None",
                "Some College",
                "Several")

# before replacement
table(nhanes$Education)

# replace unwanted answers/typos with NA
nhanes <- nhanes %>%
  mutate(across(everything(),
                ~ replace(., . %in% na_strings, NA_character_))) %>%
  type.convert(as.is = TRUE)
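
A possible variation on the idea above, sketched in base R only: subassigning `NA` column by column leaves each column's class untouched, so no type conversion step is needed afterwards. The helper name `replace_strings_with_na` and the toy `survey` data are hypothetical and not part of naniar or NHANES.

```r
# Hypothetical helper (not part of naniar): set every value found in
# `na_strings` to NA, column by column, without changing column types.
replace_strings_with_na <- function(data, na_strings) {
  data[] <- lapply(data, function(col) {
    # %in% matches factors by their labels, so it works for factors too;
    # subassigning NA keeps the column's class and factor levels.
    col[col %in% na_strings] <- NA
    col
  })
  data
}

# Toy data, made up for this example.
survey <- data.frame(
  education = factor(c("High School", "Some College", "College Grad")),
  mood      = factor(c("None", "Several", "Most")),
  bmi       = c(21.5, 30.2, 25.1)
)

cleaned <- replace_strings_with_na(survey, c("None", "Some College", "Several"))
sapply(cleaned, class)

# Optionally drop the levels that are no longer used.
cleaned <- droplevels(cleaned)
```

The replaced labels stay in `levels()` until `droplevels()` is called, which may or may not be what you want for later modelling.
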
njtierney commented 1 year ago

This no longer seems to be an issue due to internal changes in naniar :)

#Fails
na_strings <- "setosa"
iris %>% naniar::replace_with_na_all(condition = ~.x %in% na_strings)
#> Error in iris %>% naniar::replace_with_na_all(condition = ~.x %in% na_strings): could not find function "%>%"

Created on 2023-04-10 with reprex v2.0.2
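
The error in the reprex output above comes from the pipe not being attached in that session rather than from naniar itself. Below is a sketch of the same check that avoids the pipe and inspects the column classes directly. It assumes naniar is installed; the names `cleaned`, `before`, and `after` are introduced here for illustration.

```r
na_strings <- "setosa"

# Call the function directly so no pipe operator is needed.
cleaned <- naniar::replace_with_na_all(iris, condition = ~ .x %in% na_strings)

# Compare column classes before and after; if the issue is resolved,
# Species should remain a factor rather than become an integer.
before <- sapply(iris, class)
after  <- sapply(cleaned, class)
data.frame(before, after)
```
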

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.3 (2023-03-15)
#>  os       macOS Ventura 13.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2023-04-10
#>  pandoc   2.19.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.0)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.6.1   2023-02-06 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.0)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.0)
#>  rlang         1.1.0   2023-03-14 [1] CRAN (R 4.2.0)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  styler        1.9.0   2023-01-15 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.1   2023-03-22 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.37    2023-01-31 [1] CRAN (R 4.2.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```