markvanderloo / simputation

Making imputation easy
GNU General Public License v3.0

adding na_status and glimpse_na functions to peek at NA values #30

Closed edwindj closed 3 years ago

edwindj commented 3 years ago

Hey Mark,

Playing around with simputation, I thought the following functions would be handy:

library(simputation)

na_status

Data mutilation:

dat <- iris
dat[1:3,1] <- dat[3:7,2] <- dat[8:10,5] <- NA

na_status gives a quick overview of the NAs in a data.frame: the total count and the number of NAs per affected column:

na_status(dat)
## 
## na count: 11

##        columns nNA
## 1  Sepal.Width   5
## 2      Species   3
## 3 Sepal.Length   3

This is useful for checking the progress of an imputation process:

dat2 <- impute_lm(dat, Sepal.Length ~ Sepal.Width + Species)
na_status(dat2)
## 
## na count: 9

##        columns nNA
## 1  Sepal.Width   5
## 2      Species   3
## 3 Sepal.Length   1
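
For readers who want to try the idea, here is a minimal base-R sketch of what a function like na_status might do. The name na_status_sketch and its return value are assumptions made for illustration, not the implementation proposed in this issue, and the order of columns with equal counts may differ from the output shown above.

```r
# Hypothetical sketch, not the simputation implementation.
# Counts NAs per column, prints a summary, and returns the table invisibly.
na_status_sketch <- function(dat) {
  stopifnot(is.data.frame(dat))
  n_na <- colSums(is.na(dat))            # NA count per column
  n_na <- n_na[n_na > 0]                 # keep only columns that contain NAs
  n_na <- sort(n_na, decreasing = TRUE)  # most affected columns first
  out <- data.frame(columns = names(n_na), nNA = unname(n_na),
                    stringsAsFactors = FALSE)
  if (nrow(out) == 0) {
    cat("\nNo NA's.\n")
  } else {
    cat("\nna count:", sum(out$nNA), "\n\n")
    print(out)
  }
  invisible(out)
}

# Same test data as above
dat <- iris
dat[1:3, 1] <- dat[3:7, 2] <- dat[8:10, 5] <- NA
na_status_sketch(dat)
```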

glimpse_na

When using an imputation pipeline, glimpse_na can be handy. It prints the na_status summary but returns its input unchanged, so it can be placed anywhere in a pipeline:

library(dplyr)
dat_imputed <- 
  dat %>% 
  glimpse_na()
## 
## na count: 11

##        columns nNA
## 1  Sepal.Width   5
## 2      Species   3
## 3 Sepal.Length   3

library(magrittr)
dat_imputed <- 
  dat %>% 
  impute_lm(Sepal.Length ~ Sepal.Width + Species) %>%
  glimpse_na()
## 
## na count: 9

##        columns nNA
## 1  Sepal.Width   5
## 2      Species   3
## 3 Sepal.Length   1

OK, there is still work to do on Sepal.Length:

dat_imputed <- 
  dat %>% 
  impute_lm(Sepal.Length ~ Sepal.Width + Species) %>%
  impute_median(Sepal.Length ~ Species) %>%
  glimpse_na()
## 
## na count: 8

##       columns nNA
## 1 Sepal.Width   5
## 2     Species   3

And finish it off in the next iteration:

dat_imputed <- 
  dat %>% 
  impute_lm(Sepal.Length ~ Sepal.Width + Species) %>%
  impute_median(Sepal.Length ~ Species) %>%
  impute_cart(. ~ .)  %>% 
  glimpse_na()
## 
## No NA's.
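
The "print, then pass the data through" behaviour of glimpse_na can be sketched in a few lines of base R. The function name glimpse_na_sketch and its `show` argument are hypothetical and only illustrate the pattern; the default summary here is a plain per-column NA count rather than the na_status output shown above.

```r
# Hypothetical sketch of a pass-through ("tee") helper, not the proposed code.
# `show` is an assumed argument: any function that prints a summary of `dat`.
glimpse_na_sketch <- function(dat,
                              show = function(d) print(colSums(is.na(d)))) {
  stopifnot(is.data.frame(dat))
  show(dat)        # side effect: print the NA summary
  invisible(dat)   # return the input unchanged so a pipeline can continue
}

dat <- iris
dat[1:3, 1] <- dat[3:7, 2] <- dat[8:10, 5] <- NA
same <- glimpse_na_sketch(dat)
identical(same, dat)
```

Returning the input with invisible() is what lets such a function sit in the middle or at the end of a pipeline without altering the data or printing it a second time.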

We can also peek into the imputation pipeline with %?>%, which effectively inserts a glimpse_na at that point:

dat_imputed <- 
  dat %>% 
  impute_lm(Sepal.Length ~ Sepal.Width + Species) %?>%
  impute_median(Sepal.Length ~ Species) %>%
  impute_cart(. ~ .) %>% 
  glimpse_na()
## 
## na count: 9

##        columns nNA
## 1  Sepal.Width   5
## 2      Species   3
## 3 Sepal.Length   1

## 
## No NA's.
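
One way an operator like %?>% could work is sketched below in base R. This is an assumption-laden illustration, not the proposed implementation: it prints only the total NA count of its left-hand side, then inserts that value as the first argument of the right-hand call, and it does not support magrittr's `.` placeholder.

```r
# Hypothetical sketch of a "peek" pipe operator in base R.
`%?>%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)                       # capture rhs unevaluated
  cat("\nna count:", sum(is.na(lhs)), "\n")         # peek at the incoming data
  if (is.call(rhs_call)) {
    # f(a, b) becomes f(.lhs_value, a, b)
    new_call <- as.call(c(list(rhs_call[[1]], quote(.lhs_value)),
                          as.list(rhs_call)[-1]))
  } else {
    # a bare function name f becomes f(.lhs_value)
    new_call <- as.call(list(rhs_call, quote(.lhs_value)))
  }
  # Evaluate with the lhs bound to .lhs_value; everything else is looked up
  # in the caller's environment.
  eval(new_call, envir = list(.lhs_value = lhs), enclos = parent.frame())
}

toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 1))
first_two <- toy %?>% head(2)
```

Because all %...% operators in R share the same precedence and associate left to right, such an operator can be mixed freely with %>% in one chain.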

markvanderloo commented 3 years ago

awesombalzzz!