njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

consider add_label_imputation #197

Open njtierney opened 6 years ago

njtierney commented 6 years ago

This would take nabular data and add a label identifying whether a case contains any imputed values.

There might also be scope for changing the shadow values to include an "imputed" label: NA_imputed.
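As a rough illustration of the case-level label, here is a minimal base R sketch. It is not naniar code: `add_label_imputation()` is only the name proposed in this issue, the `shadow_suffix` argument and the "Imputed" / "Not Imputed" label strings are assumptions, and the `_NA` columns are built by hand to mimic what `nabular()` adds.

```r
# Hypothetical helper (not an existing naniar function).
# A cell is treated as imputed when its shadow column says "NA"
# but the value itself is no longer missing.
add_label_imputation <- function(data, shadow_suffix = "_NA") {
  pattern <- paste0(shadow_suffix, "$")
  shadow_cols <- grep(pattern, names(data), value = TRUE)
  value_cols <- sub(pattern, "", shadow_cols)

  # keep only shadow columns whose value column is present
  keep <- value_cols %in% names(data)
  shadow_cols <- shadow_cols[keep]
  value_cols <- value_cols[keep]

  # one logical vector per variable: TRUE where the cell was imputed
  cell_imputed <- Map(
    function(v, s) as.character(data[[s]]) == "NA" & !is.na(data[[v]]),
    value_cols, shadow_cols
  )

  # a case is imputed if any of its cells is
  any_imputed <- Reduce(`|`, cell_imputed, rep(FALSE, nrow(data)))
  data$any_imputed <- ifelse(any_imputed, "Imputed", "Not Imputed")
  data
}

# hand-built stand-in for nabular data after imputing x
df_imp <- data.frame(
  x = c(1, NA, 2.42, 3, 4),
  y = c(2, NA, 4, 5, 6),
  x_NA = factor(c("!NA", "NA", "NA", "!NA", "!NA"), levels = c("!NA", "NA")),
  y_NA = factor(c("!NA", "NA", "!NA", "!NA", "!NA"), levels = c("!NA", "NA"))
)

labelled <- add_label_imputation(df_imp)
labelled$any_imputed
```

Only row 3 should come out as "Imputed": row 2 is flagged missing in both shadows but was never filled in, so it stays "Not Imputed".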

njtierney commented 4 years ago

Rough implementation of how this would work below.

library(tidyverse)
library(naniar)
library(simputation)
#> 
#> Attaching package: 'simputation'
#> The following object is masked from 'package:naniar':
#> 
#>     impute_median

df <- data.frame(
  x = c(1, NA, NA, 3, 4),
  y = c(2, NA, 4, 5, 6)
)

df
#>    x  y
#> 1  1  2
#> 2 NA NA
#> 3 NA  4
#> 4  3  5
#> 5  4  6

# bind shadow columns with `nabular()`, then impute `x` from `y`;
# row 2 stays NA because its predictor `y` is also missing
df_imp <- df %>% 
  nabular() %>% 
  impute_lm(x ~ y)

df_imp$x
#> [1] 1.000000       NA 2.423077 3.000000 4.000000
shade(df_imp$x)
#> [1] !NA NA  !NA !NA !NA
#> Levels: !NA NA
df_imp$x_NA
#> [1] !NA NA  NA  !NA !NA
#> Levels: !NA NA
shade(df_imp$x) != df_imp$x_NA
#> [1] FALSE FALSE  TRUE FALSE FALSE

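# a value counts as imputed when its current missingness (`shade()`)
# no longer matches the shadow recorded before imputation
imputed <- function(data, shadow){
  shade(data) != shadow
}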

df_imp %>% 
  mutate(x_imp = imputed(x, x_NA))
#> # A tibble: 5 x 5
#>       x     y x_NA  y_NA  x_imp
#>   <dbl> <dbl> <fct> <fct> <lgl>
#> 1  1        2 !NA   !NA   FALSE
#> 2 NA       NA NA    NA    FALSE
#> 3  2.42     4 NA    !NA   TRUE 
#> 4  3        5 !NA   !NA   FALSE
#> 5  4        6 !NA   !NA   FALSE

# would be great to integrate this with `recode_shadow()`
# so that you could recode missing values to be either
# NA (missing)
# !NA (not missing)
# NA_imputed (imputed value)

Created on 2020-01-08 by the reprex package (v0.3.0)
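The NA_imputed recoding mentioned in the comments above could look something like the following base R sketch. This is an assumption about the desired behaviour, not how `recode_shadow()` works: `label_imputed_shadow()` is a hypothetical helper, and it returns a plain factor rather than preserving the class of a real naniar shadow column.

```r
# Hypothetical helper (not part of naniar): add an "NA_imputed" level to a
# shadow factor and assign it wherever the shadow says "NA" but the value
# is now present.
label_imputed_shadow <- function(value, shadow) {
  was_imputed <- as.character(shadow) == "NA" & !is.na(value)
  out <- factor(
    as.character(shadow),
    levels = c("!NA", "NA", "NA_imputed")
  )
  out[was_imputed] <- "NA_imputed"
  out
}

# values and shadow copied from the reprex above
x <- c(1, NA, 2.423077, 3, 4)
x_NA <- factor(c("!NA", "NA", "NA", "!NA", "!NA"), levels = c("!NA", "NA"))

x_shadow <- label_imputed_shadow(x, x_NA)
as.character(x_shadow)
```

With this, the third value would be labelled NA_imputed, while the second stays NA because it was never filled in.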

Scoping this further, I think it should: