njtierney / naniar

Tidy data structures, summaries, and visualisations for missing data
http://naniar.njtierney.com/

`imputed` as a basic method #345

Open njtierney opened 4 months ago

njtierney commented 4 months ago

This builds on https://github.com/njtierney/naniar/issues/197. It's roughly the same idea, but I think it's useful to express `imputed` as its own method and then build other components on top of that.

The general idea is to build a basic vector-level function, with S3 methods, that identifies whether each value has been imputed.

These methods could first be written for the regular classes (numeric, integer, factor, and so on), and then extended to the `shade` class and related classes.

``` r
library(naniar)

vec <- 1:5
vec[c(2, 4)] <- NA
vec
#> [1]  1 NA  3 NA  5

vec_imputed <- impute_fixed(vec, 0)
vec_imputed
#> [1] 1 0 3 0 5

imputed <- function(x, ...){
  UseMethod("imputed")
}

# a value was imputed if it was missing before imputation and is observed after
is.na(vec) & !is.na(vec_imputed)
#> [1] FALSE  TRUE FALSE  TRUE FALSE

# integer vectors also dispatch here, via their implicit class c("integer", "numeric")
imputed.numeric <- function(x, y, ...){
  x_na <- is.na(x)
  y_complete <- !is.na(y)
  # `&` rather than `==`: `x_na == y_complete` would also be TRUE for values
  # that were observed in `x` but are missing in `y`
  which_imputed <- x_na & y_complete
  which_imputed
}

imputed(vec, vec_imputed)
#> [1] FALSE  TRUE FALSE  TRUE FALSE

# what if only one number is imputed?

vec2 <- 1:3
vec2[1:2] <- NA
vec2
#> [1] NA NA  3

vec2_imputed <- vec2
vec2_imputed[1] <- 0
vec2_imputed
#> [1]  0 NA  3

imputed(vec2, vec2_imputed)
#> [1]  TRUE FALSE FALSE
```

Created on 2024-03-10 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Hobart
#>  date     2024-03-10
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2       3.5.0   2024-02-23 [1] CRAN (R 4.3.1)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
#>  naniar      * 1.1.0   2024-03-04 [1] local
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [2] CRAN (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  visdat        0.6.0   2023-02-02 [2] CRAN (R 4.3.0)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#>
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
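The reprex only defines a `numeric` method. Because `is.na()` works the same way for numeric, integer, character, and factor vectors, one possible way to cover the regular classes is a single fallback method. This is a minimal sketch: `imputed.default()` and its length check are hypothetical additions for illustration, not existing naniar functions.

```r
# Generic: dispatches on the class of `x` (the data before imputation)
imputed <- function(x, y, ...) {
  UseMethod("imputed")
}

# Hypothetical fallback method for any vector where is.na() is defined
# (numeric, integer, character, factor, ...).
imputed.default <- function(x, y, ...) {
  if (length(x) != length(y)) {
    stop("`x` and `y` must have the same length.", call. = FALSE)
  }
  # Imputed means: missing before imputation AND observed afterwards.
  is.na(x) & !is.na(y)
}

# Example with a factor, where only the first missing value is filled in
f <- factor(c("a", NA, "b", NA))
f_imputed <- f
f_imputed[2] <- "a"
imputed(f, f_imputed)
#> [1] FALSE  TRUE FALSE FALSE
```

Dispatching on `x` alone is enough for these classes; the `shade` case is where the classes of both arguments start to matter.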

Note that `imputed()` will also need double dispatch, to account for `shade` being a possible second class. So I'll probably need to wrap `shade` properly in a vctrs class (https://vctrs.r-lib.org/articles/s3-vector.html), as well as implement `imputed` as a new class.
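One way the double dispatch could work is the base R pattern that the vctrs S3 vector article uses for arithmetic (`vec_arith()`): the method for the first argument's class calls `UseMethod()` again on the second argument. This is a hypothetical sketch, not naniar code. It assumes a `shade` vector is a factor whose level `"NA"` marks originally missing values, and the toy `shade` object below is built by hand rather than with naniar's own constructor. The issue leaves open which argument carries `shade`; this sketch puts it in `x`.

```r
# Generic: first dispatch on the class of `x`
imputed <- function(x, y, ...) {
  UseMethod("imputed")
}

# Second dispatch on the class of `y`, following the
# UseMethod("<generic>.<class>", y) pattern used for vctrs::vec_arith()
imputed.numeric <- function(x, y, ...) {
  UseMethod("imputed.numeric", y)
}
imputed.shade <- function(x, y, ...) {
  UseMethod("imputed.shade", y)
}

# numeric x, numeric y: missing before, observed after
imputed.numeric.numeric <- function(x, y, ...) {
  is.na(x) & !is.na(y)
}

# shade x, numeric y. ASSUMPTION: a shade marks originally missing
# values with the level "NA" (and observed values with "!NA").
imputed.shade.numeric <- function(x, y, ...) {
  (as.character(x) == "NA") & !is.na(y)
}

# Fallbacks for unsupported combinations
imputed.numeric.default <- function(x, y, ...) {
  stop("Can't compare <numeric> with <", class(y)[1], ">.", call. = FALSE)
}
imputed.shade.default <- function(x, y, ...) {
  stop("Can't compare <shade> with <", class(y)[1], ">.", call. = FALSE)
}

# Toy stand-in for a shade vector (hypothetical, not naniar's constructor)
s <- factor(c("!NA", "NA", "!NA", "NA"), levels = c("!NA", "NA"))
class(s) <- c("shade", "factor")

x <- c(1, NA, 3, NA)
y <- c(1, 0, 3, NA)

imputed(x, y)
#> [1] FALSE  TRUE FALSE FALSE
imputed(s, y)
#> [1] FALSE  TRUE FALSE FALSE
```

Each supported pair of classes gets its own `imputed.<x class>.<y class>` method, and any unsupported pair falls through to an informative error instead of silently returning a wrong answer.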

Overall, this adds complexity to the package, but it should help users identify imputed values and do more with them. Hopefully!