njtierney / mputr

Package for handling multiple imputations in a tidy format

store multiple imputations in a nested list? #7

Open njtierney opened 7 years ago

njtierney commented 7 years ago

I still need to work out how to link the shadow matrix to this, but I haven't seen anyone store multiple imputations as a nested list (one row per imputation, with each completed dataset stored in a list-column). That seems like a really natural way to store the data, rather than some sort of messy list structure.

One approach, taken from `neato::imputation_plot`:

```r
library(mice)
library(tidyverse)
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> complete(): tidyr, mice
#> filter():   dplyr, stats
#> is_null():  purrr, testthat
#> lag():      dplyr, stats
#> matches():  dplyr, testthat

imp <- mice(nhanes)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   1   3  bmi  hyp  chl
#>   1   4  bmi  hyp  chl
#>   1   5  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl
#>   2   3  bmi  hyp  chl
#>   2   4  bmi  hyp  chl
#>   2   5  bmi  hyp  chl
#>   3   1  bmi  hyp  chl
#>   3   2  bmi  hyp  chl
#>   3   3  bmi  hyp  chl
#>   3   4  bmi  hyp  chl
#>   3   5  bmi  hyp  chl
#>   4   1  bmi  hyp  chl
#>   4   2  bmi  hyp  chl
#>   4   3  bmi  hyp  chl
#>   4   4  bmi  hyp  chl
#>   4   5  bmi  hyp  chl
#>   5   1  bmi  hyp  chl
#>   5   2  bmi  hyp  chl
#>   5   3  bmi  hyp  chl
#>   5   4  bmi  hyp  chl
#>   5   5  bmi  hyp  chl

# variables to keep from each completed dataset
vars <- c("age", "bmi")

# get the number of imputations used and store it
m_imp <- imp$m

# make a list to contain all of the imputed dataframes (m times)
dat.mi.list <- vector("list", m_imp)

# now, go through 1...m times and do the following
for (i in seq_len(m_imp)) {

  # set the data to be
  dat.mi.list[[i]] <-
    # the i-th completed dataset from multiple imputation
    # (namespaced, because tidyr::complete() masks mice::complete())
    mice::complete(imp, i) %>%
    # then subset the data based upon the variables specified
    dplyr::select(dplyr::one_of(vars)) %>%
    # then make a column called `m`, and make this a factor
    dplyr::mutate(m = as.factor(i))

}

# length(dat.mi.list)

# base R alternative (not used below): stack the m datasets with rbind
data.imputed.melt <- do.call("rbind", dat.mi.list)

# each `m` is a factor with a single, different level, so bind_rows()
# coerces `m` to character (hence the warning and the <chr> column)
bound_imputed <- dat.mi.list %>%
  dplyr::bind_rows() %>%
  tibble::as_tibble() %>%
  dplyr::group_by(m) %>%
  tidyr::nest()
#> Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character

bound_imputed
#> # A tibble: 5 × 2
#>       m              data
#>   <chr>            <list>
#> 1     1 <tibble [25 × 2]>
#> 2     2 <tibble [25 × 2]>
#> 3     3 <tibble [25 × 2]>
#> 4     4 <tibble [25 × 2]>
#> 5     5 <tibble [25 × 2]>
```

Aside from removing the loop around `mice::complete()`, there needs to be some way to link these data back to `nhanes`, so that we can identify which values were imputed.
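A minimal sketch of both points, assuming mice 3.x and tidyr 1.0 or later. `mice::complete(imp, action = "long")` replaces the loop: it returns all m completed datasets stacked, with `.imp` (imputation number) and `.id` (row number) columns. A shadow matrix built from `is.na(nhanes)` is then joined on that row number, so each nested dataset carries TRUE/FALSE flags marking the values mice had to impute. The naniar-style `_NA` column suffix and the seed are illustrative choices, not part of mputr.

```r
library(mice)
library(dplyr)
library(tidyr)

# impute nhanes quietly; the seed is arbitrary, for reproducibility only
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 2017)

# shadow matrix: one logical column per variable, TRUE where the original
# nhanes value was missing (i.e. where mice imputed it).
# The `_NA` suffix is an illustrative naming choice.
shadow <- as.data.frame(is.na(nhanes))
names(shadow) <- paste0(names(nhanes), "_NA")
shadow$.id <- seq_len(nrow(nhanes))

# loop-free: stack all m completed datasets, with `.imp` and `.id` columns
imp_long <- mice::complete(imp, action = "long")

bound_imputed <- imp_long %>%
  as_tibble() %>%
  # older mice versions stored `.id` as character row names
  mutate(.id = as.integer(.id)) %>%
  # attach the missingness flags to every imputation via the row number
  left_join(shadow, by = ".id") %>%
  # one row per imputation, each completed dataset in a list-column
  group_by(.imp) %>%
  nest()

bound_imputed
```

The imputed rows of, say, the first imputation can then be pulled out with `dplyr::filter(bound_imputed$data[[1]], bmi_NA)`.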

njtierney commented 6 years ago

Great posts by @andrewheiss are related to this idea: