mshafieek / ADS-Missing-data-social-network

ADS master thesis
MIT License

Imputing sender / receiver with `mice` #4

Closed gerkovink closed 1 year ago

gerkovink commented 1 year ago

Hi all,

@thomvolker has put some energy into writing a new matcher in C++ for predictive mean matching. You can install the custom version of mice with the new functionality using

devtools::install_github("gerkovink/mice@match_conditional")

and then run mice as follows to correctly impute the $N$ senders with $N-1$ receivers.

library(mice, warn.conflicts = FALSE)
library(dplyr)
set.seed(123)

# create mock data
time <- rnorm(10)  # 10 random time stamps (rnorm(1:10) also returns 10 draws, but reads oddly)
sender   <- c(1, 1, NA, NA,  2,  3, 1, NA, NA, 2) |> as.integer()
receiver <- c(2, 3,  2, NA, NA, NA, 2,  1,  NA, 1) |> as.integer()

# create tibble
df <- tibble::tibble(time, sender, receiver)

# determine which column to condition on
whichcol <- c("", "receiver", "sender")
names(whichcol) <- colnames(df)

# use the custom pmm method
method <- make.method(df)
method[c(2,3)] <- "pmm.conditional"

# impute
imp <- mice(df, 
            m = 5, 
            method = method, 
            whichcolumn = whichcol, 
            print = FALSE)

# check if it is correct
imp |>
  complete("long") |>
  summarize(all(sender != receiver))
#>   all(sender != receiver)
#> 1                    TRUE

Created on 2023-05-15 with reprex v2.0.2
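
For intuition about what a conditional matcher has to do, here is a minimal pure-R sketch. It is not the C++ matcher from the mice branch: the function name `match_conditional_sketch` and all of its arguments are hypothetical, and it simply takes the single nearest allowed donor, whereas predictive mean matching normally draws at random from a small set of close donors.

```r
# Hypothetical illustration, NOT the mice implementation.
# yhat_obs: predicted values for the observed cases (donors)
# yhat_mis: predicted values for the cases to impute (recipients)
# y_obs:    observed values of the donors
# exclude:  per recipient, the value that may not be imputed
#           (e.g. the receiver when imputing the sender)
match_conditional_sketch <- function(yhat_obs, yhat_mis, y_obs, exclude) {
  vapply(seq_along(yhat_mis), function(i) {
    # a donor is allowed if its value differs from the excluded value;
    # a missing excluded value (NA) places no restriction on the donors
    allowed <- is.na(exclude[i]) | y_obs != exclude[i]
    if (!any(allowed)) return(NA_integer_)
    # nearest allowed donor in terms of predicted value
    dist <- abs(yhat_obs - yhat_mis[i])
    dist[!allowed] <- Inf
    y_obs[which.min(dist)]
  }, integer(1))
}

# toy example: three donors, two recipients
y_obs    <- c(1L, 2L, 3L)
yhat_obs <- c(1.1, 2.0, 2.9)
yhat_mis <- c(2.1, 1.0)
exclude  <- c(2L, NA)  # the first recipient may not receive the value 2

imputed <- match_conditional_sketch(yhat_obs, yhat_mis, y_obs, exclude)
imputed
#> [1] 3 1
```

The first recipient would normally be matched to the donor with value 2, but that value is excluded, so the next-closest donor (value 3) is used instead.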

Tagging @JesseJvdw @vira-dvoriak @LisanneLageweg

JesseJvdw commented 1 year ago

When installing I get the error: Error: Failed to install 'mice' from GitHub: ! System command 'Rcmd.exe' failed

thomvolker commented 1 year ago

Is R giving any other messages on why compilation fails, @JesseJvdw?

JesseJvdw commented 1 year ago

Never mind, I uninstalled mice first and updated some packages; now it's working.

JesseJvdw commented 1 year ago

I'm trying to use it like this:

mbased_finite_apollo <- list(
  MCAR = apollo %>%
    ampute(prop = .5, mech = "MCAR", patterns = pattern) %>%
    .$amp %>%
    mice(m = 5,
         maxit = 5,
         method = method,
         whichcolumn = whichcol,
         predictorMatrix = predictormatrix,
         print = FALSE)
)

But then it says: Error in x[wy, whichcolumn[j]] : subscript out of bounds.

thomvolker commented 1 year ago

Can you make a reprex?

Based on the error message, one of three things is likely: the character strings in whichcol do not correspond to variable names in the data, the names of the whichcol vector do not correspond to the variable names in the data, or the whichcol vector is shorter than the number of variables in the data. If you have ruled out all of these, please upload a reproducible example (reprex).
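
The three conditions above can be checked before calling mice(). The sketch below uses base R only; `check_whichcol` is a hypothetical helper and not part of mice, and it assumes that the names of whichcol should appear in the same order as the columns of the data.

```r
# Hypothetical helper, not part of mice: returns a character vector of
# problems found (empty if none of the three conditions is violated).
check_whichcol <- function(data, whichcol) {
  vars <- colnames(data)
  problems <- character(0)
  if (length(whichcol) != length(vars)) {
    problems <- c(problems, "whichcol must have one entry per column of the data")
  }
  if (is.null(names(whichcol)) || !identical(names(whichcol), vars)) {
    problems <- c(problems, "names(whichcol) must equal colnames(data)")
  }
  # non-empty entries name the column to condition on
  targets <- whichcol[whichcol != ""]
  if (!all(targets %in% vars)) {
    problems <- c(problems, "every non-empty entry must be a column name of the data")
  }
  problems
}

# mock data with renamed actor columns
df <- data.frame(time = 1:3, actor1 = c(1, NA, 2), actor2 = c(2, 1, NA))

good <- c(time = "", actor1 = "actor2", actor2 = "actor1")
bad  <- c(time = "", actor1 = "receiver", actor2 = "sender")

check_whichcol(df, good)  # character(0): no problems found
check_whichcol(df, bad)   # reports that the entries are not column names
```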

JesseJvdw commented 1 year ago

I'm a total noob with reprex but here it is I guess

##### Load data
load("UUsummerschool.rdata")
#> Warning in readChar(con, 5L, useBytes = TRUE): cannot open compressed file
#> 'UUsummerschool.rdata', probable reason 'No such file or directory'
#> Error in readChar(con, 5L, useBytes = TRUE): cannot open the connection

apollo <- as_tibble(PartOfApollo_13)
#> Error in as_tibble(PartOfApollo_13): could not find function "as_tibble"

##### renaming columns to work with remify
setnames(apollo, old = c('sender','receiver'), 
         new = c('actor1','actor2'))
#> Error in setnames(apollo, old = c("sender", "receiver"), new = c("actor1", : could not find function "setnames"

whichcol <- c("", "actor1", "actor2")
names(whichcol) <- colnames(apollo)
#> Error in is.data.frame(x): object 'apollo' not found

# use the custom pmm method
method <- make.method(apollo)
#> Error in make.method(apollo): could not find function "make.method"
method[c(2,3)] <- "pmm.conditional"
#> Error in method[c(2, 3)] <- "pmm.conditional": object 'method' not found

###### Creating missing data

## Missing pattern
pattern <- matrix(c(1,0,1,1,1,0), nrow=2, byrow=TRUE)

## predictor matrix
predictormatrix <- matrix(c(0,0,0,0,0,1,0,1,0), nrow=3, byrow=TRUE)

## Model-based finite populations
mbased_finite_apollo <- list(
    apollo %>% 
      ampute(prop = .5, 
             mech = "MCAR",
             patterns = pattern) %>% .$amp %>%
      mice(m = 5, 
           maxit = 5,
           method = method,
           whichcolumn = whichcol,
           predictorMatrix = predictormatrix,
           print = FALSE))
#> Error in apollo %>% ampute(prop = 0.5, mech = "MCAR", patterns = pattern) %>% : could not find function "%>%"

thomvolker commented 1 year ago

Replace whichcol <- c("", "actor1", "actor2") in your code with whichcol <- c("", "actor2", "actor1"), so that each actor column is conditioned on the other one. I think that should do the trick.
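
One way to avoid getting the order wrong is to fill in whichcol by name instead of by position. This is a small base R sketch; the column order time, actor1, actor2 is an assumption about the renamed apollo data.

```r
# assumed column names of the data, in order
cols <- c("time", "actor1", "actor2")

# start with "no conditioning" for every column
whichcol <- setNames(rep("", length(cols)), cols)

# each actor column is conditioned on the other actor column
whichcol["actor1"] <- "actor2"
whichcol["actor2"] <- "actor1"

whichcol
#>     time   actor1   actor2 
#>       "" "actor2" "actor1"
```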

JesseJvdw commented 1 year ago

Yes it did. Thank you.

JesseJvdw commented 1 year ago

Okay, I don't know why, but it didn't give me an error right after I changed it, and now it does again... This time I used the example from Gerko but ran the imputation with my own mice() call. It still gives me the out-of-bounds error.

predictormatrix <- matrix(c(0,0,0,0,0,1,0,1,0), nrow=3, byrow=TRUE)
time <- rnorm(1:10) 
sender   <- c(1, 1, NA, NA,  2,  3, 1, NA, NA, 2) %>% as.integer()
#> Error in c(1, 1, NA, NA, 2, 3, 1, NA, NA, 2) %>% as.integer(): could not find function "%>%"
receiver <- c(2, 3,  2, NA, NA, NA, 2,  1,  NA, 1) %>% as.integer()
#> Error in c(2, 3, 2, NA, NA, NA, 2, 1, NA, 1) %>% as.integer(): could not find function "%>%"

# create tibble
df <- tibble::tibble(time, sender, receiver)
#> Error in eval_tidy(xs[[j]], mask): object 'sender' not found

# determine which column to condition on
whichcol <- c("", "receiver", "sender")
names(whichcol) <- colnames(df)

# use the custom pmm method
method <- make.method(df)
#> Error in make.method(df): could not find function "make.method"
method[c(2,3)] <- "pmm.conditional"
#> Error in method[c(2, 3)] <- "pmm.conditional": object 'method' not found

mbased_finite_apollo <- df %>%
     mice(m = 5, 
           maxit = 5,
           method = method,
           whichcolumn = whichcol,
           predictorMatrix = predictormatrix,
           print = FALSE)
#> Error in df %>% mice(m = 5, maxit = 5, method = method, whichcolumn = whichcol, : could not find function "%>%"

Created on 2023-05-16 with reprex v2.0.2

thomvolker commented 1 year ago

You probably didn't load all the required packages. Try loading dplyr or tibble first.

JesseJvdw commented 1 year ago

Still gives me the error after reloading all the packages. Everything works fine up until

mbased_finite_apollo <- df %>%
     mice(m = 5, 
           maxit = 5,
           method = method,
           whichcolumn = whichcol,
           predictorMatrix = predictormatrix,
           print = FALSE)
#> Error in df %>% mice(m = 5, maxit = 5, method = method, whichcolumn = whichcol, : could not find function "%>%"

Created on 2023-05-16 with reprex v2.0.2

thomvolker commented 1 year ago

In your reprex you don't load any packages, so I cannot evaluate what error you are getting, let alone how to solve the issue.

JesseJvdw commented 1 year ago

This is with all the packages I am using in my entire rmd

library(mice)     # for imputation and amputation
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
library(purrr)    # for functional programming
#> Warning: package 'purrr' was built under R version 4.1.3
library(furrr)    # for functional futures
#> Warning: package 'furrr' was built under R version 4.1.3
#> Loading required package: future
#> Warning: package 'future' was built under R version 4.1.3
library(magrittr) # for pipes
#> Warning: package 'magrittr' was built under R version 4.1.3
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
library(dplyr)    # for data manipulation
#> Warning: package 'dplyr' was built under R version 4.1.3
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)   # for tibbles
#> Warning: package 'tibble' was built under R version 4.1.3
library(remstats) # for REM statistics
library(remify)
library(data.table)
#> Warning: package 'data.table' was built under R version 4.1.3
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose
library(remstimate) # for   REM
library(reprex)
#> Warning: package 'reprex' was built under R version 4.1.3
predictormatrix <- matrix(c(0,0,0,0,0,1,0,1,0), nrow=3, byrow=TRUE)
time <- rnorm(1:10) 
sender   <- c(1, 1, NA, NA,  2,  3, 1, NA, NA, 2) |> as.integer()
receiver <- c(2, 3,  2, NA, NA, NA, 2,  1,  NA, 1) |> as.integer()

# create tibble
df <- tibble::tibble(time, sender, receiver)

# determine which column to condition on
whichcol <- c("", "receiver", "sender")
names(whichcol) <- colnames(df)

# use the custom pmm method
method <- make.method(df)
method[c(2,3)] <- "pmm.conditional"

mbased_finite_apollo <- df %>%
     mice(m = 5, 
           maxit = 5,
           method = method,
           whichcolumn = whichcol,
           predictorMatrix = predictormatrix,
           print = FALSE)
#> Error in x[wy, whichcolumn[j]]: subscript out of bounds

Created on 2023-05-16 with reprex v2.0.2

thomvolker commented 1 year ago

I see, the specification of the predictormatrix implicitly transforms the matrix x in mice() to a vector. I'll update the code of mice.impute.pmm.conditional.
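
As background, the base R behaviour behind this kind of problem can be reproduced without mice. The sketch below only illustrates how selecting a single column collapses a matrix and how a name-based column lookup then fails; where exactly this happens inside mice() is not shown in this thread, and the mock matrix is an assumption.

```r
# mock predictor matrix with two named columns
x <- matrix(1:6, nrow = 3,
            dimnames = list(NULL, c("sender", "receiver")))

# selecting a single column drops the result to a plain vector by default
v <- x[, "receiver"]
is.null(dim(v))
#> [1] TRUE

# drop = FALSE keeps a one-column matrix together with its column name
m <- x[, "receiver", drop = FALSE]
dim(m)
#> [1] 3 1

# looking up a column name that is not present fails with the error seen above
msg <- tryCatch(m[1, "sender"], error = function(e) conditionMessage(e))
msg
#> [1] "subscript out of bounds"
```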

thomvolker commented 1 year ago

I added a fix, but @gerkovink has to merge it before you can install mice from his repository. However, you can also install mice from my development branch; just run

devtools::install_github("thomvolker/mice@mice-gerko")

and then run your code again.

gerkovink commented 1 year ago

Merged. So you can install from either fork of mice

gerkovink commented 1 year ago

Don't know if people are still struggling, but the reprex below illustrates that this works on the PartOfApollo_13 data set:

# load packages
library(mice)     # imputation
library(magrittr) # pipes
library(purrr)    # functional programming

# load data
connection <- url("https://github.com/mshafieek/ADS-Missing-data-social-network/blob/main/literature_%20REM/Tutorial_REM_REH_DATA/UUsummerschool.Rdata?raw=true")
load(connection)

# make data incomplete
apollo_mis <- 
  PartOfApollo_13 |> 
  ampute(prop = .5, 
         patterns = matrix(c(1, 0, 0,    # both sender/receiver missing
                             1, 1, 0,    # only receiver missing
                             1, 0, 1),   # only sender missing
                           nrow = 3, 
                           ncol = 3,
                           byrow = TRUE)) |> 
  pluck("amp")

# check missing data patterns
apollo_mis |>
  md.pattern()

#>      time receiver sender     
#> 1858    1        1      1    0
#> 710     1        1      0    1
#> 631     1        0      1    1
#> 683     1        0      0    2
#>         0     1314   1393 2707

# set up conditional arguments for pmm
whichcol <- c("", "receiver", "sender")
names(whichcol) <- colnames(apollo_mis)

# set imputation methods
method <- make.method(apollo_mis)
method[c("sender", "receiver")] <- "pmm.conditional"

# impute
imp <- 
  apollo_mis |>
  mice(m = 5, 
       method = method, 
       whichcolumn = whichcol)
#> 
#>  iter imp variable
#>   1   1  sender  receiver
#>   1   2  sender  receiver
#>   1   3  sender  receiver
#>   1   4  sender  receiver
#>   1   5  sender  receiver
#>   2   1  sender  receiver
#>   2   2  sender  receiver
#>   2   3  sender  receiver
#>   2   4  sender  receiver
#>   2   5  sender  receiver
#>   3   1  sender  receiver
#>   3   2  sender  receiver
#>   3   3  sender  receiver
#>   3   4  sender  receiver
#>   3   5  sender  receiver
#>   4   1  sender  receiver
#>   4   2  sender  receiver
#>   4   3  sender  receiver
#>   4   4  sender  receiver
#>   4   5  sender  receiver
#>   5   1  sender  receiver
#>   5   2  sender  receiver
#>   5   3  sender  receiver
#>   5   4  sender  receiver
#>   5   5  sender  receiver

# check
imp |>
  complete("all") |>
  map(~ sum(.x$sender == .x$receiver))
#> $`1`
#> [1] 0
#> 
#> $`2`
#> [1] 0
#> 
#> $`3`
#> [1] 0
#> 
#> $`4`
#> [1] 0
#> 
#> $`5`
#> [1] 0

Created on 2023-05-17 with reprex v2.0.2

thomvolker commented 1 year ago

Hi all, there was a bug in the previous code that led to incorrect matches. It is fixed as of the following pull request: https://github.com/gerkovink/mice/pull/10#issue-1714124801.

Good luck with your theses!

gerkovink commented 1 year ago

Merged