mshafieek / ADS-Missing-data-social-network

ADS master thesis

Issue averaging results #12

Closed JesseJvdw closed 1 year ago

JesseJvdw commented 1 year ago

When averaging the results over more than one simulation, I get strange values. Following Gerko's guide to simulations, I get different values from the ones I am supposed to get.

The code below performs the data amputation and runs the multiple simulations.

library(mice)      # ampute(), make.method(), mice()
library(remify)    # remify()
library(furrr)     # future_map(), furrr_options()
library(tibble)    # as_tibble(), column_to_rownames()
library(dplyr)     # mutate(), select()
library(purrr)     # map()
library(magrittr)  # %>%

# load the Apollo 13 relational-event data
con <- url("https://github.com/mshafieek/ADS-Missing-data-social-network/raw/main/literature_%20REM/Tutorial_REM_REH_DATA/UUsummerschool.Rdata")
load(con)
apollo <- as_tibble(PartOfApollo_13)

# use the custom pmm method for columns 2 and 3
method <- make.method(apollo)
method[c(2, 3)] <- "pmm.conditional"

#### set with sufficient actors & dyads
set.seed(123) # fix seed to realize a sufficient set

indic <- sample(1:nrow(apollo), 1500)

remify(apollo[indic, ], model = "tie") %>% dim() # check actor/dyad coverage of the sufficient set

#### Combine the sufficient set and the incomplete set
make_missing <- function(x, indic){
  sufficient <- x[indic, ]       # rows kept fully observed
  miss <- x[-c(indic), ] %>%     # ampute 20% of the remaining rows
    ampute(prop = .2,
           mech = "MCAR",
           patterns = matrix(c(1, 0, 1,
                               1, 1, 0),
                             nrow = 2,
                             byrow = TRUE)) %>%
    .$amp
  combined <- rbind(sufficient,
                    miss)
  return(combined[order(combined$time), ]) # sort it all like apollo
}
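
# Hypothetical sanity check (not part of the original issue): confirm that
# make_missing() leaves the time column complete and amputes roughly 20%
# of the rows outside the sufficient set.
dat <- make_missing(apollo, indic)
colSums(is.na(dat))          # missing values only in columns 2 and 3
mean(!complete.cases(dat))   # about .2 * (nrow(apollo) - 1500) / nrow(apollo)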

##### Missing pattern (1 = observed, 0 = amputed; same patterns as inside make_missing)
pattern <- matrix(c(1, 0, 1,
                    1, 1, 0),
                  nrow = 2, byrow = TRUE)

##### predictor matrix (rows = imputation targets; columns 2 and 3 predict each other)
predictormatrix <- matrix(c(0, 0, 0,
                            0, 0, 1,
                            0, 1, 0),
                          nrow = 3, byrow = TRUE)

##### Model-based finite populations: 1000 amputation/imputation replications
mbased_finite_apollo <- list(
  MCAR = furrr::future_map(1:1000, ~ {
    make_missing(apollo, indic) %>%
      mice(m = 5,
           maxit = 5,
           method = method,
           whichcolumn = whichcol,  # whichcol is defined elsewhere; passed through to pmm.conditional
           predictorMatrix = predictormatrix,
           print = FALSE)
  }, .options = furrr_options(seed = 123)))
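
As an aside, furrr::future_map() only runs in parallel when a future plan is active; otherwise it falls back to sequential execution. A minimal setup sketch (the worker count here is an arbitrary choice, not from the issue):

library(future)
plan(multisession, workers = 4)  # spread the 1000 replications over 4 local R sessions
# ... run the future_map() call above ...
plan(sequential)                 # restore sequential execution afterwards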

Then, after running the Cox model on all simulations, I use the same pipe as in Gerko's guide to average the results.

Results %>%
  map(~.x %>%
        map(~.x %>%
              .$pooled %>% # extract table of pooled coefficients
              mutate(true = True$coefficient, # add true
                     df = m - 1,  # correct df
                     riv = Inf, # correct riv
                     std.error = sqrt(t), # standard error
                     statistic = estimate / std.error, # test statistic
                     p.value = 2 * (pt(abs(statistic),
                                       pmax(df, 0.001),
                                       lower.tail = FALSE)), # correct p.value
                     `2.5 %` = estimate - qt(.975, df) * std.error, # lower bound CI
                     `97.5 %` = estimate + qt(.975, df) * std.error, # upper bound CI
                     cov = `2.5 %` < true & true < `97.5 %`, # coverage
                     bias = estimate - true) %>% # bias
              select(term, m, true, estimate, std.error, statistic, p.value,
                     riv, `2.5 %`, `97.5 %`, cov, bias) %>%
              column_to_rownames("term") # `term` as rownames
        ) %>%
        Reduce("+", .) / length(mbased_finite_apollo)
  )
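
For intuition, Reduce("+", .) adds the list of pooled tables element-wise, so dividing the sum by the number of tables gives their mean. A toy sketch with two one-column data frames:

dfs <- list(data.frame(est = 1), data.frame(est = 3))
Reduce("+", dfs) / length(dfs)  # est = 2, the element-wise average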

giving me the following output: [screenshot: table of averaged pooled results, with m shown as 5000]

To me, that doesn't look the way it is supposed to: in Gerko's example m is 5, whereas here it is 1000 (simulations) x 5 (m) = 5000. That is strange, because I divide everything by the length of the list of simulations, so everything should be 1000 times smaller.

gerkovink commented 1 year ago

Your first list dimension is MCAR, so mbased_finite_apollo itself has length 1. Dividing by length(mbased_finite_apollo[[1]]) instead should do the trick.
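
In other words, mbased_finite_apollo has length 1 (the single MCAR element) while the 1000 simulations sit one level down, so the pipe above divided the summed tables by 1. A sketch of the check and the corrected last line of the pipe:

length(mbased_finite_apollo)        # 1: only the "MCAR" element
length(mbased_finite_apollo[[1]])   # 1000: the individual simulations
# so the final line of the averaging pipe becomes
Reduce("+", .) / length(mbased_finite_apollo[[1]])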

JesseJvdw commented 1 year ago

That makes sense, and it works. Thanks!