mshafieek / ADS-Missing-data-social-network

ADS master thesis
MIT License
0 stars 1 forks source link

Pooling results #13

Closed LisanneLageweg closed 1 year ago

LisanneLageweg commented 1 year ago

I have the code (in my opinion) working correctly up to the Cox models. When I have only one simulation, pooling works fine. However, when I have multiple simulations, it doesn't work because it's a nested list. I have tried everything but can't figure out how to solve it. I have already looked at the code from Jesse and Vira, but my code looks quite different, so I didn't gain any insights from that. I will include the code below for this part.

library(mice)     
library(purrr)    
library(furrr)    
library(mvtnorm)  
library(magrittr) 
library(dplyr)    
library(tibble)   
library(relevent)
library(Epi)
library(ggplot2)
library(sna)
library(remify)
library(RSiena)
library(rem)
library(survival)
library(remstats)
library(remify)
library(visNetwork)

load("~/Master Applied Data Science/Thesis/Thesis/UUsummerschool.Rdata")
str(PartOfApollo_13)
summary(PartOfApollo_13)
head(PartOfApollo_13)

Apollo_tibble <- as_tibble(PartOfApollo_13)

############## missing data 

mcar_simulations <- furrr::future_map(1:3, ~ {
  Apollo_tibble %>% 
    ampute(prop = .2, 
           mech = "MCAR", patterns = c(1, 1, 0)) %>%
    .$amp
}, .options = furrr_options(seed = 123))

############ imputing data

imputed_datasets <- furrr::future_map(mcar_simulations, ~ {
  mids <- mice(.x, m = 5, maxit = 5, method = "pmm", print = FALSE)
  pred <- mids$pred
  pred[ ,"time"] <- 0
  imp <- mice(.x, m = 5, maxit = 5, pred = pred, method = "pmm", print = FALSE)
  imp
}, .options = furrr_options(seed = 123))

########## REM

# Definieer een functie om de analyse uit te voeren op een enkele imputed dataset
analysis_function <- function(data) {
  # Rename columns
  data <- data %>% 
    rename(
      actor1 = sender,  # vervang 'sender' door de juiste kolomnaam
      actor2 = receiver  # vervang 'receiver' door de juiste kolomnaam
    )

  # Voer de analyse uit
  reh <- remify::remify(edgelist = data, model = "tie")
  statsObject_imp <- remstats(reh = reh, tie_effects = effects)

  # Return de resultaten
  return(statsObject_imp)
}

# Compleet de geïmputeerde datasets
completed_datasets <- lapply(imputed_datasets, function(x) mice::complete(x, "all"))

# Pas de analyse toe op elke imputed dataset
statsObject_imp <- lapply(completed_datasets, function(dataset) {
  lapply(dataset, analysis_function)
})

############## create cox data 

# Define a function that prepares data for coxph
prepare_coxph_data <- function(statsObject_imp, apollo) {
  risk_sets <- attr(statsObject_imp, "riskset")
  risk_sets <- risk_sets %>% select(-'id')

  # Get the times
  time <- apollo$time

  # merge riskset with each timepoint
  combined <- merge(risk_sets, time, by = NULL)

  combined <- combined %>% rename("time" = "y")
  combined <- lapply(combined, as.numeric)
  combined <- as.data.frame(combined)

  # Create matrices for subtraction to make a status column for coxph
  combined_matrix <- data.matrix(combined)
  matrix_rows <- nrow(combined)

  repeated_df <- apollo[rep(seq_len(nrow(apollo)), each = 240), ] 
  repeated_df <- repeated_df[, c(2,3,1)]
  apollo_matrix <- data.matrix(repeated_df)

  status_matrix <- apollo_matrix - combined_matrix

  # create a status column
  status <- as.integer(rowSums(status_matrix == 0) == ncol(status_matrix))
  status <- as.data.frame(status)

  # Add status to the combined set
  combined <- cbind(combined, status)

  # Extract statistics and add them to the dataframe
  reciprocity <- statsObject_imp[,,1]
  indegreeSender <- statsObject_imp[,,2]
  outdegreeReceiver <- statsObject_imp[,,3]

  combined$reciprocity <- c(reciprocity)
  combined$indegreeSender <- c(indegreeSender)
  combined$outdegreeReceiver <- c(outdegreeReceiver)

  return(combined)
}

prepared_data <- lapply(statsObject_imp, function(nested_list) {
  lapply(nested_list, prepare_coxph_data, apollo = PartOfApollo_13)
})

################ cox models 

coxph_models <- lapply(prepared_data, function(imputed_datasets) {
  lapply(imputed_datasets, function(df) {
    coxph(Surv(time, status) ~ reciprocity + indegreeSender + outdegreeReceiver, data = df)
  })
})
mshafieek commented 1 year ago

@gerkovink, do you have any idea, is that related to mice?

gerkovink commented 1 year ago

I'll look at it tonight.

mshafieek commented 1 year ago

@gerkovink, Since I see that the inputs look fine related to defining the cox model etc. Or probably it is related to lapply()

gerkovink commented 1 year ago

Most likely

gerkovink commented 1 year ago

@LisanneLageweg

There is no effects object in your code, hence the line

statsObject_imp <- remstats(reh = reh, tie_effects = effects)

breaks the code. See below for a more memory efficient flow with functional programming through purrr:map():

library(mice)     
library(purrr)    
library(furrr)    
library(mvtnorm)  
library(magrittr) 
library(dplyr)    
library(tibble)   
library(relevent)
library(Epi)
library(ggplot2)
library(sna)
library(remify)
library(RSiena)
library(rem)
library(survival)
library(remstats)
library(remify)
library(visNetwork)

load("UUsummerschool.Rdata")
str(PartOfApollo_13)
summary(PartOfApollo_13)
head(PartOfApollo_13)

Apollo_tibble <- as_tibble(PartOfApollo_13)

############## missing data 

mcar_simulations <- furrr::future_map(1:3, ~ {
  Apollo_tibble %>% 
    ampute(prop = .2, 
           mech = "MCAR", patterns = c(1, 1, 0)) %>%
    .$amp
}, .options = furrr_options(seed = 123))

############ imputing data

imputed_datasets <- furrr::future_map(mcar_simulations, ~ {
  mids <- mice(.x, m = 5, maxit = 5, method = "pmm", print = FALSE)
  pred <- mids$pred
  pred[ ,"time"] <- 0
  imp <- mice(.x, m = 5, maxit = 5, pred = pred, method = "pmm", print = FALSE)
  imp
}, .options = furrr_options(seed = 123))

########## REM
# Definieer effects
effects <- ~ -1 + reciprocity(scaling = ("std")) +
  indegreeSender() + outdegreeReceiver()

# Definieer een functie om de analyse uit te voeren op een enkele imputed dataset
analysis_function <- function(data) {
  # Rename columns
  data %>% 
    rename(
      actor1 = sender,  # vervang 'sender' door de juiste kolomnaam
      actor2 = receiver  # vervang 'receiver' door de juiste kolomnaam
    ) %>% 
    remify::remify(model = "tie") %>% 
    remstats(tie_effects = effects) # GV hier roep je de effects weer aan
}

############## create cox data 

# Define a function that prepares data for coxph
prepare_coxph_data <- function(statsObject_imp, apollo) {
  risk_sets <- attr(statsObject_imp, "riskset")
  risk_sets <- risk_sets %>% select(-'id')

  # Get the times
  time <- apollo$time

  # merge riskset with each timepoint
  combined <- merge(risk_sets, time, by = NULL)

  combined <- combined %>% rename("time" = "y")
  combined <- lapply(combined, as.numeric)
  combined <- as.data.frame(combined)

  # Create matrices for subtraction to make a status column for coxph
  combined_matrix <- data.matrix(combined)
  matrix_rows <- nrow(combined)

  repeated_df <- apollo[rep(seq_len(nrow(apollo)), each = 240), ] 
  repeated_df <- repeated_df[, c(2,3,1)]
  apollo_matrix <- data.matrix(repeated_df)

  status_matrix <- apollo_matrix - combined_matrix

  # create a status column
  status <- as.integer(rowSums(status_matrix == 0) == ncol(status_matrix))
  status <- as.data.frame(status)

  # Add status to the combined set
  combined <- cbind(combined, status)

  # Extract statistics and add them to the dataframe
  reciprocity <- statsObject_imp[,,1]
  indegreeSender <- statsObject_imp[,,2]
  outdegreeReceiver <- statsObject_imp[,,3]

  combined$reciprocity <- c(reciprocity)
  combined$indegreeSender <- c(indegreeSender)
  combined$outdegreeReceiver <- c(outdegreeReceiver)

  return(combined)
}

# Pas de analyse toe op elke imputed dataset
statsObject_imp <- 
  imputed_datasets %>% 
  map(~.x %>% # for every simulation
        complete("all") %>% # for every imputation
        map(~.x %>% 
              analysis_function() %>% # do analysis function
              prepare_coxph_data(apollo = PartOfApollo_13) %$% # prepare cox ph
              coxph(Surv(time, status) ~ 
                      reciprocity + 
                      indegreeSender + 
                      outdegreeReceiver))) # run cox ph

# Next step: pooling
gerkovink commented 1 year ago

See also this file I adapted for @vira-dvoriak, where more inspiration can be found.