Power simulation unbalanced multilevel design

lebebr01 / simglm

Simulate regression models


Hi again! Due to a series of unfortunate events, I was not able to follow up the last time I asked this question; sorry about that. Again, I truly appreciate your work on this package.

To provide a bit of context for what I am trying to achieve: I have a data set consisting of repeated cross-sectional surveys conducted annually over 10 years. Each survey consists of a unique set of individuals. The individual responses are nested in year of participation and municipality of participation, which yields the following three-level design: individuals (level 1), nested in municipality-years (muni_year; level 2), nested in municipalities (municipality; level 3). Overall, the data consist of approximately 550,000 individuals, 1,000 municipality-years, and 400 municipalities. However, the design is unbalanced: municipalities have participated to varying degrees (the number of municipality-years ranges from 1 to 4 per municipality), and the number of participants also differs across municipality-years (from about 200 to 5,000).

For this study, I am trying to simulate the power to detect the effect of a level-2 predictor (x) on a level-1 outcome (y), assuming a standardized beta of 0.05. I think I have managed to simulate the power correctly for a balanced design (i.e., an equal number of participants per municipality-year and an equal number of municipality-years per municipality). However, my understanding is that power is also sensitive to whether the design is balanced, so I am hoping to adjust the simulation to account for this. Is there a way to adjust my code to specify, for instance, the range of participants per level-2 unit, and the range (and proportion) of municipality-years (level 2) per municipality (level 3), so that it better matches my data?
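As a rough sketch of how the sample-size vectors could mirror the design described above, the base R helper below (`draw_unbalanced_sizes()`, a hypothetical name, not part of simglm) draws the number of muni_years per municipality from 1 to 4 with chosen proportions, and the number of participants per muni_year within 200 to 5,000. The proportions `c(.2, .3, .3, .2)` and the log-normal shape are assumptions, picked so the averages land near 2.5 muni_years per municipality (1,000 / 400) and roughly 550 participants per muni_year (550,000 / 1,000). A uniform draw over 200 to 5,000 would instead average about 2,600 per muni_year and overstate the total sample several-fold. The output follows the same layout as the `level1_ss`/`level2_ss` vectors passed to `sample_size` in this thread: one `level2` entry per municipality and one `level1` entry per muni_year.

```r
# Hypothetical helper (not part of simglm): draw unbalanced sample sizes.
# The proportions and the log-normal shape are assumptions to tune to the real data.
draw_unbalanced_sizes <- function(n_muni = 400,
                                  years_prob = c(.2, .3, .3, .2),
                                  n_min = 200, n_max = 5000,
                                  target_mean = 550, sdlog = 0.8) {
  # muni_years (level 2) per municipality (level 3): 1 to 4 with the given proportions
  level2 <- sample(1:4, size = n_muni, replace = TRUE, prob = years_prob)
  # participants (level 1) per muni_year: right-skewed log-normal whose mean
  # (before truncation) equals target_mean, then clamped to [n_min, n_max]
  meanlog <- log(target_mean) - sdlog^2 / 2
  raw <- rlnorm(sum(level2), meanlog = meanlog, sdlog = sdlog)
  level1 <- as.integer(round(pmin(pmax(raw, n_min), n_max)))
  list(level1 = level1, level2 = level2, level3 = n_muni)
}

set.seed(5)
sizes <- draw_unbalanced_sizes()
sum(sizes$level2)    # total muni_years, around 1,000 on average
sum(sizes$level1)    # total individuals
table(sizes$level2)  # municipalities with 1, 2, 3 or 4 muni_years
```

The resulting list could then be supplied as `sample_size = sizes`, following the vector layout used in this thread. With roughly 550,000 rows per replication, each `lmer` fit will be slow, so it may be worth testing with a smaller `n_muni` first.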

```r
library(simglm)
library(magrittr)  # provides the %>% pipe used below

# simulation arguments (balanced design)
sim_arguments <- list(
  formula = y ~ 1 + x + (1 | muni_year) + (1 | municipality),
  fixed = list(
    y = list(var_type = 'continuous',
             mean = 0, sd = 1,
             var_level = 1),
    x = list(var_type = 'continuous',
             mean = 0, sd = 1,
             var_level = 2)),
  reg_weights = c(intercept = 0, x = .05),
  error = list(variance = 1),
  randomeffect = list(var2 = list(variance = 0.002556, var_level = 2),
                      var3 = list(variance = 0.011449, var_level = 3)),
  replications = 10,  # coarse power estimate; increase for the final run
  model_fit = list(model_function = "lmer"),  # lmer() fits by REML by default
  extract_coefficients = TRUE,
  power = list(alpha = .05),
  sample_size = list(level1 = 545,
                     level2 = 3,
                     level3 = 400))
set.seed(123)

# simulate and view data
fixed_data <- simulate_fixed(data = NULL, sim_arguments)
head(fixed_data, n = 20)

# power
power <- replicate_simulation(sim_arguments) %>%
  compute_statistics(sim_arguments)
```
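To make the point about balance concrete, a back-of-the-envelope check (an approximation, not something simglm computes) is the design effect for a cluster-level predictor with unequal cluster sizes, DE = 1 + ((CV^2 + 1) * m_bar - 1) * rho. Here m_bar is the mean cluster size, CV the coefficient of variation of cluster sizes, and rho the intraclass correlation. The sketch below treats muni_years as the clusters and takes rho from the variance components in `sim_arguments`, ignoring the extra level-3 nesting, so it only indicates direction and rough size. The value `cv = 1` is a hypothetical placeholder; the real CV should be computed from the observed participants per muni_year.

```r
# Variance components from sim_arguments
var_l1 <- 1          # residual (individual-level) variance
var_l2 <- 0.002556   # muni_year random intercept variance
var_l3 <- 0.011449   # municipality random intercept variance
total  <- var_l1 + var_l2 + var_l3

# Correlation between two individuals in the same muni_year
# (they share both the muni_year and the municipality intercepts)
rho <- (var_l2 + var_l3) / total

# Approximate design effect for a cluster-level predictor with
# unequal cluster sizes: DE = 1 + ((cv^2 + 1) * m_bar - 1) * rho
design_effect <- function(m_bar, rho, cv = 0) {
  1 + ((cv^2 + 1) * m_bar - 1) * rho
}

de_balanced   <- design_effect(m_bar = 545, rho = rho)          # equal sizes, cv = 0
de_unbalanced <- design_effect(m_bar = 545, rho = rho, cv = 1)  # hypothetical cv

round(c(rho = rho, balanced = de_balanced, unbalanced = de_unbalanced), 3)
```

With these inputs the balanced design effect is about 8.5 and the hypothetical CV = 1 case about 16, i.e., roughly half the effective sample size for the level-2 effect. This is why simulating the actual unbalanced structure can change the power estimate noticeably.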

```r
set.seed(5)
# number of muni_years (level 2) in each of 40 municipalities (level 3)
level2_ss <- round(runif(40, min = 1, max = 4), 0)
# number of individuals (level 1) in each muni_year:
# one entry per level-2 unit, so the length is sum(level2_ss)
level1_ss <- round(runif(sum(level2_ss), min = 2, max = 50), 0)

# simulation argument
sim_arguments <- list(
  formula = y ~ 1 + x + (1 | muni_year) + (1 | municipality),
  fixed = list(
    y = list(var_type = 'continuous', mean = 0, sd = 1, var_level = 1),
    x = list(var_type = 'continuous', mean = 0, sd = 1, var_level = 2)),
  reg_weights = c(intercept = 0, x = .05),
  error = list(variance = 1),
  randomeffect = list(var2 = list(variance = 0.002556, var_level = 2),
                      var3 = list(variance = 0.011449, var_level = 3)),
  replications = 10,
  model_fit = list(model_function = "lmer"),
  extract_coefficients = TRUE,
  power = list(alpha = .05),
  sample_size = list(level1 = level1_ss,
                     level2 = level2_ss,
                     level3 = 40))
set.seed(123)

# simulate and view data
fixed_data <- simulate_fixed(data = NULL, sim_arguments)
head(fixed_data, n = 20)
```
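To see how the two vectors in this call fit together (one `level2_ss` entry per municipality and one `level1_ss` entry per muni_year, as the lengths `40` and `sum(level2_ss)` imply), the base R sketch below expands them into nested ID columns of a long data set. The column names mirror the grouping factors in the formula; this only illustrates the layout and is not how simglm builds its data internally. The final `table()` also shows that `round(runif(...))` draws the end values 1 and 4 about half as often as 2 and 3, so `sample()` with an explicit `prob` may be preferable when the proportions matter.

```r
# Re-create the sample-size vectors from the call above
set.seed(5)
level2_ss <- round(runif(40, min = 1, max = 4), 0)
level1_ss <- round(runif(sum(level2_ss), min = 2, max = 50), 0)

# One row per muni_year: the municipality it belongs to
muni_year_tab <- data.frame(
  muni_year    = seq_len(sum(level2_ss)),
  municipality = rep(seq_along(level2_ss), times = level2_ss)
)

# One row per individual: repeat each muni_year by its number of participants
ids <- data.frame(
  muni_year    = rep(muni_year_tab$muni_year, times = level1_ss),
  municipality = rep(muni_year_tab$municipality, times = level1_ss)
)

nrow(ids)                                  # equals sum(level1_ss)
table(table(muni_year_tab$municipality))   # municipalities by number of muni_years
```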

```r
set.seed(5)
# number of individuals (level 1) in each of 7 muni_years (level 2)
level1_ss <- round(runif(7, min = 2, max = 50), 0)

# simulation argument (two-level model)
sim_arguments <- list(
  formula = y ~ 1 + x + (1 | muni_year),
  fixed = list(
    y = list(var_type = 'continuous', mean = 0, sd = 1, var_level = 1),
    x = list(var_type = 'continuous', mean = 0, sd = 1, var_level = 2)),
  reg_weights = c(intercept = 0, x = .05),
  error = list(variance = 1),
  randomeffect = list(var2 = list(variance = 0.002556, var_level = 2)),
  replications = 10,
  model_fit = list(model_function = "lmer"),
  extract_coefficients = TRUE,
  power = list(alpha = .05),
  sample_size = list(level1 = level1_ss, level2 = 7))
set.seed(123)

# simulate and view data
fixed_data <- simulate_fixed(data = NULL, sim_arguments)
head(fixed_data, n = 20)
```

Power simulation unbalanced multilevel design #105