lebebr01 / simglm

Simulate regression models
https://simglm.brandonlebeau.org/

The `cov_param` arguments are confusing #30

Closed (jknowles closed this issue 7 years ago)

jknowles commented 7 years ago

Thank you for this great package -- I think it is just what I am looking for. I have run into some snags in the documentation (as a package maintainer myself, this is something I completely understand).

I am trying to modify the example in the help file for my own purposes. The example is here:

# Longitudinal linear mixed model example
fixed <- ~1 + time + diff + act + time:act
random <- ~1 + time + diff
fixed_param <- c(4, 2, 6, 2.3, 7)
random_param <- list(random_var = c(7, 4, 2), rand_gen = 'rnorm')
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
                  var_type = c("lvl1", "lvl2"),
                  opts = list(list(mean = 0, sd = 1.5),
                              list(mean = 0, sd = 4)))
n <- 150
p <- 30
error_var <- 4
with_err_gen <- 'rnorm'
data_str <- "long"
temp_long <- sim_reg(fixed, random, random3 = NULL, fixed_param,
                     random_param, random_param3 = NULL,
                     cov_param, k = NULL, n, p, error_var, with_err_gen,
                     data_str = data_str)

The way `cov_param` is specified is confusing to me. In the single-level case, I need to specify a function to draw from, and parameters for that function in a list, for as many variables as there are in `fixed`. In the multilevel case, it seems I only need to specify one function and one options list for each of the two levels. However, if I modify this example at all, it often stops working.

From the example and the documentation, it is hard to tell which variables need to be defined in `cov_param`: which of them are cluster variables, and which do not need to be defined at all? How do I know how many `lvl1` and `lvl2` variables to specify in this example? Is `time` a reserved variable that the function handles specially and that is excluded from user control?

lebebr01 commented 7 years ago

I agree; I have not been happy with the documentation for the `cov_param` argument. I think it needs more examples and better wording, and I'm open to suggestions for that wording. Only two elements are required, `dist_fun` and `var_type`; the rest are optional arguments passed to the specific distribution function.
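
To illustrate that last point, here is a minimal base R sketch of how an options list can be forwarded to a distribution function named by a string. This shows the general mechanism only and is not simglm's actual internals; `draw_covariate` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper (not part of simglm): draw n values from the
# distribution function named in dist_fun, forwarding opts as extra arguments.
draw_covariate <- function(n, dist_fun, opts = list()) {
  do.call(dist_fun, c(list(n = n), opts))
}

set.seed(1)
# Mirrors the two opts entries from the example in this issue.
x_lvl1 <- draw_covariate(5, "rnorm", list(mean = 0, sd = 1.5))
x_lvl2 <- draw_covariate(5, "rnorm", list(mean = 0, sd = 4))

length(x_lvl1)  # one draw per requested observation
```

With an empty `opts`, the distribution function simply falls back to its own defaults, which is consistent with these arguments being optional.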

Briefly, the `cov_param` argument covers only the variables that need to be simulated from a generating function in R. That does not include `time` (soon users will be able to specify `time` directly). Factor variables, interactions, and the intercept are not specified in `cov_param` either, for both single-level and multilevel models. In the example above, that leaves `diff` and `act`, so `cov_param` needs to have length 2.
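
The counting rule above can be sketched in base R. This is only an illustration of the rule as described in this thread, not code from simglm: the name `needs_cov_param` is made up here, and the sketch assumes the formula contains no factor variables (those would have to be dropped by hand).

```r
# Fixed-effects formula and cov_param from the example in this issue.
fixed <- ~1 + time + diff + act + time:act
cov_param <- list(dist_fun = c('rnorm', 'rnorm'),
                  var_type = c("lvl1", "lvl2"),
                  opts = list(list(mean = 0, sd = 1.5),
                              list(mean = 0, sd = 4)))

# term.labels never includes the intercept, so it is excluded automatically.
term_labels <- attr(terms(fixed), "term.labels")

# Drop interactions (labels containing ":") and the reserved time variable.
main_effects <- term_labels[!grepl(":", term_labels, fixed = TRUE)]
needs_cov_param <- setdiff(main_effects, "time")

needs_cov_param          # covariates that need an entry in cov_param
length(needs_cov_param)  # should match the lengths checked below

# Sanity check: one dist_fun and one var_type per simulated covariate.
stopifnot(length(cov_param$dist_fun) == length(needs_cov_param),
          length(cov_param$var_type) == length(needs_cov_param))
```

Running a check like this before calling `sim_reg` may help catch a mismatch between the formula and `cov_param` after modifying the example.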