wjhopper closed 1 year ago
Thanks for reaching out and trying the package. I had not implemented this; I have generally assumed that the sample sizes for the two groups would be equal in proportion, which in my experience is what is typically found in practice.
I've had multiple people ask for this, so I implemented a new argument for factor attribute simulation, force_equal = TRUE. The default is FALSE, so that existing code behaves as before. This would be the new code, using the example you posted from the vignette.
library(simglm)
set.seed(321)
sim_arguments <- list(
  formula = y ~ 1 + weight + age + sex,
  fixed = list(
    weight = list(var_type = 'continuous', mean = 180, sd = 30),
    age = list(var_type = 'ordinal', levels = 30:60),
    sex = list(var_type = 'factor', levels = c('male', 'female'),
               force_equal = TRUE)  # this is the new code argument
  ),
  sample_size = 10
)
simulate_fixed(data = NULL, sim_arguments)
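For intuition, here is a minimal base-R sketch of the difference between drawing each factor level independently and forcing equal group sizes. It is only an illustration and is not taken from simglm's internals; the variable names (`sex_levels`, `unbalanced`, `balanced`) are made up for this example, and how force_equal is actually implemented in the package may differ.

```r
set.seed(321)
n <- 10
sex_levels <- c("male", "female")  # hypothetical names, for illustration only

# Unconstrained: each observation is drawn independently,
# so the group counts vary by chance (e.g. an 8/2 split is possible).
unbalanced <- sample(sex_levels, size = n, replace = TRUE)

# Balanced: repeat each level equally often, then shuffle the order.
# Counts are exactly equal when n is a multiple of the number of levels.
balanced <- sample(rep(sex_levels, length.out = n))

table(unbalanced)
table(balanced)
```

With n = 10 and two levels, `table(balanced)` always reports 5 observations per level, while `table(unbalanced)` depends on the random draw.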
Wonderful, thanks for adding this functionality! Might I suggest updating the vignettes to make users aware of this new functionality? I've prepared a pull request (#107) to do this if you think it's a good addition.
I suppose this is less of an issue than a question, but is it possible to simulate data from a balanced design? For instance, one of the examples in the Tidy Simulation with simglm vignette shows how to simulate a binary categorical variable for sex, but the simulated data ends up with 8 observations in the female category and 2 observations in the male category.
Is it possible to force the simulated data to have 5 observations in the female category, and 5 observations in the male category?