tmatta / lsasim

Simulate large scale assessment data

Generating multinomial background variables #41

Open Sinan-Yavuz opened 3 years ago

Sinan-Yavuz commented 3 years ago

Dear lsasim team,

In the following example, I can generate correlated ordinal background variables:

library(lsasim)

N <- 100
d <- matrix(data = .4, nrow = 6, ncol = 6)
diag(d) <- 1

c_prop <- list(c(1),
               c(.2,.4,.6,.8,1),
               c(.1,.3,.5,.7,1),
               c(.3,.6,1),
               c(.7,.9,1),
               c(.4,.6,.9,1))

bgr_dt <- questionnaire_gen(n_obs = N, cat_prop = c_prop, cor_matrix = d, n_vars = 6, theta = TRUE) 
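For readers unfamiliar with the cat_prop format: each vector holds cumulative proportions, so the share of each category is the difference between consecutive entries. The snippet below is a small base-R sketch of that arithmetic using the values from the example; the name `marginal` is just an illustrative helper, and reading the lone `c(1)` as the continuous theta variable is my understanding of the example rather than something stated in this thread.

```r
# Cumulative proportions copied from the example above
c_prop <- list(c(1),
               c(.2, .4, .6, .8, 1),
               c(.1, .3, .5, .7, 1),
               c(.3, .6, 1),
               c(.7, .9, 1),
               c(.4, .6, .9, 1))

# Differencing each cumulative vector recovers the per-category proportions
# ("marginal" is a hypothetical name used only for this illustration)
marginal <- lapply(c_prop, function(p) diff(c(0, p)))

marginal[[4]]  # 0.3 0.3 0.4: three categories for the fourth variable
```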

However, this function doesn't allow us to generate correlated multinomial variables, such as "race".

How can we achieve this?

Thank you, Sinan

wleoncio commented 3 years ago

Hi @Sinan-Yavuz, thank you for your question!

If you run questionnaire_gen() with family="gaussian", lsasim will generate multinomial variables. You can also add full_output=TRUE to get the variance-covariance matrix:

bgr <- questionnaire_gen(
    n_obs = N, cat_prop = c_prop, cor_matrix = d, n_vars = 6, theta = TRUE,
    family = "gaussian", full_output = TRUE  # new additions to your code
)
bgr_dt <- bgr$bg
covariances <- bgr$linear_regression$vcov_YXW

I have been trying to work out whether the user can pass this YXW variance-covariance matrix explicitly, but I don't think so: they can only pass the YXZ matrix, and W is then calculated as a multinomial equivalent of the normally distributed Zs.
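To make the Z-to-W step more concrete, here is a rough base-R sketch of the general idea: draw correlated normal variables, then cut each one at the normal quantiles of its cumulative proportions. This is only an illustration under my own assumptions and is not lsasim's internal code; `rho`, `cum_prop` and `to_categories` are hypothetical names chosen for the example.

```r
# Sketch only: thresholding latent normals into categories (not lsasim internals)
set.seed(1)
n <- 10000

# Assumed target correlation between two latent standard-normal variables
rho <- 0.4
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)

# Draw correlated normals using the Cholesky factor of Sigma
Z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma)

# Cumulative proportions, in the same format as cat_prop
cum_prop <- list(c(.3, .6, 1), c(.7, .9, 1))

# Cut a latent variable at the normal quantiles of its cumulative proportions
to_categories <- function(z, cp) {
  breaks <- c(-Inf, qnorm(cp[-length(cp)]), Inf)
  cut(z, breaks = breaks, labels = seq_along(cp))
}

W <- data.frame(
  w1 = to_categories(Z[, 1], cum_prop[[1]]),
  w2 = to_categories(Z[, 2], cum_prop[[2]])
)

prop.table(table(W$w1))  # roughly .3, .3, .4
cor(Z)[1, 2]             # roughly rho
```

The category shares of W follow the requested proportions, while the association between w1 and w2 is inherited from the latent Zs rather than set directly, which is why the user has no explicit control over it.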

In summary, the current version of lsasim generates correlated multinomial variables, but I reckon the user doesn't have explicit control over those values.

wleoncio commented 3 years ago

Hi @Sinan-Yavuz, I was wondering if the message above solves the issue.

P.S.: We're currently working on #40; hopefully your fix will be integrated soon.