Closed b1azk0 closed 4 years ago
The syntax looks basically correct if those were 5 different variables, but your comment says this is moderated mediation. If your model includes a product term, you cannot generate it as a separate variable because it would not be the product of the two lower-order terms. Moderation falls outside the purview of covariance structure analysis (which only models linear effects), so simsem cannot be used to generate your data. You would need to write your own function to:
You could then pass that data-generating function to sim()
because the generate=
argument accepts a function with only a sample-size argument. You can find some guidance online, for example:
https://stats.stackexchange.com/questions/59062/multiple-linear-regression-simulation
Good luck, Terrence
Oh my... That sounds complicated to say the least and far from my comfort zone.
If I may, I'd like to learn something, despite the fact that maybe I won't be able to compute the sample size in question.
This is how the hypothesized model look like (as in Hayes PROCES templates):
From what covariance matrix I should start (keeping in mind I have some real data I want to with) ? For the outcome:
Product terms key:
Int_1 : GNT x CN
Covariance matrix of regression parameter estimates:
constant Cyberbal GNT CN Int_1
constant 0.0037 -0.0047 0.0015 0.0002 -0.0001
Cyberbal -0.0047 0.0089 -0.0027 -0.0003 0.0002
GNT 0.0015 -0.0027 0.0019 0.0001 0.0000
CN 0.0002 -0.0003 0.0001 0.0005 0.0000
Int_1 -0.0001 0.0002 0.0000 0.0000 0.0004
Fot the mediator
Covariance matrix of regression parameter estimates:
constant Cyberbal
constant 0.0041 -0.0041
Cyberbal -0.0041 0.0077
Well, X and V are the exogenous predictors, so start by generating those from their covariance matrix (e.g., using the mvrnorm()
function in the MASS
package, which returns a matrix
that you can convert to a data.frame
). Then you add the product term to the data set (e.g., dat$xv <- dat$x * dat$v
). Then you can generate the endogenous M and Y variables using their regression model parameters (and generating random errors), as shown in the page I linked to above.
Hi Terrence, thank for the hint.
Unfortunately I got stuck at the beginning - what would be the sample size for mvrnom()
to begin with?
Would semTools::indProd()
work here to make the generation process easier?
You choose the sample size.
indProd()
is for multiple indicators of a latent interaction between common factors. You only have observed variables.
I know that I should choose the sample size, but if I'm after estimating the sample size, this should be somehow connected to sim(n=100:500)
right?
Yes, as stated above, you would need to write a function that has sample size as its argument, then pass that to sim(..., generate=)
. So you would still be able to use a variable n
for estimating power and choosing N.
Terrence, thanks for your help. I wasn't able to move forward and got stuck at the beginning.
I have the 1st part sorted (I think)
> n=100
>
> (Sigma<-df %>% select(M, V) %>% var(na.rm = T))
M V
M 1.45604513 0.09586241
V 0.09586241 2.10041305
>
> dat <- mvrnorm(n, rep(0, 2), Sigma) %>% as.data.frame() %>%
+ mutate(MxV = M*V)
>
I select M
and V
from pilot study database and store they covariance matrix as Sigma
.
Then I compute their product and store the 3 variables in a data.frame
.
So the next step would be generating X
and Y
? But how do I account for what is already in the data frame given orginal model parameters?
Good evening Terrence. I was doing my best and eventually couldn't solve it. Is there any chance I can help me with this or should I close this as off topic? Thanks
Well, X and V are the exogenous predictors, so start by generating those from their covariance matrix (e.g., using the
mvrnorm()
function in theMASS
package, which returns amatrix
that you can convert to adata.frame
). Then you add the product term to the data set (e.g.,dat$xv <- dat$x * dat$v
). Then you can generate the endogenous M and Y variables using their regression model parameters (and generating random errors), as shown in the page I linked to above.
I think I know where my problem is. You wrote X and V are the exogenous predictors. BUT: For the moderator part I need to have a product of M and V (not X and V). In terms of PROCESS templates - I think you wrote about model 7, while I'm thinking about model 14.
Does this make any sense?
I already generated M and V from their covariance matrix: https://github.com/simsem/simsem/issues/57#issuecomment-466420873 using they (co)variance calculated from real dataset. Adding the product of them is easy and I did it how you wrote.
At this point I have a data frame with M
, V
and MxV
and that's where I stopped.
But M is not exogenous. It sounds like you need to simulate X and V as exogenous variables, then simulate M as an endogenous variable predicted by X (and whose residuals are correlated with V). So use MASS::mvrnorm()
to simulate a correlated set of 3 variables: X, V, and the residuals of M.
Ahh... This makes perfect sense! I will report back, once the data generating function is working.
Thanks
I was trying to run a monte carlo simulation for estimating a sample size for a hypothesized moderated mediation (https://github.com/simsem/simsem/issues/56).
The goal here is sample size number for a study where:
cb
=X
is experimental condition coded as -0.5 for one group and 0.5 for the other groupgnt
=M
continuous mediator mean centeredcn
=V
continuous moderator mean centeredcngn
=MxV
product ofcn
*gnt
INT
=Y
uncenteredBased on previous studies (SPSS PROCESS macro outputs) I took parameters for my population model:
Then I've setup analysis model testing for indirect effects and conditional indirect effects the following way:
If I assume that above models are written properly (are they?) the simulation code I wrote is this:
Besides not being sure wheter or not I coded both population and analysis model the right way, I would also like to have the possibility to randomize population model parameters (eg. drawn from a uniform distribution of effects I observed in earlier studies). As I belive this is not possible via
lavaan
model syntax but I can't work my way to translate it intosimsem
model syntax.Could you please help me out?