Closed jtextor closed 1 day ago
Also, the skewness and kurtosis arguments in simulateData produce empirical skewness and kurtosis that do not always match the simulated values. For example, often empirical skewness and kurtosis values are much lower than the expected (i.e. simulated) values.
@ylenio that property may be related to this pull request: https://github.com/yrosseel/lavaan/pull/79. But in general this data generation method isn't perfect.
5 years later, this is still not fixed. In that period, two people told me they would take this up, but nothing happened so far. It would seem I have to fix this myself.
I thought the simstandard package addressed this
https://cran.r-project.org/web/packages/simstandard/vignettes/simstandard_tutorial.html
Indeed. So maybe lavaan should just remove the standardized= argument in simulateData() (for now).
I needed to make changes to the (very old) setResidualElements.LISREL() function in order to make the correlation = TRUE option more robust.
It turns out the same 'trick' was needed to fix the standardized = TRUE option in simulateData(). The problem is to find those values for the residual variances in the PSI matrix, so that IB.inv %*% PSI %*% t(IB.inv)
is a correlation matrix (at least for the y variables). It would seem there are two settings:
1) The Beta matrix is acyclic (ie. no cycles); then, for each y variable, we need to find its ancestors, and then we can compute the total variance of the RHS of the regression (without the error term), and then we can find the residual exactly; as almost all models in SEM are acyclic, it is nice to have an exact method
2) The beta matrix is not acyclic; I don't think an analytic solution is possible (but I may be wrong); I then cast this as a (nonlinear) optimization problem, and let nlminb() do the hard work.
Ironically, this is fixed just as I was about the replace the simulateData() function, and replace it by another (and more powerful) version. Oh well.
To try it out with the example that failed:
model <- '
y ~ (-.5) * x + (.25) * z
x ~ .5*U
z ~ .5*U
'
Data <- simulateData(model, empirical = TRUE, return.fit = TRUE,
standardized = TRUE)
fit <- attr(Data, "fit")
lavInspect(fit, "implied")$cov
results in
y x z U
y 1.000
x -0.438 1.000
z 0.125 0.250 1.000
U -0.125 0.500 0.500 1.000
In the simstandard tutorial, the following example is given:
model <- '
Y ~ -.75 * X_1 + .25 * X_2
X =~ .75 * X_1 + .75 * X_2
'
Data <- simulateData(model, empirical = TRUE, return.fit = TRUE,
standardized = TRUE)
fit <- attr(Data, "fit")
lavInspect(fit, "implied")$cov
which results in
X_1 X_2 Y
X_1 1.000
X_2 0.562 1.000
Y -0.609 -0.172 1.000
If you can find any cases that seem to fail, let me know. Finally closing this very old issue.
The function "simulateData" is currently not able to provide standardized output in the general case, as the following example demonstrates:
The variance of y far exceeds 1.