yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org

option `std.ov = TRUE` standardizes only the endogenous variables in a mediation with observed variables #353

Closed by juliuspfadt 1 month ago

juliuspfadt commented 1 month ago

I found this because someone reported this as a bug for JASP (https://github.com/jasp-stats/jasp-issues/issues/2740).

For instance, compare the two fits produced by this code:

library(lavaan)

dt <- PoliticalDemocracy
model <- "
y3 ~ y1 + y2
y2 ~ y1
"

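# Fit 1: let lavaan standardize the observed variables
fit <- sem(model, dt, std.ov = TRUE)
parameterestimates(fit)

# Fit 2: manually standardize only the endogenous variables y2 and y3
dt[, c("y2", "y3")] <- scale(dt[, c("y2", "y3")])
fit2 <- sem(model, dt)
parameterestimates(fit2)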

I suppose this could be resolved by changing either the documentation for lavOptions or the function itself.
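To make the comparison explicit, the estimates from the two fits can be placed side by side. This is only a minimal sketch and not part of the original report: the `cmp` data frame and its column names are made up for illustration, and a separate copy `dt2` is used so the raw data stay untouched.

```r
library(lavaan)

model <- "
y3 ~ y1 + y2
y2 ~ y1
"

# Fit 1: let lavaan standardize via std.ov = TRUE
fit_std <- sem(model, PoliticalDemocracy, std.ov = TRUE)

# Fit 2: manually standardize only the endogenous variables y2 and y3
dt2 <- PoliticalDemocracy
dt2[, c("y2", "y3")] <- scale(dt2[, c("y2", "y3")])
fit_manual <- sem(model, dt2)

pe_std    <- parameterEstimates(fit_std)
pe_manual <- parameterEstimates(fit_manual)

# Illustrative side-by-side table of the point estimates
cmp <- data.frame(
  lhs = pe_std$lhs, op = pe_std$op, rhs = pe_std$rhs,
  est_std_ov = pe_std$est,
  est_manual = pe_manual$est
)
print(cmp)

# Largest absolute difference between the two sets of estimates
max(abs(cmp$est_std_ov - cmp$est_manual))
```

If std.ov = TRUE standardizes only y2 and y3, as reported here, that last value should be close to zero.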

yrosseel commented 1 month ago

Thanks for reporting this. The documentation was incomplete. It is now updated in the GitHub version.

The std.ov option indeed standardizes only the non-exogenous variables by default. The reason is that exogenous variables are often binary (or dummy variables), and we usually do not want to standardize those. This is also related to the fixed.x = TRUE option, which is the default. If you set fixed.x = FALSE, all variables are standardized.

To illustrate:

library(lavaan)
fit <- sem('x1 + x2 ~ x3 + x4', data = HolzingerSwineford1939, std.ov = TRUE, 
                  sample.cov.rescale = FALSE, fixed.x = TRUE)
lavInspect(fit, "sampstat")$cov

gives

      x1    x2    x3    x4
x1 1.000                  
x2 0.297 1.000            
x3 0.498 0.384 1.279      
x4 0.434 0.178 0.209 1.355

while

fit <- sem('x1 + x2 ~ x3 + x4', data = HolzingerSwineford1939, std.ov = TRUE, 
                  sample.cov.rescale = FALSE, fixed.x = FALSE)
lavInspect(fit, "sampstat")$cov

gives

      x1    x2    x3    x4
x1 1.000                  
x2 0.297 1.000            
x3 0.441 0.340 1.000      
x4 0.373 0.153 0.159 1.000

But this option is just a brute-force method for rescaling the (non-exogenous) observed variables if there are big scaling differences. It does not perform a correlation analysis. The latter is under development, and can be switched on using the (new) correlation = TRUE option.
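The difference between the two sample covariance matrices shown earlier can also be checked programmatically by looking at their diagonals. The following is a minimal sketch that reuses the same sem() calls; the helper `sample_variances()` is a hypothetical name introduced here for illustration, not a lavaan function.

```r
library(lavaan)

# Hypothetical helper: fit the model with std.ov = TRUE and return the
# variances of the observed variables from lavaan's sample statistics.
sample_variances <- function(fixed.x) {
  fit <- sem('x1 + x2 ~ x3 + x4', data = HolzingerSwineford1939,
             std.ov = TRUE, sample.cov.rescale = FALSE, fixed.x = fixed.x)
  diag(lavInspect(fit, "sampstat")$cov)
}

v_fixed <- sample_variances(fixed.x = TRUE)   # default behaviour
v_free  <- sample_variances(fixed.x = FALSE)  # x3 and x4 no longer fixed

# TRUE marks variables whose sample variance is 1, i.e. standardized
round(v_fixed, 3) == 1
round(v_free, 3) == 1
```

Going by the matrices above, only x1 and x2 should be flagged with fixed.x = TRUE, and all four variables with fixed.x = FALSE.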

juliuspfadt commented 1 month ago

That makes sense. I appreciate your swift reply. Thanks