yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org

Explicitly mention that `summary` sig-tests are based on un-standardized data #119

Closed mattansb closed 5 years ago

mattansb commented 5 years ago

It seems that the significance tests in summary(fit) are based on the unstandardized data, even when standardized = T. Perhaps this should be mentioned explicitly? Or could there be an option to base them on the standardized data?

Simulate data

We (w/ @almogsi) simulated a tri-variate data set, with two uncorrelated variables that are correlated to a third variable. To start, all variables will be centered at 0 (mu) and scaled to 1 (diagonal of Sigma):

library(tidyverse)
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0),
                nrow = 3)
data <- MASS::mvrnorm(n = 1000,
                      mu = rep(0,3),
                      Sigma = Sigma,
                      empirical = T) %>% as.data.frame()

Then we rescaled V2 to change its unstandardized slope when predicting V1:

data <- data %>% 
  mutate(V2 = 5*V2+10)

Just to make sure, we looked at the correlation matrix (should be the same as Sigma):

knitr::kable(cor(data))
|    |  V1 |  V2 |  V3 |
|:---|----:|----:|----:|
| V1 | 1.0 | 0.6 | 0.6 |
| V2 | 0.6 | 1.0 | 0.0 |
| V3 | 0.6 | 0.0 | 1.0 |

and the covariance matrix (should only be different in the scale of V2):

knitr::kable(cov(data))
|    |  V1 | V2 |  V3 |
|:---|----:|---:|----:|
| V1 | 1.0 |  3 | 0.6 |
| V2 | 3.0 | 25 | 0.0 |
| V3 | 0.6 |  0 | 1.0 |
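As a quick check on the covariance matrix above, here is a minimal sketch (base R only) of why a linear rescaling V2 -> 5*V2 + 10 changes only the V2 row and column. The names `scale_by`, `D`, and `expected_cov` are made up for this illustration; the added constant 10 shifts the mean and does not enter the covariance.

```r
# Population correlation matrix used in the simulation
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0),
                nrow = 3,
                dimnames = list(c("V1", "V2", "V3"), c("V1", "V2", "V3")))

# Multiplicative factors applied to each variable (only V2 is rescaled)
scale_by <- c(V1 = 1, V2 = 5, V3 = 1)
D <- diag(scale_by)

# Cov(D x) = D Sigma D'; D is diagonal, so D' = D
expected_cov <- D %*% Sigma %*% D
dimnames(expected_cov) <- dimnames(Sigma)
expected_cov

# Converting back to correlations recovers Sigma
cov2cor(expected_cov)
```

The V1-V2 covariance becomes 0.6 * 5 = 3 and the V2 variance becomes 5^2 = 25, while the correlations are unchanged.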

Fit lavaan model

library(lavaan)
my_model <- '
V1 ~ a*V2 + b*V3
diff := a - b
'
fit <- sem(my_model, data = data)

If diff is computed on the standardized coefficients, we expect it to be 0.
If diff is computed on the unstandardized coefficients, we expect it to be non-zero.
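These expectations can be worked out directly from the covariance matrix, without fitting anything. The sketch below (base R; `S`, `b_raw`, `b_std` and the other names are illustrative only) solves the normal equations for the two slopes and then rescales them by sd(x) / sd(y), which is the usual definition of a fully standardized regression coefficient.

```r
# Sample covariance matrix after rescaling V2
S <- matrix(c(1.0,  3, 0.6,
              3.0, 25, 0.0,
              0.6,  0, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("V1", "V2", "V3"), c("V1", "V2", "V3")))

predictors <- c("V2", "V3")

# Unstandardized slopes: solve S_xx %*% b = s_xy
b_raw <- solve(S[predictors, predictors], S[predictors, "V1"])
names(b_raw) <- predictors

# Standardized slopes: b * sd(x) / sd(y)
b_std <- b_raw * sqrt(diag(S)[predictors]) / sqrt(S["V1", "V1"])

diff_raw <- b_raw[["V2"]] - b_raw[["V3"]]  # 0.12 - 0.6
diff_std <- b_std[["V2"]] - b_std[["V3"]]  # 0.6 - 0.6, up to floating-point error

c(diff_raw = diff_raw, diff_std = diff_std)
```

So the unstandardized difference should be about -0.48 and the standardized difference about 0.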

summary(fit, standardized = T)

## lavaan 0.6-2 ended normally after 13 iterations
## 
##   Optimization method                           NLMINB
##   Number of free parameters                          3
## 
##   Number of observations                          1000
## 
##   Estimator                                         ML
##   Model Fit Test Statistic                       0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   V1 ~                                                                  
##     V2         (a)    0.120    0.003   35.857    0.000    0.120    0.600
##     V3         (b)    0.600    0.017   35.857    0.000    0.600    0.600
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .V1                0.280    0.013   22.361    0.000    0.280    0.280
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     diff             -0.480    0.017  -28.128    0.000   -0.480    0.000

We can see that the Estimate of diff is non-zero, implying that it was computed on the non-standardized coefficients. BUT we also see that the Std.all of diff is 0, implying that it was computed on the standardized coefficients.
What about the significance test?

parameterEstimates(fit)[7,] %>% knitr::kable()
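|   | lhs  | op | rhs | label |   est |        se |         z | pvalue |  ci.lower |  ci.upper |
|:--|:-----|:---|:----|:------|------:|----------:|----------:|-------:|----------:|----------:|
| 7 | diff | := | a-b | diff  | -0.48 | 0.0170646 | -28.12843 |      0 | -0.513446 | -0.446554 |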
standardizedSolution(fit)[7,] %>% knitr::kable()
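|   | lhs  | op | rhs | est.std |        se |     z |    pvalue |   ci.lower |  ci.upper |
|:--|:-----|:---|:----|--------:|----------:|------:|----------:|-----------:|----------:|
| 7 | diff | := | a-b |       0 | 0.0236643 | 1e-07 | 0.9999999 | -0.0463812 | 0.0463812 |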

We can see that we get two different z-tests, depending on which type of estimate we request. It seems that the test results returned by summary() come from parameterEstimates(), and are thus based on the unstandardized results.
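To see where the unstandardized test comes from, the sketch below rebuilds the SE and z-value for diff by hand from vcov(fit), using the delta method for the linear contrast a - b. It assumes the same simulated data and model as above, and that the labelled parameters appear as "a" and "b" in coef(fit) and vcov(fit); the names `g`, `V`, `est_diff`, and `se_diff` are made up for this illustration.

```r
library(lavaan)

# Recreate the simulated data (empirical = TRUE fixes the sample moments exactly)
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0),
                nrow = 3)
data <- as.data.frame(MASS::mvrnorm(n = 1000, mu = rep(0, 3),
                                    Sigma = Sigma, empirical = TRUE))
data$V2 <- 5 * data$V2 + 10

my_model <- '
V1 ~ a*V2 + b*V3
diff := a - b
'
fit <- sem(my_model, data = data)

# Gradient of (a - b) with respect to (a, b)
g <- c(a = 1, b = -1)

# Sampling covariance of the two unstandardized slopes
V <- vcov(fit)[c("a", "b"), c("a", "b")]

est_diff <- sum(g * coef(fit)[c("a", "b")])
se_diff  <- sqrt(drop(t(g) %*% V %*% g))  # delta-method SE
z_diff   <- est_diff / se_diff

c(est = est_diff, se = se_diff, z = z_diff)
```

If the assumptions hold, these values should agree with the diff row printed by summary(fit) and parameterEstimates(fit), which indicates that the printed test refers to the unstandardized a - b.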

TDJorgensen commented 5 years ago

Yes, summary() and parameterEstimates() return SEs and tests for unstandardized coefficients. The class?lavaan and ?parameterEstimates help pages never claim that standardized=TRUE returns anything except standardized coefficients. Only ?standardizedSolution says it returns SEs for standardized coefficients. Typically, the standardized solution is used for effect sizes, not NHST, since the model does not estimate standardized parameters (they are just a post-hoc transformation).
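For readers who want both sets of results side by side, here is a small sketch that pulls the defined parameter out of each function by filtering on op == ":=" rather than by row number. It reuses the simulated data and model from the original post; `raw_diff` and `std_diff` are illustrative names only.

```r
library(lavaan)

# Same simulated data and model as in the original post
Sigma <- matrix(c(1.0, 0.6, 0.6,
                  0.6, 1.0, 0.0,
                  0.6, 0.0, 1.0),
                nrow = 3)
data <- as.data.frame(MASS::mvrnorm(n = 1000, mu = rep(0, 3),
                                    Sigma = Sigma, empirical = TRUE))
data$V2 <- 5 * data$V2 + 10

my_model <- '
V1 ~ a*V2 + b*V3
diff := a - b
'
fit <- sem(my_model, data = data)

# Unstandardized estimate with its SE and test; std.all is appended as a
# point estimate only, the se/z/pvalue columns still refer to `est`
raw_diff <- subset(parameterEstimates(fit, standardized = TRUE), op == ":=")
raw_diff[, c("label", "est", "se", "z", "pvalue", "std.all")]

# Standardized estimate with its own SE and test
std_diff <- subset(standardizedSolution(fit), op == ":=")
std_diff[, c("lhs", "est.std", "se", "z", "pvalue")]
```

Filtering on the operator avoids hard-coding the row index, which can change when the model gains or loses parameters.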

mattansb commented 5 years ago

Good point - thanks!

If possible, I think it would be helpful to state this explicitly in the class?lavaan help:

"If standardized=TRUE, the standardized solution is also printed (SEs and tests are based on the unstandardized solution)."

Some people (myself included) might find this helpful with the interpretation of the results (if for instance they are using lavaan to conduct a dominance analysis).

Thanks again!

yrosseel commented 5 years ago

Merged in lavaan 0.6-3.1313