rempsyc / lavaanExtra

Convenience Functions for Package `lavaan`
https://lavaanExtra.remi-theriault.com/
Other
18 stars 4 forks source link

diagonal elements in lavaan_cov() and lavaan_cor() #18

Closed TDJorgensen closed 1 year ago

TDJorgensen commented 1 year ago

For review: https://github.com/openjournals/joss-reviews/issues/5701

The separate lavaan_cov() and lavaan_cor() functions seem inconsistent with a single lavaan_reg() function with the estimate= argument to select (un)standardized coefficients. In fact, the output of lavaan_cor() is redundant with any covariances printed by lavaan_cov(..., estimate = "B"). It might be more efficient to make the following lines from lavaan_cor() conditional on an argument like diag=FALSE, so the variances are ignored if(!diag).

not.cor <- which(x$lhs == x$rhs) # Why call this "not.cor" when it means the opposite?
x <- x[-not.cor, ]               # This looks like "not not.cor" when you mean "not cor"

Of course, the complication is what symbols to use. It makes sense for lavaan_cor() to use r for correlations, but that is also what some of the standardized values from lavaan_cov() are (yet they are rather unfortunately labeled "b", despite not being regression slopes). If you wanted to segment out the variances, you could

rempsyc commented 1 year ago

Thank you for these useful observations. I like these ideas and I think it makes the package stronger and more consistent. I have:

  1. Deleted the lavaan_cor() file to avoid redundant code
  2. Changed the estimate options (formerly b and B) to r and sigma in lavaan_cov(), with corresponding unicode support in rempsyc::nice_table().
  3. Kept only off-diagonal elements
  4. Added in the same file lavaan_cor() as an alias to lavaan_cov(), but with a forced estimate = "r" for backward compatibility.
  5. Added function lavaan_var() with only diagonal elements, with estimate = sigma by default, and the option to specify estimate = r2 (in which case I provide 1 - sigma, and corresponding corrected confidence intervals). I calculated the new r2 p values like this:
r2_pvalue <- function(est, se) {
  wald_z <- (1 - est) / se
  pnorm(wald_z)
}

So we get:

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
regression <- list(ageyr = c("visual", "textual", "speed"), grade = c("visual", "textual", "speed"))
covariance <- list(speed = "textual", ageyr = "grade")
HS.model <- write_lavaan(regression = regression, covariance = covariance, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_cov(fit, estimate = "sigma", nice_table = TRUE)
lavaan_cov(fit, estimate = "r", nice_table = TRUE)

lavaan_var(fit, estimate = "sigma", nice_table = TRUE)
lavaan_var(fit, estimate = "r2", nice_table = TRUE)

Created on 2023-10-08 with reprex v2.0.2

The fact that in this example I only get p values of 1 is a bit suspicious, but it seems consistent with the previous p values, which were all virtually 0, given that we have swapped the null hypothesis.

Finally,

when estimate = "sigma" (with a \sigma^2 column header when nice_table=TRUE)

Wait, if it represents sigma squared, then should the estimation not be named sigma2 instead of simply sigma?

TDJorgensen commented 1 year ago

in this example I only get p values of 1

You are calculating the wrong wald_z statistic, as well as the wrong p value.

wald_z <- (1 - est) / se

At this point in the syntax, est is already R^2, so there is no reason to subtract it from 1 again to make it the proportion of unexplained variance. If you did, you would have to additionally subtract the null hypothesized value of 1 (i.e., no variance is explained, so 100% is unexplained). That would make it 1 - est - 1 which is simply -est. And you would have to calculate the p value as the lower.tail=TRUE of the curve (which is what you are currently doing.

It is simpler to calculate wald_z <- est/se for the null hypothesis that there is no explained variance, but the p value should be the upper tail of the curve: pnorm(wald_z, lower.tail = FALSE).

if it represents sigma squared, then should the estimation not be named sigma2 instead of simply sigma?

The table column should certainly be labeled \sigma^2 because it is a variance estimate, not a SD estimate. Maybe you are right that the argument should be "sigma2" then, which also matches the "squared" in "r2".

For lavaan_var() it is not necessary to have 2 identical columns of variable names. You can drop the rhs.

rempsyc commented 1 year ago

Ah, yes, I see, thanks for the explanations! So correction is:

x$est.std <- abs(1 - x$est.std)
x$pvalue <- stats::pnorm(x$est.std / x$se, lower.tail = FALSE)

And we get:

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
mediation <- list(speed = "visual", textual = "visual", visual = c("ageyr", "grade"))
indirect <- list(IV = c("ageyr", "grade"), M = "visual", DV = c("speed", "textual"))
HS.model <- write_lavaan(mediation, indirect = indirect, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_var(fit, estimate = "r2", nice_table = TRUE)
lavaan_var(fit, estimate = "sigma2", nice_table = TRUE)

Created on 2023-10-09 with reprex v2.0.2

And now things look better :)