diagonal elements in lavaan_cov() and lavaan_cor()

For review: https://github.com/openjournals/joss-reviews/issues/5701

The separate lavaan_cov() and lavaan_cor() functions seem inconsistent with a single lavaan_reg() function with the estimate= argument to select (un)standardized coefficients. In fact, the output of lavaan_cor() is redundant with any covariances printed by lavaan_cov(..., estimate = "B"). It might be more efficient to make the following lines from lavaan_cor() conditional on an argument like diag=FALSE, so the variances are ignored if(!diag).

not.cor <- which(x$lhs == x$rhs) # Why call this "not.cor" when it means the opposite?
x <- x[-not.cor, ]               # This looks like "not not.cor" when you mean "not cor"

Of course, the complication is what symbols to use. It makes sense for lavaan_cor() to use r for correlations, but that is also what some of the standardized values from lavaan_cov() are (yet they are rather unfortunately labeled "b", despite not being regression slopes). If you wanted to segment out the variances, you could

reserve lavaan_cov() to be only for off-diagonal elements, and it returns what lavaan_cor() does when estimate = "B" (or perhaps more appropriately for this functionality, when estimate = "r", and estimate = "sigma" could indicate unstandardized covariances, triggering rempsyc::nice_table() to use a \sigma column header).
The diagonal elements could be returned by a separate function (e.g., lavaan_var()), which could be (unstandardized) variances when estimate = "sigma" (with a \sigma^2 column header when nice_table=TRUE), but setting estimate = "r2" or "rsq" could be the flag for standardized values (which are the proportion of total variance, so 1 minus R^2). Or even better, you could calculate 1 minus the standardized variances to actually return R^2 (using that as the table header when nice_table=TRUE).
- Granted, you can extract rows with op == "r2" from parameterEstimates(), but they don't have p values. The test for variances from standardizedSolution() is for the null hypothesis that the residual variance = 0, which would imply the null hypothesis that R^2 = 1. That is probably not very useful. Instead, it would be trivial for you to calculate a new p values to test whether the standardized variance = 1, which would imply R^2 = 0 (probably a more sensible default test). That only requires calculating a new Wald z statistic as (est - 1)/SE and using pnorm() to get a one-tailed p value (since it can only go in one direction).

Thank you for these useful observations. I like these ideas and I think it makes the package stronger and more consistent. I have:

Deleted the lavaan_cor() file to avoid redundant code
Changed the estimate options (formerly b and B) to r and sigma in lavaan_cov(), with corresponding unicode support in rempsyc::nice_table().
Kept only off-diagonal elements
Added in the same file lavaan_cor() as an alias to lavaan_cov(), but with a forced estimate = "r" for backward compatibility.
Added function lavaan_var() with only diagonal elements, with estimate = sigma by default, and the option to specify estimate = r2 (in which case I provide 1 - sigma, and corresponding corrected confidence intervals). I calculated the new r2 p values like this:

r2_pvalue <- function(est, se) {
  wald_z <- (1 - est) / se
  pnorm(wald_z)
}

So we get:

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
regression <- list(ageyr = c("visual", "textual", "speed"), grade = c("visual", "textual", "speed"))
covariance <- list(speed = "textual", ageyr = "grade")
HS.model <- write_lavaan(regression = regression, covariance = covariance, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_cov(fit, estimate = "sigma", nice_table = TRUE)

lavaan_cov(fit, estimate = "r", nice_table = TRUE)


lavaan_var(fit, estimate = "sigma", nice_table = TRUE)

lavaan_var(fit, estimate = "r2", nice_table = TRUE)

^{Created on 2023-10-08 with reprex v2.0.2}

The fact that in this example I only get p values of 1 is a bit suspicious, but it seems consistent with the previous p values, which were all virtually 0, given that we have swapped the null hypothesis.

Finally,

when estimate = "sigma" (with a \sigma^2 column header when nice_table=TRUE)

Wait, if it represents sigma squared, then should the estimation not be named sigma2 instead of simply sigma?

in this example I only get p values of 1

You are calculating the wrong wald_z statistic, as well as the wrong p value.

wald_z <- (1 - est) / se

At this point in the syntax, est is already R^2, so there is no reason to subtract it from 1 again to make it the proportion of unexplained variance. If you did, you would have to additionally subtract the null hypothesized value of 1 (i.e., no variance is explained, so 100% is unexplained). That would make it 1 - est - 1 which is simply -est. And you would have to calculate the p value as the lower.tail=TRUE of the curve (which is what you are currently doing.

It is simpler to calculate wald_z <- est/se for the null hypothesis that there is no explained variance, but the p value should be the upper tail of the curve: pnorm(wald_z, lower.tail = FALSE).

if it represents sigma squared, then should the estimation not be named sigma2 instead of simply sigma?

The table column should certainly be labeled \sigma^2 because it is a variance estimate, not a SD estimate. Maybe you are right that the argument should be "sigma2" then, which also matches the "squared" in "r2".

For lavaan_var() it is not necessary to have 2 identical columns of variable names. You can drop the rhs.

Ah, yes, I see, thanks for the explanations! So correction is:

x$est.std <- abs(1 - x$est.std)
x$pvalue <- stats::pnorm(x$est.std / x$se, lower.tail = FALSE)

And we get:

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
mediation <- list(speed = "visual", textual = "visual", visual = c("ageyr", "grade"))
indirect <- list(IV = c("ageyr", "grade"), M = "visual", DV = c("speed", "textual"))
HS.model <- write_lavaan(mediation, indirect = indirect, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_var(fit, estimate = "r2", nice_table = TRUE)

lavaan_var(fit, estimate = "sigma2", nice_table = TRUE)

^{Created on 2023-10-09 with reprex v2.0.2}

And now things look better :)

rempsyc / lavaanExtra

diagonal elements in lavaan_cov() and lavaan_cor() #18