Thank you for these useful observations. I like these ideas and I think they make the package stronger and more consistent. I have:
- Moved `lavaan_cor()` into the `lavaan_cov()` file to avoid redundant code.
- Renamed the `estimate` options (formerly `b` and `B`) to `r` and `sigma` in `lavaan_cov()`, with corresponding unicode support in `rempsyc::nice_table()`.
- Kept `lavaan_cor()` as an alias to `lavaan_cov()`, but with a forced `estimate = "r"` for backward compatibility.
- Added `lavaan_var()` with only diagonal elements, with `estimate = "sigma"` by default, and the option to specify `estimate = "r2"` (in which case I provide 1 - sigma, and corresponding corrected confidence intervals).

I calculated the new `r2` p values like this:

```r
r2_pvalue <- function(est, se) {
  wald_z <- (1 - est) / se
  pnorm(wald_z)
}
```
So we get:

```r
library(lavaan)
library(lavaanExtra)

x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
regression <- list(ageyr = c("visual", "textual", "speed"), grade = c("visual", "textual", "speed"))
covariance <- list(speed = "textual", ageyr = "grade")

HS.model <- write_lavaan(regression = regression, covariance = covariance, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_cov(fit, estimate = "sigma", nice_table = TRUE)
lavaan_cov(fit, estimate = "r", nice_table = TRUE)
lavaan_var(fit, estimate = "sigma", nice_table = TRUE)
lavaan_var(fit, estimate = "r2", nice_table = TRUE)
```

Created on 2023-10-08 with reprex v2.0.2
The fact that in this example I only get p values of 1 is a bit suspicious, but it seems consistent with the previous p values, which were all virtually 0, given that we have swapped the null hypothesis.
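A quick arithmetic check suggests why (the numbers below are made up, just to illustrate the shape of the formula):

```r
# With any est well below 1 and a small se, (1 - est) / se is a large
# positive z, and the lower tail of a large positive z is essentially 1.
est <- 0.7  # hypothetical estimate
se <- 0.05  # hypothetical standard error
wald_z <- (1 - est) / se  # = 6
pnorm(wald_z)             # ~0.999999999
```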
Finally,

> when `estimate = "sigma"` (with a \sigma^2 column header when `nice_table=TRUE`)

Wait, if it represents sigma squared, then should the estimate option not be named `sigma2` instead of simply `sigma`?
> in this example I only get p values of 1

You are calculating the wrong `wald_z` statistic, as well as the wrong p value.

```r
wald_z <- (1 - est) / se
```

At this point in the syntax, `est` is already R^2, so there is no reason to subtract it from 1 again to make it the proportion of unexplained variance. If you did, you would have to additionally subtract the null-hypothesized value of 1 (i.e., no variance is explained, so 100% is unexplained). That would make it 1 - est - 1, which is simply -est. And you would have to calculate the p value as the `lower.tail = TRUE` area of the curve (which is what you are currently doing).

It is simpler to calculate `wald_z <- est / se` for the null hypothesis that there is no explained variance, but the p value should be the upper tail of the curve: `pnorm(wald_z, lower.tail = FALSE)`.
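To make the two parameterizations concrete, here is a minimal sketch with illustrative numbers (not actual model output):

```r
est <- 0.25  # hypothetical R^2 estimate
se <- 0.08   # hypothetical standard error

# H0: no explained variance (R^2 = 0), tested in the upper tail
wald_z <- est / se
pnorm(wald_z, lower.tail = FALSE)

# equivalent lower-tail version using -est, as described above
pnorm(-est / se)
```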
> if it represents sigma squared, then should the estimation not be named sigma2 instead of simply sigma?

The table column should certainly be labeled \sigma^2 because it is a variance estimate, not a SD estimate. Maybe you are right that the argument should be `"sigma2"` then, which also matches the "squared" in `"r2"`.
For `lavaan_var()`, it is not necessary to have 2 identical columns of variable names. You can drop the `rhs`.
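For instance (a sketch, assuming `x` is the data frame of results):

```r
# Variances are the rows where lhs == rhs, so the second
# variable-name column carries no information and can be dropped.
x$rhs <- NULL
```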
Ah, yes, I see, thanks for the explanations! So the correction is:

```r
x$est.std <- abs(1 - x$est.std)  # convert the standardized residual variance to R^2
x$pvalue <- stats::pnorm(x$est.std / x$se, lower.tail = FALSE)  # upper-tail test of R^2 = 0
```
And we get:

```r
library(lavaan)
library(lavaanExtra)

x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
mediation <- list(speed = "visual", textual = "visual", visual = c("ageyr", "grade"))
indirect <- list(IV = c("ageyr", "grade"), M = "visual", DV = c("speed", "textual"))

HS.model <- write_lavaan(mediation, indirect = indirect, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_var(fit, estimate = "r2", nice_table = TRUE)
lavaan_var(fit, estimate = "sigma2", nice_table = TRUE)
```

Created on 2023-10-09 with reprex v2.0.2
And now things look better :)
For review: https://github.com/openjournals/joss-reviews/issues/5701
The separate `lavaan_cov()` and `lavaan_cor()` functions seem inconsistent with a single `lavaan_reg()` function with the `estimate=` argument to select (un)standardized coefficients. In fact, the output of `lavaan_cor()` is redundant with any covariances printed by `lavaan_cov(..., estimate = "B")`. It might be more efficient to make the following lines from `lavaan_cor()` conditional on an argument like `diag=FALSE`, so the variances are ignored `if (!diag)`.
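Something like the following sketch could work (the `diag` argument and the surrounding code are hypothetical, just to illustrate the idea):

```r
x <- lavaan::parameterEstimates(fit)
x <- x[x$op == "~~", ]  # (co)variance rows in the parameter table
if (!diag) {
  x <- x[x$lhs != x$rhs, ]  # drop variances, keep only covariances
}
```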
Of course, the complication is what symbols to use. It makes sense for `lavaan_cor()` to use r for correlations, but that is also what some of the standardized values from `lavaan_cov()` are (yet they are rather unfortunately labeled "b", despite not being regression slopes). If you wanted to segment out the variances, you could restrict `lavaan_cov()` to only the off-diagonal elements, and it returns what `lavaan_cor()` does when `estimate = "B"` (or perhaps more appropriately for this functionality, when `estimate = "r"`, and `estimate = "sigma"` could indicate unstandardized covariances, triggering `rempsyc::nice_table()` to use a \sigma column header).

The variances could then go in a separate function (e.g., `lavaan_var()`), which could be (unstandardized) variances when `estimate = "sigma"` (with a \sigma^2 column header when `nice_table=TRUE`), but setting `estimate = "r2"` or `"rsq"` could be the flag for standardized values (which are the proportion of total variance that is unexplained, so 1 minus R^2). Or even better, you could calculate 1 minus the standardized variances to actually return R^2 (using that as the table header when `nice_table=TRUE`).
You can get the R^2 values (rows with `op == "r2"`) from `parameterEstimates()`, but they don't have p values. The test for variances from `standardizedSolution()` is for the null hypothesis that the residual variance = 0, which would imply the null hypothesis that R^2 = 1. That is probably not very useful. Instead, it would be trivial for you to calculate new p values to test whether the standardized variance = 1, which would imply R^2 = 0 (probably a more sensible default test). That only requires calculating a new Wald z statistic as (est - 1)/SE and using `pnorm()` to get a one-tailed p value (since it can only go in one direction).
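A minimal sketch of that test (column names as returned by `lavaan::standardizedSolution()`):

```r
std <- lavaan::standardizedSolution(fit)
vars <- std[std$op == "~~" & std$lhs == std$rhs, ]  # standardized variances

# H0: standardized variance = 1 (i.e., R^2 = 0); it can only fall below 1,
# so the one-tailed p value comes from the lower tail.
wald_z <- (vars$est.std - 1) / vars$se
pnorm(wald_z)
```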