lavaan_reg() defaults - Githubissues

TDJorgensen commented 1 year ago

For review: openjournals/joss-reviews#5701

lavaan_reg() only allows for reporting unstandardized or standardized coefficients, the latter being reported with NHSTs based on delta-method SEs. This does not conform to the recommended practice of reporting unstandardized coefficients with their NHSTs (SEs, CIs, Wald z tests; McDonald & Ho, 2002, p. 76).

APA recommends reporting a standardized measure of effect size with any NHST, and the standardized coefficient satisfies this without any need to report a redundant NHST for the standardized coefficient (which typically tests the same null hypothesis using different point and SE estimates). A more appropriate default table would reporting the parameterEstimates() output with the std.all column only used as an effect size (which is why it is reported that way in lavaan's summary() method).

However, Kelley & Preacher (2012) note that standardized estimates might also estimate a specific parameter of interest. They have sampling distributions, so reporting CIs for standardized coefficients is a good idea for the same reason it is good to provide them for unstandardized estimates.

One idea might be to have a test= argument in addition to estimate=, to indicate which NHST (and CI) to include. The default might be lavaan_reg(..., estimate=c("b","B"), test = "b") to request both estimates but only test the unstandardized. But it could get complicated figuring out ideal layouts for the 9 different combinations of either/or/both for estimate= and test=.

At which point the question becomes whether to keep the function simple (expecting it to satisfy all situations) or make it more flexible with options/arguments that give users enough control to publish according to their preferences rather than a package designer's.

rempsyc commented 1 year ago

For effect sizes in lm models, I like to report the semipartial correlation squared (delta R2), but for this I rely on effectsize::r2_semipartial(), which AFAIK does not work with lavaan models. In this case, I suppose you are correct that the standardized coefficient will need to suffice as the measure of effect size.

For any effect size, it is generally recommended to include the CI (consistent with Kelley & Preacher, 2012), so here I could report a single table with all required values, including the unstandardized estimate with its CI and p value, AND the effect size (standardized estimate), also with its CI, but with no additional p value.

I'm not sure about the test argument to report separate p values with the 9 combinations... I wonder how niche this need is (and lavaanExtra isn't really supposed to be), so I wonder for those special needs whether it would be better to let people extract the info themselves with standardizedsolution()? I would tend to like to make the function both as simple and flexible as possible, but if you think it is important to accomodate this complex scenario, I will try to find a way to do it.

On another note, if I am not mistaken, this recommandation also applies to the new lavaan_defined() (formerly, lavaan_ind()), correct? Would this also apply to lavaan_cov() and lavaan_var() then?

rempsyc commented 1 year ago

So for now I have removed the estimate argument, so it is only possible to get one type of table. I also had to update rempsyc::nice_table() to support two confidence intervals in the same table... but it works :) reprex:

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
y <- c("visual", "textual", "speed")
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
regression <- list(ageyr = y, grade = y)
HS.model <- write_lavaan(regression = regression, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_reg(fit, nice_table = TRUE)

^{Created on 2023-10-08 with reprex v2.0.2}

(tables appear ugly in reprexes but anyway)

With this, I hope we can move on with the paper but I am very interested to continue this discussion to bring the package up to the best reporting standards. An enormous thanks for all the time and expert advice you have put into improving this package. It is priceless.

rempsyc commented 1 year ago

I have corrected it as well for lavaan_defined():

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
mediation <- list(speed = "visual", textual = "visual", visual = c("ageyr", "grade"))
indirect <- list(IV = c("ageyr", "grade"), M = "visual", DV = c("speed", "textual"))
HS.model <- write_lavaan(mediation, indirect = indirect, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_defined(fit, nice_table = TRUE)

^{Created on 2023-10-08 with reprex v2.0.2}

rempsyc commented 1 year ago

Both of these also now rely on a new internal function, lavaan_extract(), which I am hoping to integrate with lavaan_var() and lavaan_cov() as well.

TDJorgensen commented 1 year ago

this recommandation also applies to the new lavaan_defined() (formerly, lavaan_ind()), correct?

Yes, nice work. I think these tables include all the right information, and no need to accommodate complicated hypothetical combinations that I speculated about.

Would this also apply to lavaan_cov() and lavaan_var() then?

No, I don't think people typically have a priori hypotheses they want to test about most of those parameters anyway. Except for factor correlations, whose metric is arbitrary anyway.

For effect sizes in lm models, I like to report the semipartial correlation squared (delta R2), but for this I rely on effectsize::r2_semipartial(), which AFAIK does not work with lavaan models

I doubt it, but this paper could be helpful if you want to put it on your long-term feature-request list:

https://doi.org/10.3758/s13428-020-01532-y

rempsyc commented 1 year ago

Well, I tried integrating lavaan_cov() with lavaan_extract(), so we get a consistent structure always with both standardized and unstandardized estimates, and simplified code internally. So again here it is not possible to specify estimate = r or estimate = sigma since you will always get a single table with both.

library(lavaan)
library(lavaanExtra)
x <- paste0("x", 1:9)
latent <- list(visual = x[1:3], textual = x[4:6], speed = x[7:9])
covariance <- list(speed = "textual", ageyr = "grade")
HS.model <- write_lavaan(covariance = covariance, latent = latent, label = TRUE)
fit <- sem(HS.model, data = HolzingerSwineford1939)

lavaan_cov(fit, nice_table = TRUE)

^{Created on 2023-10-09 with reprex v2.0.2}

The only issue is that now I only report p values for the sigma, so I wonder if people may misreport the correlations with the sigma p values in this case (though it's a bit unclear to me how this scenario is different from looking at the standardized coefficient, r, as simply the standardized effect size as with the other extracting functions).

rempsyc / lavaanExtra

lavaan_reg() defaults #28