simsem / semTools

Useful tools for structural equation modeling
74 stars 35 forks source link

Percentile bootstrap confidence interval #101

Closed sfcheung closed 2 years ago

sfcheung commented 2 years ago

May I propose adding a function to form percentile bootstrap confidence intervals?

This idea arose from a discussion at the lavaan Google Group a few months ago:

https://groups.google.com/g/lavaan/c/qQBXSz5cd0o/m/R8YT5HxNAgAJ

We can use lavaan::bootstrapLavaan() and lavaan::standardizedSolution() to get the bootstrap confidence intervals for the standardized solution. This is what I will do myself. However, some users may want to have a ready-to-use function for this purpose. Moreover, if a user wants to have the bootstrapping confidence intervals for both the original and the standardized solution, the bootstrap resampling needs to be done twice, and these two sets of confidence intervals will be based on two different sets of bootstrap samples.

Assuming that se = "boot" is used when fitting a model, the unstandardized bootstrap estimates are stored in the output. They can be retrieved to form the confidence intervals for the standardized solution. No need to do bootstrapping twice. Moreover, this ensures that both the unstandardized solution and the standardized solution use the same set of bootstrap samples.

The draft function can be found here:

https://github.com/sfcheung/semTools/blob/boot_std/semTools/R/bootstrap_ci_std.R

This function will append the confidence intervals to the output of lavaan::standardizedSolution(), such that users compare the default delta-method confidence intervals and the bootstrap percentile confidence intervals. This function can be used instead of lavaan::standardizedSolution() if se = "boot" is used when fitting a model. It will take some time to run because the standardized solution is computed for each bootstrap sample. However, this is still much faster than using lavaan::bootstrapLavaan().

I tested this function on both single- and multiple-group models, with between-group equality constrains and user-derived parameters. If there are any models for which this function will break, I would be happy to know and check these cases.

It currently supports only percentile confidence intervals, implemented simply by quantile. Not as sophisticated as done by boot::boot.ci() but should be good enough for most scenarios.

If there are any ways to improve the function, or if there are any bugs, please let me know.

If this function is not suitable for semTools, please do not hesitate to let me know. I will then just post it as a Gist for interested users.

sfcheung commented 2 years ago

This is the test I used. Not sure where to place it in the package and so not included in the branch:

# Example adapted from https://lavaan.ugent.be/tutorial/mediation.html
options(width = 132)

library(lavaan)
set.seed(1234)
n <- 50
X <- runif(n) - .5
M <- 0.20*X + rnorm(n)
Y <- 0.17*M + rnorm(n)
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect effect (a*b)
             ab := a*b
           # total effect
             total := c + (a*b)
         '

system.time(fit <- sem(model, data = Data, fixed.x = FALSE,
                       se = "boot",
                       bootstrap = 2000,
                       parallel ="snow", ncpus = 8))
summary(fit)

standardizedSolution_boot_ci(fit)

# Adapted from https://lavaan.ugent.be/tutorial/groups.html
# Two groups, with equality constraints.

set.seed(5678)
n <- 50
X <- runif(n) - .5
M <- 0.50*X + rnorm(n)
Y <- 0.27*M + rnorm(n)
Data2 <- rbind(cbind(gp = "gp1", Data),
               data.frame(gp = "gp2", X = X, Y = Y, M = M))

model2 <- '# direct effect
             Y ~ c(c1, c2)*X
           # mediator
             M ~ c(a1, a1)*X
             Y ~ c(b1, b1)*M
           # indirect effect (a1*b1)
             ab := a1*b1
           # total effect
             total1 := c1 + (a1*b1)
             total2 := c2 + (a1*b1)
         '

system.time(fit2 <- sem(model2, data = Data2, fixed.x = FALSE,
                       group = "gp",
                       se = "boot",
                       bootstrap = 2000,
                       parallel ="snow", ncpus = 8))
summary(fit2)

standardizedSolution_boot_ci(fit2)
sfcheung commented 2 years ago

I will find an alternative place to host this function. This function may no longer be necessary in the future. Thanks for the advice.