yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org
412 stars 99 forks source link

SEs not included for defined parameters when B > 1 bootstrap samples fail #273

Closed philchalmers closed 1 year ago

philchalmers commented 1 year ago

Hi Yves,

I recently noticed that sem(..., se='bootstrap') behaves differently on recent version of lavaan, where for defined parameters it seems that the SE information is no longer reported when one or more bootstrap samples fail to converge. Below is a reprex of the issue, which I assume is unintended. HTH for now.

devtools::install_github('yrosseel/lavaan')

library(lavaan)
set.seed(1)
n <- 40
X <- sample(0:1, n, replace=TRUE)
M <- round(100 + 10*X + rnorm(n, sd=10))
Y <- round(1.7*M + rnorm(n, sd=5))
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect effect (a*b)
             ab := a*b
           # total effect
             total := c + (a*b)
         '

# latest version does not provide SEs for defined parameters
fit <- sem(model,
           data = Data,
           se = "bootstrap",
           bootstrap = 5000)
summary(fit)

> lavaan 0.6.16 ended normally after 1 iteration
> 
> Estimator                                         ML
> Optimization method                           NLMINB
> Number of model parameters                         5
> 
> Number of observations                            40
> 
> Model Test User Model:
>     
>     Test statistic                                 0.000
> Degrees of freedom                                 0
> 
> Parameter Estimates:
>     
>     Standard errors                            Bootstrap
> Number of requested bootstrap draws             5000
> Number of successful bootstrap draws            4962
> 
> Regressions:
>     Estimate  Std.Err  z-value  P(>|z|)
> Y ~                                                 
>     X          (c)    0.750    1.199    0.626    0.531
> M ~                                                 
>     X          (a)   10.394    0.609   17.072    0.000
> Y ~                                                 
>     M          (b)    1.612    0.100   16.104    0.000
> 
> Variances:
>     Estimate  Std.Err  z-value  P(>|z|)
> .Y                22.280    4.527    4.922    0.000
> .M                75.995   18.097    4.199    0.000
> 
> Defined Parameters:
>     Estimate  Std.Err  z-value  P(>|z|)
> ab               16.755       NA                  
> total            17.505       NA        
sfcheung commented 1 year ago

I modified the example to illustrate the same problem bootstrap = 200:

library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
set.seed(1)
n <- 40
X <- sample(0:1, n, replace=TRUE)
M <- round(100 + 100*X + rnorm(n, sd=10))
Y <- round(1.7*M + rnorm(n, sd=5))
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect effect (a*b)
             ab := a*b
           # total effect
             total := c + (a*b)
         '
fit <- sem(model,
           data = Data,
           se = "bootstrap",
           bootstrap = 200)
#> Warning in lav_model_nvcov_bootstrap(lavmodel = lavmodel, lavsamplestats =
#> lavsamplestats, : lavaan WARNING: 5 bootstrap runs failed or did not converge.
summary(fit, ci = TRUE)
#> lavaan 0.6.15 ended normally after 29 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         5
#> 
#>   Number of observations                            40
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                 0.000
#>   Degrees of freedom                                 0
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                            Bootstrap
#>   Number of requested bootstrap draws              200
#>   Number of successful bootstrap draws             195
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
#>   Y ~                                                                   
#>     X          (c)   -9.631    9.509   -1.013    0.311  -30.283    9.347
#>   M ~                                                                   
#>     X          (a)  100.583    2.427   41.447    0.000   95.487  105.628
#>   Y ~                                                                   
#>     M          (b)    1.792    0.093   19.200    0.000    1.611    1.993
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
#>    .Y                22.187    4.551    4.875    0.000   12.213   29.693
#>    .M                68.746   14.366    4.785    0.000   39.102   94.454
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
#>     ab              180.235       NA                    161.240  199.392
#>     total           170.604       NA                    161.644  179.907

Maybe the problem occurred here?

https://github.com/yrosseel/lavaan/blob/aaef68e72f37b9abfc55ef47415060d33b581ee2/R/lav_model_vcov.R#L652

lav_model_vcov_se() is used by lavaan() to compute the SEs.

https://github.com/yrosseel/lavaan/blob/aaef68e72f37b9abfc55ef47415060d33b581ee2/R/lav_model_vcov.R#L615-L616

https://github.com/yrosseel/lavaan/blob/aaef68e72f37b9abfc55ef47415060d33b581ee2/R/xxx_lavaan.R#L1727-L1730

However, if some bootstrap samples are not successful and so bootstrap estimates are NAs in some rows of BOOT, they are not removed. BOOT.def will have NAs or NaNs, and then some elements in cov(BOOT.def) will be NAs, hence NAs in some SEs.

https://github.com/yrosseel/lavaan/blob/aaef68e72f37b9abfc55ef47415060d33b581ee2/R/lav_model_vcov.R#L652-L658

Maybe we can fix this as in the following branch?

https://github.com/sfcheung/lavaan/blob/ec6a5bc7b5b194cc5e06e9c41d74d299fb9c9a0d/R/lav_model_vcov.R#L651-L662

I didn't write the lines. I just copied them from parameterEstimates():

https://github.com/yrosseel/lavaan/blob/aaef68e72f37b9abfc55ef47415060d33b581ee2/R/lav_object_methods.R#L464-L467

An even simpler solution may be this one:

def.cov <- cov(BOOT.def, use = "complete.obs")

I didn't send a pull request because lav_model_vcov_se() is also called by a few other functions which I am not familiar with. I am not sure whether the fix will affect them.

sfcheung commented 1 year ago

Using that branch (https://github.com/sfcheung/lavaan/tree/se_na_boot), bootstrap SEs for user-defined parameters will be reported ("D:/GitHub/lavaan" needs to be the location of the local clone)

devtools::load_all("D:/GitHub/lavaan")
#> ℹ Loading lavaan
#> This is lavaan 0.6-16
#> lavaan is FREE software! Please report any bugs.
set.seed(1)
n <- 40
X <- sample(0:1, n, replace=TRUE)
M <- round(100 + 100*X + rnorm(n, sd=10))
Y <- round(1.7*M + rnorm(n, sd=5))
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect effect (a*b)
             ab := a*b
           # total effect
             total := c + (a*b)
         '
fit <- sem(model,
           data = Data,
           se = "bootstrap",
           bootstrap = 200)
#> Warning in lav_model_nvcov_bootstrap(lavmodel = lavmodel, lavsamplestats =
#> lavsamplestats, : lavaan WARNING: 5 bootstrap runs failed or did not converge.
summary(fit, ci = TRUE)
#> lavaan 0.6.16 ended normally after 29 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         5
#> 
#>   Number of observations                            40
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                 0.000
#>   Degrees of freedom                                 0
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                            Bootstrap
#>   Number of requested bootstrap draws              200
#>   Number of successful bootstrap draws             195
#> 
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
#>   Y ~                                                                   
#>     X          (c)   -9.631    9.509   -1.013    0.311  -30.283    9.347
#>   M ~                                                                   
#>     X          (a)  100.583    2.427   41.447    0.000   95.487  105.628
#>   Y ~                                                                   
#>     M          (b)    1.792    0.093   19.200    0.000    1.611    1.993
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
#>    .Y                22.187    4.551    4.875    0.000   12.213   29.693
#>    .M                68.746   14.366    4.785    0.000   39.102   94.454
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|) ci.lower ci.upper
#>     ab              180.235    9.212   19.566    0.000  161.240  199.392
#>     total           170.604    4.497   37.941    0.000  161.644  179.907
yrosseel commented 1 year ago

Fixed in 0.6-16.1849!