Unidentified model with valid chi-square after constraining the covariance to zero

marklhc commented 8 months ago

This seems to be an interesting case: A two-factor model with five indicators, which is identified when the covariance between factors are not zero:

library(lavaan)
#> This is lavaan 0.6-17
#> lavaan is FREE software! Please report any bugs.
library(semlrtp)
# Use a smaller sample to show the impact of using LRT p-values
set.seed(1234)
data(data_sem16)
mod <-
  "
f1 =~ x1 + x2 + x3
f2 =~ x5 + x6
"
fit <- cfa(mod, data = data_sem16)
lrtp(fit)
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)     LRTp ci.lower
#>   f1 ~~                                                                 
#>     f2                0.070    0.046    1.519    0.129    0.007   -0.020
#>  ci.upper
#>          
#>     0.160

Note, however, that the LRT p value does not agree with the LB CI. When looking at the constrained model, the model is not identified, because f2 only has two indicators, and the model does not allow shared variances with indicators for f1. lavaan still computes the model chi-square:

mod0 <- 
  "
f1 =~ x1 + x2 + x3
f2 =~ x5 + x6
f1 ~~ 0 * f2
"
fit0 <- cfa(mod0, data = data_sem16)
#> Warning in lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, : lavaan WARNING:
#>     Could not compute standard errors! The information matrix could
#>     not be inverted. This may be a symptom that the model is not
#>     identified.
summary(fit0)
#> lavaan 0.6.17 ended normally after 26 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        10
#> 
#>   Number of observations                           336
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                 8.479
#>   Degrees of freedom                                 5
#>   P-value (Chi-square)                           0.132
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   f1 =~                                               
#>     x1                1.000                           
#>     x2                0.911       NA                  
#>     x3                1.051       NA                  
#>   f2 =~                                               
#>     x5                1.000                           
#>     x6                0.572       NA                  
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   f1 ~~                                               
#>     f2                0.000                           
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .x1                1.010       NA                  
#>    .x2                1.067       NA                  
#>    .x3                0.700       NA                  
#>    .x5                0.630       NA                  
#>    .x6                1.021       NA                  
#>     f1                0.286       NA                  
#>     f2                0.551       NA

To identify the local model for f2, I constrain the loadings to equal (and 1.0) for both indicators. The model is identified now, and the chi-square is exactly the same, but now we have one more df:

mod0_2 <- 
  "
f1 =~ x1 + x2 + x3
f2 =~ l * x5 + l * x6
f1 ~~ 0 * f2
"
fit0_2 <- cfa(mod0_2, data = data_sem16)
summary(fit0_2)
#> lavaan 0.6.17 ended normally after 28 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         9
#> 
#>   Number of observations                           336
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                 8.479
#>   Degrees of freedom                                 6
#>   P-value (Chi-square)                           0.205
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   f1 =~                                               
#>     x1                1.000                           
#>     x2                0.911    0.253    3.604    0.000
#>     x3                1.051    0.323    3.257    0.001
#>   f2 =~                                               
#>     x5         (l)    1.000                           
#>     x6         (l)    1.000                           
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   f1 ~~                                               
#>     f2                0.000                           
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .x1                1.010    0.119    8.457    0.000
#>    .x2                1.067    0.111    9.573    0.000
#>    .x3                0.700    0.114    6.159    0.000
#>    .x5                0.867    0.092    9.455    0.000
#>    .x6                0.886    0.093    9.553    0.000
#>     f1                0.286    0.110    2.595    0.009
#>     f2                0.315    0.067    4.690    0.000

^{Created on 2024-03-07 with reprex v2.1.0}

So I think we may need some more ways to check whether the constrained model is really identified. Here lavaan does indicate that the standard errors cannot be computed due to potential under-identification.

sfcheung commented 8 months ago

Thanks for sharing this case. In 0.0.0.9015, currently in the devel branch, fix_to_zero() will check the VCOV. If lavaan issued a warning, then it assumes that something's wrong, e.g., model not identified, and will raise an error, for now. See #23

In #18, I will revise fix_to_zero() such that, if something's wrong and LRT p-value cannot be computed, it will return NA or a similar value isntead of raising an error. lrtp() will then indicate the problem in the output to alert the users that something's wrong with fixing those paraemters to zero, while the LRT p-values for other parameters are still computed and reported.

sfcheung commented 8 months ago

In #24 , fix_to_zero(), lrt(), lrtp(), and print.lrtp() now can handle error encountered in fixing the parameter to zero or in doing the LR test. If a problem occurs, the error will be recorded, instead of raising an error. Users will be notified of this when the results are printed.

sfcheung commented 5 months ago

This issue can be closed for now because some methods are implemented to try to identify cases with identication issues.

sfcheung / semlrtp

Unidentified model with valid chi-square after constraining the covariance to zero #6