yrosseel / lavaan

an R package for structural equation modeling and more
http://lavaan.org

Surprising interaction of labels and group.equal #184

Closed aaronpeikert closed 4 years ago

aaronpeikert commented 4 years ago

When using a model where every parameter is explicit and labelled, I get surprising equality constraints when I set group.equal = "loadings". In fact, all parameters are constrained because they get the same label in all groups. While that is not the behaviour I would expect from the documentation, I think I understand what is going on.

However, when group.equal = c("intercepts", "loadings") I lose one DF?

If you do not think this is a bug, feel free to close this issue right away.

library(lavaan)
#> This is lavaan 0.6-6
#> lavaan is BETA software! Please report any bugs.

HS.model <- "
x =~ 1 * x1 + l2x * x2 + l3x * x3
x ~~ lv1x * x
x1 ~ i1x * 1
x2 ~ i2x * 1
x3 ~ i3x * 1
x1 ~~ v1x * x1
x2 ~~ v2x * x2
x3 ~~ v3x * x3
"

configural <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school")

metric <- cfa(HS.model, 
              data = HolzingerSwineford1939, 
              group = "school",
              group.equal = "loadings")

scalar <- cfa(HS.model, 
              data = HolzingerSwineford1939, 
              group = "school",
              group.equal = c("intercepts", "loadings"))

strict <- cfa(HS.model, 
              data = HolzingerSwineford1939, 
              group = "school",
              group.equal = c("intercepts", "loadings", "residuals"))

anova(configural, metric)
#> Chi-Squared Difference Test
#> 
#>            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)    
#> configural  0 2718.0 2784.8  0.000                                  
#> metric      9 2731.9 2765.3 31.918     31.918       9  0.0002057 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(metric, scalar)
#> Chi-Squared Difference Test
#> 
#>        Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)  
#> scalar  8 2728.3 2765.4 26.292                                
#> metric  9 2731.9 2765.3 31.918     5.6259       1     0.0177 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(strict, scalar)
#> Warning in lavTestLRT(object = new("lavaan", version = "0.6.6", call =
#> lavaan::lavaan(model = HS.model, : lavaan WARNING: some models have the same
#> degrees of freedom
#> Chi-Squared Difference Test
#> 
#>        Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
#> strict  8 2728.3 2765.4 26.292                              
#> scalar  8 2728.3 2765.4 26.292          0       0

Created on 2020-06-22 by the reprex package (v0.3.0)

TDJorgensen commented 4 years ago

> When using a model where every parameter is explicit and labelled

That is not what your syntax does. You only provide one label per parameter, rather than one label per group, such as:

x =~ 1*x1 + c(l2x.g1, l2x.g2)*x2 + c(l3x.g1, l3x.g2)*x3

> that is not the behaviour I would expect from the documentation

The example in your link does what I showed above. See also the Multiple groups section of the ?model.syntax help page:

> In a multiple group analysis, modifiers that contain a single constant must be replaced by a vector, having the same length as the number of groups. The only exception are numerical constants (for fixing values): if you provide only a single number, the same number will be used for all groups. However, it is safer (and cleaner) to specify the same number of elements as the number of groups.

You noticed the same behavior applies to labels.
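
As a minimal sketch of the per-group label vectors described above (assuming the two groups of the school variable in HolzingerSwineford1939; the label and object names are made up for illustration), distinct labels leave the loadings free in each group, while repeating a label inside c() imposes the equality constraint explicitly:

```r
library(lavaan)

# One label per group: loadings are labelled but remain free in each group
free.model <- "
x =~ 1*x1 + c(l2x.g1, l2x.g2)*x2 + c(l3x.g1, l3x.g2)*x3
"

# Same label repeated across groups: loadings are constrained to equality
equal.model <- "
x =~ 1*x1 + c(l2x, l2x)*x2 + c(l3x, l3x)*x3
"

fit.free  <- cfa(free.model,  data = HolzingerSwineford1939, group = "school")
fit.equal <- cfa(equal.model, data = HolzingerSwineford1939, group = "school")

# The equality-constrained model should report more degrees of freedom
fitMeasures(fit.free, "df")
fitMeasures(fit.equal, "df")
```

Writing the vector out in full makes the intended constraint visible in the syntax itself, rather than leaving it to recycling rules.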

> However, when group.equal = c("intercepts", "loadings") I lose one DF?

When lavaan sees "intercepts" %in% group.equal, it recognizes that latent means need only be fixed in one group for identification, not in all groups. If you compare your summary() output between metric and scalar models, you would see that group-2's latent mean is fixed in the metric model but freely estimated in the scalar model.
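
One way to check this without reading the full summary() output is to look at the latent-mean rows of the parameter table. The sketch below uses an unlabelled version of the model (an assumption made here so that only group.equal= drives the constraints) and made-up object names; a value of 0 in the free column means the parameter is fixed:

```r
library(lavaan)

# Unlabelled one-factor model, so group.equal= alone determines the constraints
plain.model <- "x =~ x1 + x2 + x3"

metric.fit <- cfa(plain.model, data = HolzingerSwineford1939,
                  group = "school", group.equal = "loadings")
scalar.fit <- cfa(plain.model, data = HolzingerSwineford1939,
                  group = "school",
                  group.equal = c("intercepts", "loadings"))

# Extract the latent-mean rows (x ~ 1) for each group; free == 0 means fixed
latent_mean_rows <- function(fit) {
  pt <- parTable(fit)
  pt[pt$lhs == "x" & pt$op == "~1", c("lhs", "op", "group", "free")]
}

latent_mean_rows(metric.fit)
latent_mean_rows(scalar.fit)
```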

The group.equal= shortcut is provided so that labels are not necessary to add equality constraints. Since you are using labels, I wouldn't bother with the group.equal= shortcut, but you need to make sure you free unnecessary identification constraints manually in the syntax. Using labels provides more control than the group.equal= shortcut, so I usually recommend that. You might find the semTools::measEq.syntax() function informative and useful.
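
As a rough illustration of the semTools route mentioned above, the following sketch asks measEq.syntax() to write out labelled model syntax for a metric-invariance model. The choice of ID.fac = "marker" (to mirror the loading fixed to 1 in the original model) and the object names are assumptions for this example, not something prescribed in this thread:

```r
library(lavaan)
library(semTools)

# Configural model without labels; measEq.syntax() generates the labels
plain.model <- "x =~ x1 + x2 + x3"

metric.syntax <- measEq.syntax(configural.model = plain.model,
                               data = HolzingerSwineford1939,
                               group = "school",
                               ID.fac = "marker",
                               group.equal = "loadings")

# Print the generated lavaan syntax, including per-group label vectors
cat(as.character(metric.syntax))

# The generated syntax can then be fitted like any other model string
metric.fit <- cfa(as.character(metric.syntax),
                  data = HolzingerSwineford1939, group = "school")
```

Reading the generated syntax shows which parameters share a label across groups and which identification constraints were freed.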

> If you do not think this is a bug, feel free to close this issue right away

It is not a bug, but I agree the documentation should mention that a single label in multigroup models will be recycled, perhaps inadvertently adding unintended equality constraints.

aaronpeikert commented 4 years ago

Thank you for getting back to me so quickly and thoroughly. Now I better understand the behaviour, but I still find it surprising that labels are only recycled across groups if group.equal is set.

aaronpeikert commented 4 years ago

Anyhow, slightly tweaking the documentation is probably all you can do.

TDJorgensen commented 4 years ago

> I still find it surprising that labels are only recycled across groups if group.equal is set.

Ah yes, I missed that detail, but obviously that accounts for why the configural model still had df = 0. I doubt that is intentional; perhaps the source code sets something TRUE to use any user-supplied labels whenever group.equal is not an empty string. I hope the source code can either be adjusted or made to issue a warning to use a vector of labels whenever ngroups > 1L.

yrosseel commented 4 years ago

See lav_partable.R, lines 354-372.

I agree that this is not very consistent, and perhaps we should change it. (Of course, using labels in combination with group.equal= is not a good idea to begin with, but well.) This is the current behavior (0.6-6) if a single modifier is provided in a multiple group setting:

1) ALL modifiers (fixed values, starting values, lower bounds, upper bounds, ...) are silently recycled
2) except labels
3) BUT, if group.equal contains "loadings", labels are recycled too
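
The three rules can be paraphrased in a few lines of plain R. This is only an illustrative sketch: recycle_modifier_current() is a hypothetical helper written for this example, not the code in lav_partable.R, and it assumes that an unrecycled label stays attached to the first group only:

```r
# Hypothetical helper (NOT lavaan's code): mimics the 0.6-6 rules as described
recycle_modifier_current <- function(value, type, ngroups,
                                     group.equal = character(0)) {
  # A full vector (one element per group) is used as-is
  if (length(value) == ngroups) return(value)
  stopifnot(length(value) == 1L)

  # Rules 2 and 3: a single label is recycled only if "loadings" is requested
  if (type == "label" && !("loadings" %in% group.equal)) {
    # Assumption: the label applies to the first group, the rest stay unlabelled
    return(c(value, rep("", ngroups - 1L)))
  }

  # Rule 1: every other single modifier is silently recycled
  rep(value, ngroups)
}

recycle_modifier_current(1, "fixed", 2L)                  # fixed value
recycle_modifier_current("l2x", "label", 2L)              # label, no group.equal
recycle_modifier_current("l2x", "label", 2L, "loadings")  # label, with loadings
```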

This is the logic: I would argue that 1) is consistent with general R behavior, and often useful, although a (single) warning is perhaps needed? The exception for labels (in the absence of group.equal = "loadings") is of course to avoid (inadvertently) imposing equality constraints, just by labeling the (first-group) parameters. Here, we 'guess' that imposing equality constraints is not what the user intended (if only a single label is provided). But this creates an exception to general rule 1). Then 3) assumes that the user does want to impose equality constraints after all, and recycles the labels again. (It is a bit unfortunate that the code only checks for "loadings").

Proposal: we drop 2), but with a warning. In other words, we always recycle single modifiers, including labels. Only if group.equal= is empty, we also produce a warning about this.
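
In plain R, the proposal could look roughly like the sketch below. recycle_modifier_proposed() is a hypothetical helper for illustration only, not lavaan code, and it assumes the warning is meant for recycled labels specifically:

```r
# Hypothetical helper (NOT lavaan's code): sketches the proposed behavior
recycle_modifier_proposed <- function(value, type, ngroups,
                                      group.equal = character(0)) {
  # A full vector (one element per group) is used as-is
  if (length(value) == ngroups) return(value)
  stopifnot(length(value) == 1L)

  # Assumption: warn only when a single label is recycled and group.equal= is empty
  if (type == "label" && length(group.equal) == 0L) {
    warning("single label '", value, "' recycled across ", ngroups,
            " groups; this imposes an equality constraint", call. = FALSE)
  }

  # Every single modifier, labels included, is recycled
  rep(value, ngroups)
}

# Recycles the label in both cases; only the second call is silent
suppressWarnings(recycle_modifier_proposed("l2x", "label", 2L))
recycle_modifier_proposed("l2x", "label", 2L, group.equal = "loadings")
```

Under this sketch a user who really wants group-specific labels would supply a vector such as c("l2x.g1", "l2x.g2"), which passes through unchanged and without a warning.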

Thoughts?

aaronpeikert commented 4 years ago

To me, that sounds very reasonable. Thank you for getting back to me.

yrosseel commented 4 years ago

I will check whether this change breaks any other packages.