emmeans for nonexistent variables

Dallak commented 2 years ago

Dear all,

I ran the following code:

  m5 <- emmeans(vot_1, ~ poa * voicing)

where poa has three levels (1- bilabial, 2- alveolar, 3- velar) and voicing has two levels (1- voiced, 2- voiceless).

Note that the bilabial level is missing the voiceless contrast (it only contains voiced). The expected outcome is something like this one:

 bilabial voiced    
 alveolar voiced    
 velar    voiced   
 alveolar voiceless  
 velar    voiceless

But emmeans seems to calculate the mean even for the missing combination (bilabial voiceless).

 poa      voicing   emmean lower.HPD upper.HPD
 bilabial voiced    -0.770    -0.875    -0.668
 alveolar voiced    -0.746    -0.847    -0.649
 velar    voiced    -0.730    -0.825    -0.633
 bilabial voiceless  1.172     1.044     1.301
 alveolar voiceless  1.195     1.088     1.299
 velar    voiceless  1.213     1.113     1.307

Point estimate displayed: median 
HPD interval probability: 0.95

Is there a way to fix this? P.S. I'm using contr.sum, and I'm not sure whether this is relevant to calculating the emmeans for these contrasts.

Many thanks in advance!

rvlenth commented 2 years ago

As documented, emmeans constructs a reference grid comprising all combinations of factor levels and specified covariate levels. So in this case, that would include all six combinations of your two factors.
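
The idea of "all combinations of factor levels" can be illustrated with a minimal base-R sketch. The toy data and the names `obs` and `grid` are hypothetical and not taken from the thread; this only mimics how a full grid is formed, it is not the emmeans internals.

```r
# Hypothetical toy design mirroring the question:
# bilabial occurs only with voiced.
obs <- data.frame(
  poa     = factor(c("bilabial", "alveolar", "velar", "alveolar", "velar"),
                   levels = c("bilabial", "alveolar", "velar")),
  voicing = factor(c("voiced", "voiced", "voiced", "voiceless", "voiceless"),
                   levels = c("voiced", "voiceless"))
)

# Cross all factor levels, regardless of which cells were observed
grid <- expand.grid(poa = levels(obs$poa), voicing = levels(obs$voicing))

# Flag the grid rows that actually occur in the data
grid$observed <- paste(grid$poa, grid$voicing) %in% paste(obs$poa, obs$voicing)
print(grid)
```

The grid has six rows even though only five combinations were observed; the bilabial voiceless row is the one flagged as unobserved.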

You don't show the code for the model you fitted, but I surmise that your model formula had poa + voicing, without the interaction poa:voicing. When you have an additive model like that, it is possible to estimate the mean for all six factor combinations. Had you included the interaction in the model, that nonexistent combination would have been flagged NonEst (non-estimable) in the output.
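
A small sketch of that difference, assuming simulated data and plain `lm` fits rather than the mixed model from the question (the names `d`, `fit_add`, and `fit_int` are made up for illustration). It also shows one possible way to hide the unobserved combination from the additive-model summary by filtering the data frame of results.

```r
library(emmeans)

set.seed(1)
# Simulated data (hypothetical): 10 replicates per cell,
# with the bilabial-voiceless cell removed entirely.
d <- expand.grid(poa = c("bilabial", "alveolar", "velar"),
                 voicing = c("voiced", "voiceless"),
                 rep = 1:10)
d <- subset(d, !(poa == "bilabial" & voicing == "voiceless"))
d$y <- rnorm(nrow(d),
             mean = ifelse(d$voicing == "voiced", -0.75, 1.2),
             sd = 0.2)

# Additive model: every cell mean is a sum of main effects, so all six are estimable
fit_add <- lm(y ~ poa + voicing, data = d)
emm_add <- emmeans(fit_add, ~ poa * voicing)
print(emm_add)

# Interaction model: the empty cell has no information of its own
fit_int <- lm(y ~ poa * voicing, data = d)
emm_int <- emmeans(fit_int, ~ poa * voicing)
print(emm_int)

# One way to report only the observed combinations from the additive model
res <- as.data.frame(emm_add)
res_obs <- subset(res, !(poa == "bilabial" & voicing == "voiceless"))
print(res_obs)
```

Whether the bilabial voiceless estimate from an additive model is meaningful is a modelling judgment: it is an extrapolation that relies on the no-interaction assumption.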

Dallak commented 2 years ago

Thanks for this helpful input! This is correct. Here is the model structure:

  var ~ position*voicing*target_vowel+poa+rep+
    (1+position*voicing*target_vowel+poa| subject) +
    (1+position| word)

rvlenth / emmeans

emmeans for nonexistent variables #373