Subgroup analysis on a factor with levels identical to feature levels produces wrong estimates

Moved from @m-jankowski at #13:

This is also a problem when using mm_diffs().

In my case, I wanted to conduct a subgroup analysis conditional on respondent gender (labeled "Male" or "Female"). The conjoint experiment, however, also contained feature levels with these same labels ("Male" and "Female"). What makes this particularly problematic is that mm_diffs() did not throw an error; it returned wrong estimates without any warning.

Here is an artificial example using the immigration data:

# Package and data
library("cregg")
data("immigration")

# Create subgroups
immigration$ethnosplit <- cut(immigration$ethnocentrism, 2)

# Rename subgroup levels
immigration$subgroup <- as.factor(ifelse(as.numeric(immigration$ethnosplit) == 1, 
                                          "Female", 
                                          "Male"))

# Estimate correct MMs by subgroup
mm_correct <- cj(na.omit(immigration),
                 ChosenImmigrant ~ Gender + Education + LanguageSkills,
                 estimate = "mm",
                 id = ~ CaseID, 
                 by = ~ ethnosplit)

plot(mm_correct,
     group = "ethnosplit",
     vline = 0.5)

# Differences between subgroups
mmdiff_correct <- mm_diffs(na.omit(immigration),
                           ChosenImmigrant ~ Gender + Education + LanguageSkills,
                           id = ~ CaseID,
                           by = ~ ethnosplit)

plot(mmdiff_correct)

# Grouping by a factor whose levels ("Female", "Male") match the levels
# of the Gender feature returns wrong estimates, with no error or warning
mmdiff_problem <- mm_diffs(na.omit(immigration),
                           ChosenImmigrant ~ Gender + Education + LanguageSkills,
                           id = ~ CaseID,
                           by = ~ subgroup)

plot(mmdiff_problem)
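
Until this is fixed, a possible workaround is to check for the collision up front and relabel the grouping factor. The sketch below uses base R only; shared_levels() and prefix_levels() are hypothetical helpers written for this example (they are not part of cregg), and the toy data frame is made up for illustration. It assumes the wrong estimates are caused purely by the identical labels, which I have not verified against the package internals.

```r
# Hypothetical helper, not part of cregg: returns the levels of the grouping
# factor `by` that also occur as levels of any feature in `features`.
shared_levels <- function(data, features, by) {
  feature_levels <- unlist(lapply(data[features], levels), use.names = FALSE)
  intersect(levels(data[[by]]), feature_levels)
}

# Hypothetical helper: prefix every level of the grouping factor so that it
# can no longer coincide with a feature level.
prefix_levels <- function(x, prefix = "Respondent: ") {
  x <- as.factor(x)
  levels(x) <- paste0(prefix, levels(x))
  x
}

# Toy data standing in for a conjoint design (values are made up)
toy <- data.frame(
  Gender   = factor(c("Female", "Male", "Female", "Male")),
  subgroup = factor(c("Male", "Male", "Female", "Female"))
)

# Detect the clash and, if there is one, create a relabeled grouping factor
clash <- shared_levels(toy, features = "Gender", by = "subgroup")
if (length(clash) > 0) {
  toy$subgroup_safe <- prefix_levels(toy$subgroup)
}
```

With the immigration data, the same idea would be immigration$subgroup_safe <- prefix_levels(immigration$subgroup), followed by by = ~ subgroup_safe in the mm_diffs() call. The result should then be compared against mmdiff_correct to confirm that the relabeling actually removes the discrepancy.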
