leeper / cregg

Simple Conjoint Analyses, Tidying, and Visualization

Subgroup analysis on a factor with levels identical to feature levels produces wrong estimates #22

Closed leeper closed 5 years ago

leeper commented 5 years ago

Moved from @m-jankowski at #13:

This is also a problem when using mm_diffs().

In my case, I wanted to conduct a subgroup analysis conditional on the gender of the respondents (labeled "Male" or "Female"). The conjoint experiment, however, also contained feature levels with the same labels ("Male" and "Female"). What makes this particularly problematic is that mm_diffs() did not throw an error; it silently returned wrong estimates without any warning.

Here is an artificial example using the immigration data:

# Load the package and the example data
library("cregg")
data("immigration")

# Create subgroups
immigration$ethnosplit <- cut(immigration$ethnocentrism, 2)

# Rename subgroup levels
immigration$subgroup <- as.factor(ifelse(as.numeric(immigration$ethnosplit) == 1, 
                                          "Female", 
                                          "Male"))

# Estimate correct MMs by subgroup
mm_correct <- cj(na.omit(immigration),
                 ChosenImmigrant ~ Gender + Education + LanguageSkills,
                 estimate = "mm",
                 id = ~ CaseID, 
                 by = ~ ethnosplit)

plot(mm_correct,
     group = "ethnosplit",
     vline = 0.5)

image

# Differences between subgroups

mmdiff_correct <- mm_diffs(na.omit(immigration),
                           ChosenImmigrant ~ Gender + Education + LanguageSkills,
                           id = ~ CaseID,
                           by = ~ ethnosplit)

plot(mmdiff_correct)

image

# Using subgroups with identical level names returns wrong estimates

mmdiff_problem <- mm_diffs(na.omit(immigration),
                           ChosenImmigrant ~ Gender + Education + LanguageSkills,
                           id = ~ CaseID,
                           by = ~ subgroup)

plot(mmdiff_problem)

image
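
Until the underlying problem is fixed, a check like the following might help catch the collision before estimating. This is only a sketch in base R: `shared_levels()` is a hypothetical helper (not part of cregg), the toy data frame merely stands in for a conjoint design, and whether relabeling fully avoids the wrong estimates is an assumption that has not been verified here.

```r
# Hypothetical helper (not a cregg function): return the labels that the
# grouping factor shares with any of the listed feature columns.
shared_levels <- function(data, features, by) {
  feature_levels <- unlist(
    lapply(data[features], function(x) levels(factor(x))),
    use.names = FALSE
  )
  intersect(levels(factor(data[[by]])), feature_levels)
}

# Toy data standing in for a conjoint design (values are made up)
toy <- data.frame(
  Gender    = factor(c("Female", "Male", "Female", "Male")),
  Education = factor(c("College", "No formal", "College", "No formal")),
  subgroup  = factor(c("Female", "Female", "Male", "Male"))
)

# Labels used both as subgroup levels and as feature levels
clash <- shared_levels(toy, c("Gender", "Education"), "subgroup")
print(clash)

# One possible workaround: prefix the subgroup labels so they no longer
# coincide with any feature level.
if (length(clash) > 0) {
  levels(toy$subgroup) <- paste("Respondent:", levels(toy$subgroup))
}
print(levels(toy$subgroup))
```

With the immigration example above, the same check would be called on `immigration` with the three features from the formula and `"subgroup"` as the grouping variable.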

leeper commented 5 years ago

This can probably be solved by building the left-hand-side assignments from the feature name and the level name together, rather than from the level name alone (level names have been incorrectly assumed to be unique): https://github.com/leeper/cregg/blob/master/R/mm_diffs.R#L51-L90
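
To illustrate the general idea only (this is not the package's code, and the numbers are made up), the base R sketch below shows why a key built from the level name alone is ambiguous when two variables share labels, and how combining feature and level restores one row per key. The column names, the `RespondentSex` variable, and the `"::"` separator are assumptions chosen for the example.

```r
# Toy table of estimates: the labels "Female" and "Male" each appear
# under two different variables (values are invented for illustration).
est <- data.frame(
  feature  = c("Gender", "Gender", "RespondentSex", "RespondentSex"),
  level    = c("Female", "Male", "Female", "Male"),
  estimate = c(0.52, 0.48, 0.55, 0.45)
)

# Keying on the level alone is ambiguous: each label maps to two rows.
by_level <- split(est$estimate, est$level)
print(lengths(by_level))

# Keying on feature and level together yields exactly one row per key.
est$key <- paste(est$feature, est$level, sep = "::")
by_key <- split(est$estimate, est$key)
print(lengths(by_key))
```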