plot_predictions() does not maintain factor ordering #1109

andrewheiss commented 1 month ago

When using plot_predictions() with a model whose outcome variable is an ordered factor, the level ordering is lost in the plot.

Here's a reprex:


```r
library(marginaleffects)
library(ggplot2)
library(dplyr)

categories <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)

df <- data.frame(
  answer = sample(categories, 500, replace = TRUE),
  fav_color = sample(c("red", "blue", "green"), 500, replace = TRUE)
)
df$answer <- factor(df$answer, levels = categories, ordered = TRUE)

model <- MASS::polr(answer ~ fav_color, data = df, Hess = TRUE)
```

The rows of avg_predictions() are in the right order because {marginaleffects} arranges them that way internally, but the group column is just a character vector, so the order is fragile:

```r
preds <- avg_predictions(model)
preds
#>                       Group Estimate Std. Error    z Pr(>|z|)     S 2.5 % 97.5 %
#>  Strongly disagree             0.192     0.0176 10.9   <0.001  89.5 0.157  0.226
#>  Disagree                      0.184     0.0173 10.6   <0.001  85.1 0.150  0.218
#>  Neither agree nor disagree    0.226     0.0187 12.1   <0.001 109.2 0.190  0.263
#>  Agree                         0.196     0.0177 11.0   <0.001  91.7 0.161  0.231
#>  Strongly agree                0.202     0.0179 11.2   <0.001  95.1 0.167  0.237
#> Columns: group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
#> Type:  probs
```
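To illustrate why a character column makes the order fragile, here is a minimal base-R sketch that does not need {marginaleffects}. The data frame `preds_mock` is a hypothetical stand-in for the avg_predictions() output (estimates copied from the table above); it only shows that any re-sort alphabetizes a character column, while an ordered factor with explicit levels sorts back into the substantive order.

```r
categories <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)

# Hypothetical stand-in for avg_predictions() output: `group` is a plain
# character column whose rows happen to be in the substantive order
preds_mock <- data.frame(
  group = categories,
  estimate = c(0.192, 0.184, 0.226, 0.196, 0.202),
  stringsAsFactors = FALSE
)

# Re-sorting the rows puts a character column in alphabetical order
sorted_chr <- preds_mock[order(preds_mock$group), ]
print(sorted_chr$group)

# Pinning the order with an ordered factor: sorting now follows the levels,
# so the substantive order comes back even from the alphabetized rows
sorted_chr$group <- factor(sorted_chr$group, levels = categories, ordered = TRUE)
restored <- sorted_chr[order(sorted_chr$group), ]
print(as.character(restored$group))
print(restored$estimate)
```

The estimates travel with their rows, so only the display order changes, never the pairing of group and estimate.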

In the plot, however, the group facets are in alphabetical order:

```r
plot_predictions(model, condition = "fav_color")
```


We can re-specify the levels and order of group manually:

```r
plot_predictions(model, condition = "fav_color") +
  facet_wrap(vars(factor(group, levels = categories)))
```
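A slightly more defensive variant of the same workaround, sketched under assumptions: `relevel_to_outcome()` is a hypothetical helper (not part of {marginaleffects}) that converts a character column to an ordered factor and stops if any value is missing from the reference levels, because factor() silently turns unmatched values into NA. One could apply it to the data frame returned by plot_predictions(..., draw = FALSE), assuming the installed version supports that argument, with levels(df$answer) as the reference levels. The `plot_data` below is made up for illustration.

```r
# Hypothetical helper, not part of {marginaleffects}: convert `col` in `data`
# to an ordered factor with levels `lev`, failing loudly on unknown values
relevel_to_outcome <- function(data, col, lev) {
  values <- as.character(data[[col]])
  unknown <- setdiff(unique(values), lev)
  if (length(unknown) > 0) {
    stop("values not found in `lev`: ", paste(unknown, collapse = ", "))
  }
  data[[col]] <- factor(values, levels = lev, ordered = TRUE)
  data
}

categories <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)

# Hypothetical plot data with `group` stored as character, in scrambled order
plot_data <- data.frame(
  group = c("Agree", "Strongly disagree", "Neither agree nor disagree",
            "Strongly agree", "Disagree"),
  stringsAsFactors = FALSE
)

fixed <- relevel_to_outcome(plot_data, "group", categories)
print(levels(fixed$group))

# A typo in the reference levels is caught instead of producing NA
typo <- tryCatch(
  relevel_to_outcome(plot_data, "group", sub("Agree", "Agre", categories)),
  error = function(e) conditionMessage(e)
)
print(typo)
```

Once group is an ordered factor in the data, a plain facet_wrap(vars(group)) should lay out the panels in level order, since ggplot2 uses factor levels for facet order.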


{dplyr} and {tibble} preserve the column class when grouping, summarizing, etc.:

```r
# dplyr keeps things as ordered factors internally
df1 <- df |>
  group_by(answer) |>
  summarize(n = n())
df1
#> # A tibble: 5 × 2
#>   answer                         n
#>   <ord>                      <int>
#> 1 Strongly disagree             96
#> 2 Disagree                      92
#> 3 Neither agree nor disagree   113
#> 4 Agree                         98
#> 5 Strongly agree               101

class(df1$answer)
#> [1] "ordered" "factor"
```
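For comparison, base R behaves the same way as long as the grouping variable stays a factor. In this minimal sketch (seed and sample size chosen arbitrarily), table() counts follow the ordered factor's levels, while the same values as character come back alphabetically, because table() builds a factor whose default levels are the sorted unique values.

```r
categories <- c(
  "Strongly disagree",
  "Disagree",
  "Neither agree nor disagree",
  "Agree",
  "Strongly agree"
)

set.seed(1)  # arbitrary seed, for reproducibility only
answer <- factor(
  sample(categories, 500, replace = TRUE),
  levels = categories, ordered = TRUE
)

# Counting by the ordered factor: categories appear in level order
counts_fct <- table(answer)
print(counts_fct)

# Counting by the same values as character: default (sorted) levels,
# so the categories appear alphabetically
counts_chr <- table(as.character(answer))
print(counts_chr)
```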

But maybe {data.table}, or whatever {marginaleffects} uses behind the scenes, doesn't do that (or is philosophically opposed to doing it? I don't know anything about {data.table}). Factors, and especially ordered factors, are weird and unwieldy.

vincentarelbundock commented 1 month ago

Thanks for the report. Could you please try the version from GitHub?