ordinal and multinomial models: allowing specifying/collapsing categories

bwiernik commented 2 years ago

I'm starting to use your package a lot and loving it!

One thing I would find extremely useful would be to be able to specify specific categories/groups in categorical/ordinal/multinomial models, as well as to be able to collapse the selected categories.

For example, let's say I've got a cumulative ordinal model where the response is numbered 1:5 for Strongly Disagree to Strongly Agree. I very often was predictions or contrasts for this response for "Answered 4 or 5" (Agree or Strongly Agree). Currently, I compute these manually by:

estimating epred draws from my brmsfit model,
filtering to the selected categories
summing together predicted probabilities for the selected categories within each data row and draw,
computing contrasts between the summed probabilities within each draw,
summarizing the contrasts across draws

It would be really awesome if I could specify something like comparisons(..., categories = c(4, 5), collapse_categories = TRUE) to have the filtering and collapsing done before the contrasts are computed

vincentarelbundock commented 2 years ago

Glad to hear that!

This is already possible using the hypothesis argument. See vignette here: https://vincentarelbundock.github.io/marginaleffects/articles/hypothesis.html

In the examples below, note that:

The hypothesis argument does not return intervals for bayesian models. I opened an issue to fix this: #446
I set newdata="mean". This means that all variables are held at their means or modes. You can of course change that argument to suit your needs.

Is this more or less what you had in mind?

Maybe we can think of a better interface to the hypothesis argument for this common task?

library(MASS)
library(marginaleffects)

tmp <- tempfile()
download.file("https://github.com/vincentarelbundock/modelarchive/raw/main/data/brms_cumulative_random.rds", tmp)
mod <- readRDS(tmp)

# predictions for each level
predictions(mod, newdata = "mean")
#>   rowid     type group   predicted     conf.low  conf.high treat period
#> 1     1 response     1 0.788658815 0.3890123064 0.97150120     0      0
#> 2     1 response     2 0.199259206 0.0269274956 0.54827775     0      0
#> 3     1 response     3 0.008050615 0.0007565728 0.05011209     0      0
#> 4     1 response     4 0.003144665 0.0002470459 0.02245343     0      0
#>   subject
#> 1       1
#> 2       1
#> 3       1
#> 4       1

# mean predicted outcome for levels 2 and 4
predictions(
    mod,
    newdata = "mean",
    hypothesis = c(0, .5, 0, .5))
#>   rowid     type   term predicted treat period subject
#> 1     1 response custom 0.1012019     0      0       1

# same as above but with string formula
predictions(
    mod,
    newdata = "mean",
    hypothesis = "(b2 + b4) / 2 = 0")
#>   rowid     type        term predicted treat period subject
#> 1     1 response (b2+b4)/2=0 0.1012019     0      0       1

# Is the average of predictions in groups 1 and 2 different from the average of
# predictions in groups 3 and 4?
predictions(
    mod,
    newdata = "mean",
    hypothesis = "(b1 + b2) / 2 = (b3 + b4) / 2")
#>   rowid     type                term predicted treat period subject
#> 1     1 response (b1+b2)/2=(b3+b4)/2 0.4883614     0      0       1

bwiernik commented 2 years ago

I'm not sure what to do with hypothesis to compute "p(4 or 5) in treatment group - p(4 or 5) in control group"

vincentarelbundock commented 2 years ago

First we compute predicted outcome for each level, in each treatment group, just to know in which rows the interesting values are located:

predictions(mod, newdata = datagrid(treat = 0:1))
#>   rowid     type group   predicted     conf.low   conf.high period
#> 1     1 response     1 0.788658815 3.890123e-01 0.971501197      0
#> 2     2 response     1 0.930939856 6.813507e-01 0.992919563      0
#> 3     1 response     2 0.199259206 2.692750e-02 0.548277747      0
#> 4     2 response     2 0.065748443 6.900328e-03 0.299353050      0
#> 5     1 response     3 0.008050615 7.565728e-04 0.050112092      0
#> 6     2 response     3 0.002287907 1.911668e-04 0.015888530      0
#> 7     1 response     4 0.003144665 2.470459e-04 0.022453426      0
#> 8     2 response     4 0.000883685 6.149795e-05 0.006967036      0
#>   subject treat
#> 1       1     0
#> 2       1     1
#> 3       1     0
#> 4       1     1
#> 5       1     0
#> 6       1     1
#> 7       1     0
#> 8       1     1

Then, we add a hypothesis:

predictions(
   mod,
   newdata = datagrid(treat = 0:1),
   hypothesis = "mean(c(b6 + b8)) - mean(c(b5, b7)) = 0")
#>       type                            term    predicted
#> 1 response mean(c(b6+b8))-mean(c(b5,b7))=0 -0.002426048

The b* shortcuts refer to row names in the data frame produced by predictions(), when we don’t use hypothesis.

I’ll think about a better interface and am open to suggestions (ideally expanding hypothesis, but could introduce a new argument if absolutely necessary).

bwiernik commented 2 years ago

Oh, okay, so that's what the b* parameters are.

I am frequently working with response variables with varying numbers of levels (2, 3, 5, 6, 7, etc.) and wanting to compute contrasts of 3–30 treatment groups against a control group. So, I guess I could compute a hypothesis matrix programattically with something like:

variable_levels <- 1:5
levels_to_keep <- 4:5
ntreat <- 15

hyp <- do.call(
  "rbind", 
  lapply(variable_levels, \(i) if (i %in% keep_levels) rbind(-1, diag(ntreat)) else matrix(0, nrow = ntreat + 1, ncol = ntreat))
)

marginaleffects::predictions(model = mod, newdata = newdat, hypothesis = hyp)

Being able to specify that just by providing the levels_to_keep information would be really helpful.

What do you think is the timeline on adding intervals for Bayesian models with hypothesis?

Another thing that would be great is if the term column in the resulting predictions data.frame would contain the column names from the hypothesis matrix.

vincentarelbundock commented 2 years ago

A cleaner alternative might be to combine the by argument with hypothesis. The by argument allows us to compute the average prediction in subgroups of the data. For example:

predictions(mod, by = "treat")
#>       type treat predicted
#> 1 response   0.5 0.2497124
#> 2 response  -0.5 0.2478118

Then, we can simply do:

predictions(mod, by = "treat", hypothesis = "b1 = b2")
#>       type  term   predicted
#> 1 response b1=b2 0.001900596

predictions(mod, by = "treat", hypothesis = "pairwise")
#>       type          term    predicted
#> 1 response Row 2 - Row 1 -0.001900596

If we enrich the by argument to allow different groupings over which we take the average, then the rest becomes trivial.

Maybe something like (pseudo-code):

predictions(
    mod,
    by = list(groupA = 1:10, groupB = 11:13),
    hypothesis = "pairwise")

I’m not sure exactly how hard it will be to get the intervals, but if it’s not so bad I expect it to be a question of days rather than weeks.

bwiernik commented 2 years ago

One thing I am seeing currently is that the get_predict.brmsfit() method does the summarization (median) in the first step (out <- apply(draws, c(2, 3), stats::median)). That should be reserved until the final step of the computations.

vincentarelbundock commented 2 years ago

Yes, but the draws are saved in an attribute called "posterior_draws", and that is what we almost always use to do the actual computations for bayesian models. The median is just used for one column of the predictions() function.

bwiernik commented 2 years ago

Currently, in get_hypothesis(), the computations are done directly on the predicted column and the posterior_draws attribute isn't used.

So it looks like something like this should work to get draws, median, and intervals for Bayesian models:

draws <- attr(x, "posterior_draws") |> 
  apply(2, \(x) x %*% hypothesis) 

out <- draws |>  apply(1, ggplot2::median_hilow) |> data.table::rbindlist()

vincentarelbundock commented 2 years ago

Sounds good. Not sure if/how this should be integrated with get_ci(), which allows us to use HDI our ETI

https://github.com/vincentarelbundock/marginaleffects/blob/main/R/get_ci.R#L60

bwiernik commented 2 years ago

yeah definitely, whatever approach you use to get the median and CI from draws. I just grabbed my go-to function when I was prototyping

draws <- attr(x, "posterior_draws") |> 
  apply(2, \(x) x %*% hypothesis) 

out <- draws |>  apply(1, ggplot2::median_hilow) |> data.table::rbindlist()

hypothesis names <- colnames(hypothesis)
if (is.null(hypothesis_names)) hypothesis_names <- "custom"
out <- get_ci_draws(x = data.frame(term = hypothesis_names), conf_level = .95, draw = draws, estimate = "predicted")

vincentarelbundock commented 2 years ago

With current github main, we now get credible intervals and we can add column names to the hypothesis matrix to automatically rename the tests.

library(marginaleffects)

tmp <- tempfile()
download.file("https://github.com/vincentarelbundock/modelarchive/raw/main/data/brms_cumulative_random.rds", tmp)
mod <- readRDS(tmp)

The next command gives medians of the posterior distribution of predictions – for each level of the response – when all covariates are held at their means, except the treatment variable treat:

p <- predictions(
   mod,
   variables = list(treat = 0:1),
   newdata = "mean")
p
#>   rowid rowidcf     type group   predicted     conf.low   conf.high period subject treat
#> 1     1       1 response     1 0.788658815 3.890123e-01 0.971501197      0       1     0
#> 2     2       1 response     1 0.930939856 6.813507e-01 0.992919563      0       1     1
#> 3     1       1 response     2 0.199259206 2.692750e-02 0.548277747      0       1     0
#> 4     2       1 response     2 0.065748443 6.900328e-03 0.299353050      0       1     1
#> 5     1       1 response     3 0.008050615 7.565728e-04 0.050112092      0       1     0
#> 6     2       1 response     3 0.002287907 1.911668e-04 0.015888530      0       1     1
#> 7     1       1 response     4 0.003144665 2.470459e-04 0.022453426      0       1     0
#> 8     2       1 response     4 0.000883685 6.149795e-05 0.006967036      0       1     1

apply(attr(p, "posterior_draws"), 1, median)
#> [1] 0.788658815 0.930939856 0.199259206 0.065748443 0.008050615 0.002287907 0.003144665 0.000883685

The next command computes contrasts between the predictions above:

lincom <- matrix(
    c(-1, 1,  0, 0, rep(0, 4),
       0, 0, -1, 1, rep (0, 4)),
    ncol = 2,
    dimnames = list(NULL, c("Row 2 minus Row 1", "Row 4 minus Row 3")))

predictions(
   mod,
   hypothesis = lincom,
   variables = list(treat = 0:1),
   newdata = "mean")
#>   rowid rowidcf     type              term  predicted    conf.low   conf.high period subject treat
#> 1     1       1 response Row 2 minus Row 1  0.1376634  0.02029289  0.31812249      0       1     0
#> 2     2       1 response Row 4 minus Row 3 -0.1285724 -0.27812930 -0.01937696      0       1     1

Note that taking the median of the posterior of differences is very slightly different than taking the difference between the medians of posterior predictions. The order of operations matters a bit:

diff(apply(attr(p, "posterior_draws"), 1, median))[c(1, 3)]
#> [1]  0.1422810 -0.1335108

Also note that you will probably want to normalize the +1s and +1s in your hypothesis matrix if you are collapsing several estimates, since the hypothesis weights are used to compute a simple linear combination of predictions, and:

1:4 %*% c(-1, -1, 1, 1)
#>      [,1]
#> [1,]    4
1:4 %*% c(-0.5, -0.5, .5, .5)
#>      [,1]
#> [1,]    2

Finally, collapsing different treatment levels is now super easy, but I have not yet found an element way to collapse reponse levels. I’ll keep thinking about it.

bwiernik commented 2 years ago

Awesome!

Maybe a groups argument that takes a hypothesis matrix for the response levels? So I can do c(0, 0, 0, 1, 1) so that I can do "add up groups 4 and 5"?

Or this could even be accepted by the existing transformation argument?

vincentarelbundock commented 2 years ago

The by argument can already compute the average of predicted outcomes by subgroups of predictors. For example:

library(MASS)
library(marginaleffects)

mod <- polr(factor(gear) ~ hp + factor(cyl), mtcars)
predictions(mod, by = "cyl", type = "probs")
#> 
#> Re-fitting to get Hessian
#>    type cyl predicted    std.error
#> 1 probs   6 0.3333333 3.810461e-12
#> 2 probs   4 0.3333333 1.056994e-12
#> 3 probs   8 0.3333333 1.251927e-12

So this is easy to do for predictors. For response levels, one intriguing possibility would be to pass a data.frame with a special column name to the by argument. Then we could do (pseudo-code):

by <- data.frame(
    group = c(4, 6, 8),
    by = c("Group A", "Group A", "Group B"))

predictions(
    mod,
    by = by,
    hypothesis = "reference")

bwiernik commented 2 years ago

I don't want the average of subgroups of predictors, I want the sum of two levels of the response. group is the name you used for the column indicating the category in an ordinal or categorical response model.

My goal is to compute the combined percentage of responding eg either Agree or Strongly Agree on a Likert scale

vincentarelbundock commented 2 years ago

Yes, I understand. In my last pseudo-code example, you would collapse the response levels 4 and 6 (group A), and compare them to response level 8 (group B).

bwiernik commented 2 years ago

predictions(
    mod,
    by = data.frame(group = rep(4:5, 11), by = rep(0:11, each = 2))
    hypothesis = "reference")

That would compare 4+5 for each group 1-10 against control (0). Works for me

vincentarelbundock commented 2 years ago

This should now work for both frequentist and bayesian models. Let me know if you run into issues when you try it.

library(nnet)
library(marginaleffects)
mod <- multinom(factor(gear) ~ mpg + am * vs, data = mtcars, trace = FALSE)

predictions(mod, type = "probs") |> head()
#>   rowid  type group    predicted    std.error gear  mpg am vs
#> 1     1 probs     3 3.623918e-05 2.002490e-03    4 21.0  1  0
#> 2     2 probs     3 3.623918e-05 2.002490e-03    4 21.0  1  0
#> 3     3 probs     3 9.347603e-08 6.911938e-06    4 22.8  1  1
#> 4     4 probs     3 4.044657e-01 1.965452e-01    3 21.4  0  1
#> 5     5 probs     3 9.999714e-01 1.246217e-03    3 18.7  0  0
#> 6     6 probs     3 5.183336e-01 2.898025e-01    3 18.1  0  1

predictions(mod, type = "probs", by = "group")
#>    type group predicted  std.error
#> 1 probs     3 0.4687522 0.04042542
#> 2 probs     4 0.3750010 0.06141754
#> 3 probs     5 0.1562469 0.04624265

by <- data.frame(
    group = c("3", "4", "5"),
    by = c("3,4", "3,4", "5"))

predictions(mod, type = "probs", by = by)
#>    type predicted  std.error  by
#> 1 probs 0.4218766 0.02312133 3,4
#> 2 probs 0.1562469 0.04624265   5

predictions(mod, type = "probs", by = by, hypothesis = "sequential")
#>    type    term  predicted  std.error
#> 1 probs 5 - 3,4 -0.2656297 0.06936398

bwiernik commented 2 years ago

@vincentarelbundock You've got a browser() left in average_draws() on master.

vincentarelbundock commented 2 years ago

ooops, sorry about that. should be fixed now. I added a test to catch this.

bwiernik commented 2 years ago

@vincentarelbundock Sorry if I'm being thick, but I'm not sure I understand the by syntax. Could you show me an example with this model for differences in "proportion of gear = 4/5" for cyl = 6 vs cyl = 4 and cyl = 8 vs cyl = 4?

library(nnet)
library(marginaleffects)
mod <- multinom(factor(gear) ~ mpg + factor(cyl), data = mtcars, trace = FALSE)

vincentarelbundock commented 2 years ago

@bwiernik I’m not sure I understand exactly what you mean. But here are a few options using the very latest version from Github:

library(nnet)
library(marginaleffects)
nom <- multinom(factor(gear) ~ mpg + factor(cyl), data = mtcars, trace = FALSE)

predictions(nom, by = c("group", "cyl"), hypothesis = "sequential")
#>    type      term   predicted std.error
#> 1 probs 3,4 - 3,6 -0.19430915 0.1890545
#> 2 probs 3,8 - 3,4  0.76527848 0.1248376
#> 3 probs 4,6 - 3,8 -0.28557415 0.2075463
#> 4 probs 4,4 - 4,6  0.15566583 0.2292554
#> 5 probs 4,8 - 4,4 -0.72669195 0.1338590
#> 6 probs 5,6 - 4,8  0.14263407 0.1321941
#> 7 probs 5,4 - 5,6  0.03864332 0.1757888
#> 8 probs 5,8 - 5,4 -0.03858654 0.1478195

by <- expand.grid(
    group = 3:5,
    cyl = c(4, 6, 8),
    stringsAsFactors = FALSE) |>
    # define labels
    transform(by = ifelse(
        group %in% 3:4,
        sprintf("3/4 Gears & %s Cylinders", cyl),
        sprintf("5 Gears & %s Cylinders", cyl)))

predictions(nom, by = by)
#>    type predicted  std.error                      by
#> 1 probs 0.4285648 0.06606515 3/4 Gears & 6 Cylinders
#> 2 probs 0.4092432 0.05797517 3/4 Gears & 4 Cylinders
#> 3 probs 0.4285364 0.04584405 3/4 Gears & 8 Cylinders
#> 4 probs 0.1428704 0.13213030   5 Gears & 6 Cylinders
#> 5 probs 0.1815137 0.11595035   5 Gears & 4 Cylinders
#> 6 probs 0.1429271 0.09168810   5 Gears & 8 Cylinders

predictions(nom, by = by, hypothesis = "pairwise")
#>     type                                              term     predicted
#> 1  probs 3/4 Gears & 4 Cylinders - 3/4 Gears & 6 Cylinders -1.932166e-02
#> 2  probs 3/4 Gears & 8 Cylinders - 3/4 Gears & 6 Cylinders -2.839251e-05
#> 3  probs   5 Gears & 6 Cylinders - 3/4 Gears & 6 Cylinders -2.856945e-01
#> 4  probs   5 Gears & 4 Cylinders - 3/4 Gears & 6 Cylinders -2.470511e-01
#> 5  probs   5 Gears & 8 Cylinders - 3/4 Gears & 6 Cylinders -2.856377e-01
#> 6  probs 3/4 Gears & 8 Cylinders - 3/4 Gears & 4 Cylinders  1.929327e-02
#> 7  probs   5 Gears & 6 Cylinders - 3/4 Gears & 4 Cylinders -2.663728e-01
#> 8  probs   5 Gears & 4 Cylinders - 3/4 Gears & 4 Cylinders -2.277295e-01
#> 9  probs   5 Gears & 8 Cylinders - 3/4 Gears & 4 Cylinders -2.663160e-01
#> 10 probs   5 Gears & 6 Cylinders - 3/4 Gears & 8 Cylinders -2.856661e-01
#> 11 probs   5 Gears & 4 Cylinders - 3/4 Gears & 8 Cylinders -2.470227e-01
#> 12 probs   5 Gears & 8 Cylinders - 3/4 Gears & 8 Cylinders -2.856093e-01
#> 13 probs     5 Gears & 4 Cylinders - 5 Gears & 6 Cylinders  3.864332e-02
#> 14 probs     5 Gears & 8 Cylinders - 5 Gears & 6 Cylinders  5.678502e-05

bwiernik commented 2 years ago

Ah, thanks, I see what the by column is for now!

One thing I notice is that if there are rows in the predictions that don't belong to any row of by, they are returned collapsed into a by = NA row. That should probably be filtered out.

library(nnet)
library(marginaleffects)
nom <- multinom(factor(gear) ~ mpg + factor(cyl), data = mtcars, trace = FALSE)

predictions(nom, by = c("group", "cyl"), hypothesis = "sequential")
#>    type      term   predicted std.error  statistic      p.value   conf.low
#> 1 probs 3,4 - 3,6 -0.19430915 0.1890545 -1.0277945 3.040465e-01 -0.5648491
#> 2 probs 3,8 - 3,4  0.76527848 0.1248376  6.1301914 8.777340e-10  0.5206013
#> 3 probs 4,6 - 3,8 -0.28557415 0.2075463 -1.3759540 1.688359e-01 -0.6923574
#> 4 probs 4,4 - 4,6  0.15566583 0.2292554  0.6790061 4.971340e-01 -0.2936666
#> 5 probs 4,8 - 4,4 -0.72669195 0.1338590 -5.4287848 5.673903e-08 -0.9890509
#> 6 probs 5,6 - 4,8  0.14263407 0.1321941  1.0789742 2.805992e-01 -0.1164617
#> 7 probs 5,4 - 5,6  0.03864332 0.1757888  0.2198282 8.260050e-01 -0.3058963
#> 8 probs 5,8 - 5,4 -0.03858654 0.1478195 -0.2610381 7.940631e-01 -0.3283075
#>    conf.high
#> 1  0.1762308
#> 2  1.0099557
#> 3  0.1212091
#> 4  0.6049982
#> 5 -0.4643330
#> 6  0.4017298
#> 7  0.3831830
#> 8  0.2511344

by <- expand.grid(
  group = 3:5,
  cyl = c(4, 6, 8),
  stringsAsFactors = FALSE) |>
  # define labels
  transform(by = ifelse(
    group %in% 3:4,
    sprintf("3/4 Gears & %s Cylinders", cyl),
    sprintf("5 Gears & %s Cylinders", cyl))) |> 
  subset(group %in% 3:4)

predictions(nom, by = by)
#>    type predicted  std.error statistic      p.value  conf.low conf.high
#> 1 probs 0.4285648 0.06606515  6.487003 8.756078e-11 0.2990795 0.5580501
#> 2 probs 0.4092432 0.05797517  7.058938 1.677800e-12 0.2956139 0.5228724
#> 3 probs 0.4285364 0.04584405  9.347700 8.957431e-21 0.3386837 0.5183891
#> 4 probs 0.1561788 0.06350874  2.459171 1.392583e-02 0.0317040 0.2806537
#>                        by
#> 1 3/4 Gears & 6 Cylinders
#> 2 3/4 Gears & 4 Cylinders
#> 3 3/4 Gears & 8 Cylinders
#> 4                    <NA>

predictions(nom, by = by, hypothesis = "pairwise")
#>    type                                              term     predicted
#> 1 probs 3/4 Gears & 6 Cylinders - 3/4 Gears & 4 Cylinders  1.932166e-02
#> 2 probs 3/4 Gears & 6 Cylinders - 3/4 Gears & 8 Cylinders  2.839251e-05
#> 3 probs                      3/4 Gears & 6 Cylinders - NA  2.723860e-01
#> 4 probs 3/4 Gears & 4 Cylinders - 3/4 Gears & 8 Cylinders -1.929327e-02
#> 5 probs                      3/4 Gears & 4 Cylinders - NA  2.530643e-01
#> 6 probs                      3/4 Gears & 8 Cylinders - NA  2.723576e-01
#>    std.error     statistic     p.value    conf.low conf.high
#> 1 0.08789439  0.2198281561 0.826004991 -0.15294817 0.1915915
#> 2 0.08041288  0.0003530841 0.999718280 -0.15757795 0.1576347
#> 3 0.11053165  2.4643255884 0.013727135  0.05574792 0.4890240
#> 4 0.07390977 -0.2610381284 0.794063104 -0.16415376 0.1255672
#> 5 0.10961879  2.3085850516 0.020966618  0.03821542 0.4679132
#> 6 0.09906106  2.7493910457 0.005970611  0.07820148 0.4665137

^{Created on 2022-09-07 by the reprex package (v2.0.1)}

vincentarelbundock commented 2 years ago

Hmm, let me think about this. My gut reaction is that it's easy enough for the user to filter out, and that there may be some diagnostic value in leaving this in: in some cases, users may not realize that their by data frame is incomplete.

bwiernik commented 2 years ago

That's fair enough, just wanted to be sure it was an intentional choice

vincentarelbundock commented 2 years ago

I didn't say it was intentional ;)

bwiernik commented 2 years ago

One place where post-filtering becomes difficult is when hypothesis is given. Then na.omit() or friends won't work. Perhaps an argument to remove by = NA from the predictions before the hypothesis test?

vincentarelbundock commented 2 years ago

Opened a separate issue to think about this specific problem: https://github.com/vincentarelbundock/marginaleffects/issues/481

bwiernik commented 2 years ago

Hold on, by still isn't doing what I am looking for.

I want to sum across the levels of group, not average across them. Above, for each value of cyl, the predictions for group = 3 and group = 4 are being averaged. I want them to be summed.

So the expected results should have predicted values:

# A tibble: 3 × 2
  cyl   predicted
  <fct>     <dbl>
1 4         0.867
2 6         0.852
3 8         0.660

The context here is that I want predicted to mean prob(y = Agree OR y = StronglyAgree), so predicted = prob(y = Agree) + prob(y = StronglyAgree).

vincentarelbundock / marginaleffects

ordinal and multinomial models: allowing specifying/collapsing categories #445