vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com

avg_comparisons() behaviour for brms ordinal (cumulative(logit)) models when using type = "link" and type = "response" #1187

Closed mz555 closed 2 months ago

mz555 commented 2 months ago

Disclosure_data.csv

Hello. This is less a "bug" and more "I'm unsure what is being reported by the output".

I have the following model:

bANCOVA.suit2 <- brm(SuitableAfter ~ Crime * Disclosure + mo(SuitableB4), 
                     data = Disclosure_data, 
                     family = cumulative("logit"))

I then try to get pairwise contrasts for the 3 levels of Crime, split by Disclosure. But I want these to be averaged over the levels of the ordinal outcome (I would treat this as a continuous / latent scale; while I normally use probit for this, I think reporting ORs would be useful for my audience).

This code seems to do it, but I don't know what scale the output is on, as it seems to allow negative values (so log odds ratios?). This might be right, and what I need, but it is not on the OR scale.

# {marginaleffects} with type = "link"
avg.suit <- avg_comparisons(bANCOVA.suit2,
                            variables = list(Crime = "revpairwise"), 
                            by = "Disclosure", 
                            type = "link",
                            comparison = "ratio")
print(avg.suit, nrows = 100)

Output:

> print(avg.suit, nrows = 100)

  Term              Contrast Disclosure Estimate    2.5 % 97.5 %
 Crime mean(CSA) / mean(R)       Disc     0.6981  -1.5320   3.51
 Crime mean(CSA) / mean(R)       NoDisc  -0.0975 -14.1551  11.19
 Crime mean(CSA) / mean(IIC)     Disc     0.5256   0.0829   1.40
 Crime mean(CSA) / mean(IIC)     NoDisc  -0.1762 -11.9239  11.14
 Crime mean(R) / mean(IIC)       Disc     0.7286  -0.2401   1.57
 Crime mean(R) / mean(IIC)       NoDisc   0.8304  -4.3736   6.48

These are essentially the comparisons I think I need, but I would like them as odds ratios. Based on the negative values, these seem to be log ORs? And applying exp() to them doesn't seem to yield the same result as what emmeans gives (when I treat the covariate as continuous).

If I use type = "response", the output is reported separately for each threshold/level of the outcome.

# {marginaleffects} with type = "response"
avg.suit.resp <- avg_comparisons(bANCOVA.suit2,
                            variables = list(Crime = "revpairwise"), 
                            by = "Disclosure", 
                            type = "response",
                            comparison = "ratio")
print(avg.suit.resp, nrows = 100)

Output (reduced for space):

> print(avg.suit.resp, nrows = 100)

 Group  Term              Contrast Disclosure Estimate 2.5 % 97.5 %
     1 Crime mean(CSA) / mean(R)       Disc      1.344 0.490  3.635
     1 Crime mean(CSA) / mean(R)       NoDisc    2.074 0.915  4.649
     1 Crime mean(CSA) / mean(IIC)     Disc      2.064 0.766  5.487
     1 Crime mean(CSA) / mean(IIC)     NoDisc    2.238 1.075  4.625
     1 Crime mean(R) / mean(IIC)       Disc      1.545 0.591  3.962
     1 Crime mean(R) / mean(IIC)       NoDisc    1.076 0.477  2.442
     2 Crime mean(CSA) / mean(R)       Disc      1.195 0.627  2.314
     2 Crime mean(CSA) / mean(R)       NoDisc    1.160 0.893  1.787
...
     7 Crime mean(R) / mean(IIC)       Disc      0.639 0.242  1.721
     7 Crime mean(R) / mean(IIC)       NoDisc    0.912 0.330  2.428

In essence, I want these but averaged/collapsed over the thresholds/values of the outcome variable.

I tried to compare with emmeans, but that throws an error for some unknown reason if my model uses mo() as a covariate:

> emmeans(bANCOVA.suit2, specs = pairwise ~Crime | Disclosure,
+         type = "response")
Error: Invalid values in variable 'SuitableB4': '5.62204724409449', '5.62204724409449', '5.62204724409449', '5.62204724409449', '5.62204724409449', '5.62204724409449'
In addition: Warning messages:
1: brms' emmeans support for ordinal models is experimental and currently ignores the threshold parameters

Data is attached, as the issue might be specific to it (I tried a simpler toy dataset and emmeans worked with it, so that might be another matter). Crime is a factor (3 levels) and Disclosure is a factor (2 levels); both are between-subjects. SuitableB4 is ordinal (7-point Likert type), taken at the start of the experiment. SuitableAfter is ordinal (7-point Likert type), taken at the end of the experiment.

The issue might be a minor one, but it is beyond my level of understanding of cumulative models and link functions. I would be very grateful for any guidance on this.

For context, Solomon Kurz has a blog post that gets very close to what I need (https://solomonkurz.netlify.app/blog/2021-12-29-notes-on-the-bayesian-cumulative-probit/#add-a-predictor), where he plots the output of such (probit) models as averaged ratings per group (male vs female). But I would prefer a script that doesn't require per-model/design tweaking.

Thank you in advance for your help! Mircea

mz555 commented 2 months ago

Small update: the emmeans issue was due to me not declaring the ordinal variables as ordered factors (ordered = TRUE) when formatting the data before compiling the model (sketch of the fix below). Once you do this, emmeans computes the comparisons without issue. A question that came up while playing with the comparison types in emmeans: on the logit scale, does it make more sense to look at differences (x1 - x2) instead of ratios (x1 / x2)?
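
For reference, the data-prep fix looks roughly like this (a sketch of my own script; the 1-7 level coding is an assumption about the raw data):

# Declare the Likert items as ordered factors before fitting
Disclosure_data$SuitableB4    <- factor(Disclosure_data$SuitableB4,
                                        levels = 1:7, ordered = TRUE)
Disclosure_data$SuitableAfter <- factor(Disclosure_data$SuitableAfter,
                                        levels = 1:7, ordered = TRUE)

# Refit; emmeans() then computes the pairwise comparisons without error
bANCOVA.suit2 <- brm(SuitableAfter ~ Crime * Disclosure + mo(SuitableB4),
                     data = Disclosure_data,
                     family = cumulative("logit"))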

vincentarelbundock commented 2 months ago

What do you mean by this, exactly?

> In essence, I want these but averaged/collapsed over the thresholds/values of the outcome variable.

As you know, predicted probabilities in this model are outcome-level specific (and sum to 1), so ratios will be outcome-level specific too. Reporting these ratios separately is the correct and informative thing to do...

I guess I don't understand why you would want to average them, exactly what enters that average, and what quantity you expect to come out of the averaging. What is your specific estimand? Can you define it using standard math notation?

mz555 commented 2 months ago

Hi Vincent. Thanks for looking into this.

It might be that ratios do not make sense for ordinal models; I am not sure. I just expect to be able to say "on average, people in Condition 1 were more likely to perceive the stimuli as unpleasant (lower ratings) than in Condition 2, OR = XX". Typically, with probit models I can just take the difference (x1 - x2) between two conditions, as the scale is standardised (so it has an easy interpretation for psychologists).

My thinking comes from using the probit versions, where you can think of the outcome as a latent decision space. Thus, you can average over the thresholds to obtain the "average rating" (or, more specifically, the decision criterion) for that group/condition.

Kurz makes this point for probit models, saying "When you have an ordinal model, the mean of the criterion variable is the sum of the $p_k$ probabilities multiplied by the $k$-th values of the criterion." He expresses it as follows (see link in previous post):

$$ \mathbb{E}(\text{rating}) = \sum_{k=1}^K p_k \times k $$

This gives you the mean rating for the model (intercept only). With models that have a predictor, this extends to:

$$
\begin{align*}
p(\text{rating} = k \mid \{ \tau_k \}, \mu_i) & = \Phi(\tau_k - \mu_i) - \Phi(\tau_{k - 1} - \mu_i) \\
\mu_i & = \beta_0 + \sum_{l=1}^{L} \beta_l x_l \\
\beta_0 & = 0
\end{align*}
$$

The end result of such an approach is the equivalent of a standardised effect (a la Cohen's d) on the latent scale. In my view, if the researcher's intuition is that the ordinal scale is measuring a latent trait (e.g., attractiveness), then treating it as roughly continuous post-modelling makes sense and is easier to communicate. But I take your point that this would not be appropriate if you just have an ordered categorical outcome, as in a clinical context based on an assessment (e.g., depression: mild < strong < severe).
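
For concreteness, the kind of post-processing I have in mind looks roughly like the sketch below. This is only my rough attempt, so the column names (group, drawid, draw) and the assumption that the outcome levels are coded 1-7 may well be wrong:

# Average predicted probability of each outcome level (the group column),
# per Crime x Disclosure cell
library(dplyr)
p <- avg_predictions(bANCOVA.suit2,
                     by = c("Crime", "Disclosure"),
                     type = "response")

# Collapse to a latent-style "expected rating" per posterior draw:
# E(rating) = sum_k p_k * k, treating the 7 levels as equally spaced
expected <- posterior_draws(p) |>
  mutate(k = as.integer(as.character(group))) |>
  group_by(drawid, Crime, Disclosure) |>
  summarise(expected_rating = sum(draw * k), .groups = "drop")

# Posterior mean and 95% interval of the "average rating" per condition
expected |>
  group_by(Crime, Disclosure) |>
  summarise(estimate  = mean(expected_rating),
            conf.low  = quantile(expected_rating, 0.025),
            conf.high = quantile(expected_rating, 0.975),
            .groups = "drop")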

My question about avg_comparisons() is: what is being reported (and on what scale) when I use type = "link" versus type = "response"? And is it possible (without me directly wrangling the posteriors as above) to obtain something like an "average rating" per condition, and then either the difference between conditions or the ratio of those averages?

mz555 commented 2 months ago

Sorry. I clicked the wrong button and closed the issue.

vincentarelbundock commented 2 months ago

Ah ok. Thanks for clarifying. I think I get what you mean now.

On the question of scale, what you need to know is that when you use type = "link", marginaleffects internally calls rstantools::posterior_linpred():

https://www.rdocumentation.org/packages/rstantools/versions/2.4.0/topics/posterior_linpred

When you request type = "response", it calls this one instead:

https://mc-stan.org/rstantools/reference/posterior_epred.html

The scale these return will depend on your model. You can read the corresponding documentation to figure it out.
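
If you want to see the difference concretely on your own model, you can pull a few posterior draws for a single profile and inspect the scale yourself. This is just a rough sketch; the datagrid() defaults (means/modes of the other predictors) are only a convenient profile, not something you have to use:

# type = "link": one linear-predictor value per row, on the logit scale,
# so negative values are expected
p_link <- predictions(bANCOVA.suit2, newdata = datagrid(), type = "link")
head(posterior_draws(p_link))

# type = "response": one probability per row and outcome level (the Group
# column), summing to 1 across levels within a row
p_resp <- predictions(bANCOVA.suit2, newdata = datagrid(), type = "response")
head(posterior_draws(p_resp))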

With respect to your "average effect" query, you are fitting an ordered logit model, presumably because you don't think the outcome levels can be treated as equidistant. But then you want to create a one-number summary that will (implicitly) re-impose the equidistant assumption by giving equal weight to each level-specific estimate. This feels like you are just ruining all the good work you've done at the model fitting stage.

It is almost certainly possible to achieve this by using posterior_draws() to extract posterior draws and then processing them yourself. But I just think it's a bad idea, so I won't build a shortcut or helper function to make this easier.

If you really need a one-number summary of the effect, I'd fit a simpler model and report a simple marginal (average) risk difference or ratio.

If you think the outcome scale really is not equidistant, then I'd bite the bullet and interpret it "correctly" on the response scale, with level-specific estimates.

(Incidentally, I would also avoid ORs, which, I'd argue, nobody actually understands.)

mz555 commented 2 months ago

Thanks for clarifying which function is being called. I remember the huge threads on Twitter by Andrew Heiss and Isabella Ghement on how posterior_linpred() vs posterior_epred() handle non-linear transformations and such, so I think I understand now.

I do agree with your view that converting the model output to such a metric is unwise. One of the reasons I started using brms was to tackle the "use t-tests for ordinal data" dogma in my field. But it does still leave a question for me (and other psychologists) as to what to do with ordinal data that has many thresholds.

In psych, it is common (and recommended) to use 7-point, and even 11-point, scales (to avoid the error of interpolation). The distribution of responses on such scales is rarely "normal", but I don't know how to analyse these data optimally, as interpreting 6 or 10 thresholds is impossible.

Anyway, that isn't for this forum. I just wanted to provide some context to my reasoning. Thanks again for clarifying!

vincentarelbundock commented 2 months ago

Yeah, the tradeoffs are real. Communicating these results can be tough. Good luck!