paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Dirichlet and Multinomial Models #463

Closed paul-buerkner closed 5 years ago

paul-buerkner commented 6 years ago

The Dirichlet distribution is a multivariate generalization of the beta distribution. It can be used not only as a prior for some parameters but also as a response distribution when the response is a probability vector over more than two categories. I am not yet sure about an appropriate interface for this model, because it requires multiple columns as response input, but maybe I will get an idea at some point. Otherwise, the implementation can closely follow that of the categorical distribution.
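To make the link to the beta distribution concrete, here is a minimal base-R sketch. The helper `rdirichlet_sketch()` is a hypothetical name and not part of brms; it relies on the standard construction of Dirichlet draws as normalized independent gamma variates.

```r
# Hypothetical helper (not a brms function): draw n Dirichlet vectors
# by normalizing independent Gamma(alpha_k, 1) variates.
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # Column j of g holds n draws from Gamma(alpha[j], 1).
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)  # divide every row by its own sum
}

set.seed(1)
draws3 <- rdirichlet_sketch(1000, alpha = c(2, 3, 5))
# Each row is a probability vector: positive entries that sum to 1.
range(rowSums(draws3))

# With two categories, the first component follows Beta(alpha1, alpha2),
# so its sample mean should be close to 2 / (2 + 3) = 0.4.
draws2 <- rdirichlet_sketch(1000, alpha = c(2, 3))
mean(draws2[, 1])
```

A response for a Dirichlet model has the same shape as `draws3`: one row per observation and one column per category.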

jamesrrae commented 6 years ago

Great. This would address the issue I raised in this feature request, no?:

https://github.com/paul-buerkner/brms/issues/396

I guess the situation I'm interested in is what to do when one has multiple outcome variables that are all probability vectors.

paul-buerkner commented 6 years ago

Not exactly. The Dirichlet distribution is not a multivariate distribution of correlated (in the sense of a correlation matrix) beta variables. Rather, the Dirichlet distribution is to the beta what the categorical is to the Bernoulli, and neither of them is multivariate in the sense of #396.
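One way to see why there is no free correlation matrix: the covariance of a Dirichlet vector is fully determined by its concentration parameters. The sketch below computes it analytically in base R; `dirichlet_cov()` is a hypothetical helper name, not a brms function.

```r
# Hypothetical helper: analytic covariance matrix of a Dirichlet(alpha) vector.
dirichlet_cov <- function(alpha) {
  a0 <- sum(alpha)
  m <- alpha / a0                      # component means
  (diag(m) - outer(m, m)) / (a0 + 1)   # Cov = (diag(m) - m m') / (a0 + 1)
}

V <- dirichlet_cov(c(2, 3, 5))
V[upper.tri(V)]  # every off-diagonal covariance is negative
rowSums(V)       # zero, because the components are constrained to sum to 1
```

Because the components must sum to one, they can only be negatively correlated, and the strength of that correlation cannot be modeled separately from the means.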

jamesrrae commented 6 years ago

That's too bad. I was hopeful that it may have resolved the issue that comes up in data I come across pretty frequently (i.e., multiple beta variables)!

paul-buerkner commented 6 years ago

I think a reasonable way to specify the response of a Dirichlet model would be to use a matrix column in the data. For instance

A <- rbind(
  c(0.2, 0.3, 0.5),
  c(0.8, 0.1, 0.1)
)
df <- data.frame(x = rnorm(2))
df$A <- A
brm(A ~ x, data = df, family = dirichlet())

paul-buerkner commented 6 years ago

The same structure could be used to implement multinomial models. Updating the title accordingly.
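As a rough illustration of that shared structure, the base-R sketch below stores a matrix of counts as a single data frame column. The data are made up, and the commented-out model call is an assumption about the eventual brms syntax (a `trials()` term with `family = multinomial()`), not something stated in this thread.

```r
# Hypothetical counts of three categories for two observations.
y <- rbind(
  c(2, 3, 5),
  c(8, 1, 1)
)
df <- data.frame(x = c(-0.5, 0.5))
df$y <- y                # assigning with $ stores the matrix as ONE column
df$size <- rowSums(y)    # total number of trials per row

is.matrix(df$y)          # TRUE
ncol(df)                 # 3 columns: x, y, size

# Passing the matrix to data.frame() directly would instead split it
# into separate columns:
ncol(data.frame(x = df$x, y = y))  # 4

# Assumed model call (not run here; requires brms and a Stan toolchain):
# fit <- brm(y | trials(size) ~ x, data = df, family = multinomial())
```

The `df$y <- y` assignment is the important step: building the data frame in one `data.frame()` call would lose the single matrix column.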

paul-buerkner commented 6 years ago

I think this may not be the right place to ask, since this issue is about multinomial responses. I would suggest you first look at the documentation of ?set_prior. If you still have questions, ask at https://discourse.mc-stan.org/

paul-buerkner commented 5 years ago

Multinomial and Dirichlet models can now be estimated in the GitHub version of brms.

szuniga07 commented 5 years ago

Thank you very much for the update, Paul.

Best wishes to you in the new year.

Sincerely, Steve

Robinlovelace commented 4 years ago

Just a follow-up on this, I have an application for Dirichlet regression: estimating the % of trips in cities, zones and desire lines made by different modes of transport such as car, bike, walk.

Just discovered that you can do it in a Bayesian framework with brms, great work @paul-buerkner! FYI the minimal example I put together based on the code above was as follows:

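# install.packages("brms")
library(tidyverse)
library(brms)

# Response: each row is a probability vector over three categories
A = rbind(
  c(0.2, 0.3, 0.5),
  c(0.8, 0.1, 0.1)
)
df = data.frame(x = rnorm(2))
df$A = A  # store the matrix as a single column
m = brm(A ~ x, data = df, family = dirichlet())

# Predict for new x values; the result is a 3D array
res  = predict(m, data.frame(x = 0:3))
res_matrix = res[, 1, ]  # keep the first summary statistic for every category
rowSums(res_matrix)  # each row should sum to 1
res_df = as.data.frame(res_matrix) %>%
  mutate(x = 0:3) %>%
  pivot_longer(matches("[0-3]"))
ggplot(res_df) +
  geom_area(aes(x, value, fill = name))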

Which generated this:

[Image: stacked area chart of the predicted category shares against x]

Out of interest, would you say that this is a good approach to tackle the issue of predicting mode splits with confidence intervals? See here for some city data I've been looking at. It would be great to get a solid predictive model of cycling, so that you could predict the % increase resulting from changes to multiple parts of transport systems:

https://github.com/ATFutures/who3/tree/master/scenarios
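For mode-split data like this, the raw input is often a table of trip counts per mode, which has to be converted to shares before it can serve as a Dirichlet response. The sketch below shows one possible preparation step in base R. All zone names, counts, and the `cycleway_km` predictor are made up for illustration, and the commented-out model call simply mirrors the example above. Shares of exactly 0 or 1 fall outside the support of the Dirichlet density, so the sketch checks for them rather than deciding how to handle them.

```r
# Hypothetical trip counts per zone and mode (made-up numbers).
trips <- rbind(
  zone_a = c(car = 120, bike = 30, walk = 50),
  zone_b = c(car = 300, bike = 10, walk = 90),
  zone_c = c(car = 80,  bike = 40, walk = 80)
)
shares <- trips / rowSums(trips)   # row-wise proportions

# Checks a Dirichlet response should pass: rows sum to 1 and
# every share lies strictly between 0 and 1.
stopifnot(all(abs(rowSums(shares) - 1) < 1e-8))
stopifnot(all(shares > 0 & shares < 1))

zones <- data.frame(cycleway_km = c(5, 1, 12))  # hypothetical predictor
zones$shares <- shares                          # matrix column, as above

# Assumed model call (not run here; requires brms and a Stan toolchain):
# fit <- brm(shares ~ cycleway_km, data = zones, family = dirichlet())
```

When the underlying counts are available, a multinomial model on the counts may be an alternative worth comparing, since it also uses the number of trips behind each share.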

szuniga07 commented 4 years ago

Thank you for this. Very much appreciated.

Best, Steve

paul-buerkner commented 4 years ago

Please ask brms related questions on https://discourse.mc-stan.org/ using the Interfaces - brms tag.