Closed paul-buerkner closed 5 years ago
Great. This would address the issue I raised in this feature request, no?:
https://github.com/paul-buerkner/brms/issues/396
I guess the situation I'm interested in is what to do when one has multiple outcome variables that are all probability vectors.
Not exactly. The Dirichlet distribution is not a multivariate distribution of correlated beta variables (correlated in the sense of a correlation matrix). Rather, the Dirichlet distribution is to the beta what the categorical is to the Bernoulli, and neither of them is multivariate in the sense of #396.
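To make the distinction concrete, here is a small base-R sketch (not brms code; the helper `rdirichlet_sketch` is a hypothetical name introduced here) that draws from a Dirichlet by normalizing independent gamma variates. It illustrates that the components of each draw are tied together by the sum-to-one constraint rather than by a freely specified correlation matrix, and that with two categories the first component is simply a beta variable.

```r
# Hypothetical helper: draw n vectors from Dirichlet(alpha) by
# normalizing independent Gamma(alpha_k, 1) variates.
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # Column j of g holds n draws with shape alpha[j] (column-major fill).
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)
}

set.seed(1)
draws <- rdirichlet_sketch(5000, alpha = c(2, 3, 5))

# Every draw is a probability vector: rows sum to 1.
range(rowSums(draws))

# The constraint induces negative correlations between components;
# they cannot be set freely as in a multivariate model.
round(cor(draws), 2)

# With two categories, the first component behaves like Beta(2, 3),
# whose mean is 2 / (2 + 3) = 0.4.
two <- rdirichlet_sketch(5000, alpha = c(2, 3))
mean(two[, 1])
```

This is why several separately measured beta outcomes (the situation in #396) are a different modelling problem from a single outcome that is itself a probability vector.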
That's too bad. I was hopeful that it may have resolved the issue that comes up in data I come across pretty frequently (i.e., multiple beta variables)!
I think a reasonable way to specify the response of a Dirichlet model would be to use a matrix column in the data. For instance
A <- rbind(
c(0.2, 0.3, 0.5),
c(0.8, 0.1, 0.1)
)
df <- data.frame(x = rnorm(2))
df$A <- A
brm(A ~ x, data = df, family = dirichlet())
The same structure could be used to implement multinomial models. Updating the title accordingly.
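A quick way to check that a matrix column is a valid Dirichlet response, and to see what the analogous multinomial input might look like, is sketched below in base R. The helper name `is_simplex_matrix` and the `counts` and `size` columns are illustrative assumptions. The commented-out `brm()` calls are not run here, and the `trials()` syntax for the multinomial case in particular should be checked against the brms documentation.

```r
# Hypothetical helper: TRUE if every row is a probability vector
# with entries strictly between 0 and 1.
is_simplex_matrix <- function(m, tol = 1e-8) {
  is.matrix(m) && all(m > 0 & m < 1) && all(abs(rowSums(m) - 1) < tol)
}

A <- rbind(
  c(0.2, 0.3, 0.5),
  c(0.8, 0.1, 0.1)
)
df <- data.frame(x = c(-0.5, 0.5))
df$A <- A                      # matrix column: one response row per observation
is_simplex_matrix(df$A)

# Multinomial analogue: a matrix column of category counts plus row totals.
df$counts <- rbind(
  c(2L, 3L, 5L),
  c(8L, 1L, 1L)
)
df$size <- rowSums(df$counts)
str(df)

# Model calls (assumed, not run here; they need brms and a Stan toolchain):
# brm(A ~ x, data = df, family = dirichlet())
# brm(counts | trials(size) ~ x, data = df, family = multinomial())
```

Assigning the matrix with `df$A <- A` after creating the data frame keeps it as a single matrix column rather than splitting it into three separate columns.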
I think this may not be the right place to ask, since this issue is about multinomial responses. I would suggest you first look at the documentation of ?set_prior. If you still have questions, ask at https://discourse.mc-stan.org/
Multinomial and Dirichlet models can now be estimated in the GitHub version of brms.
Thank you very much for the update, Paul.
Best wishes to you in the new year.
Sincerely, Steve
Just a follow-up on this, I have an application for Dirichlet regression: estimating the % of trips in cities, zones and desire lines made by different modes of transport such as car, bike, walk.
Just discovered that you can do it in a Bayesian framework with brms, great work @paul-buerkner! FYI the minimal example I put together based on the code above was as follows:
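# install.packages("brms")
library(tidyverse)
library(brms)

# Response: one probability vector (row) per observation
A = rbind(
  c(0.2, 0.3, 0.5),
  c(0.8, 0.1, 0.1)
)
df = data.frame(x = rnorm(2))
df$A = A  # store the response as a matrix column

m = brm(A ~ x, data = df, family = dirichlet())

# Predict for new x values and keep the first summary column per category
res = predict(m, data.frame(x = 0:3))
res_matrix = res[, 1, ]
rowSums(res_matrix)  # predicted shares in each row should sum to about 1

# Reshape to long format and plot the predicted shares
res_df = as.data.frame(res_matrix) %>%
  mutate(x = 0:3) %>%
  pivot_longer(matches("[0-3]"))
ggplot(res_df) +
  geom_area(aes(x, value, fill = name))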
Which generated this:
Out of interest, would you say that this is a good approach to tackle the issue of predicting mode splits with confidence intervals? See here for some city data I've been looking at. It would be great to get a solid predictive model of cycling, so that you could predict the % increase with changes to multiple parts of transport systems:
https://github.com/ATFutures/who3/tree/master/scenarios
Thank you for this. Very much appreciated.
Best, Steve
Please ask brms related questions on https://discourse.mc-stan.org/ using the Interfaces - brms tag.
The Dirichlet distribution is a multivariate generalization of the beta distribution and may not only be used as prior for some parameters but also as response distribution if the response is a probability vector of more than two categories. I am not yet sure about an appropriate interface for this model, because it requires multiple columns as response input, but maybe I get an idea at some point. The implementation can otherwise closely follow that of the categorical distribution.
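As a rough illustration of what following the categorical distribution could mean, the sketch below maps one linear predictor per non-reference category to a mean probability vector through a softmax, then scales it by a precision parameter to obtain Dirichlet concentration parameters. This is a base-R sketch under stated assumptions, not the brms implementation: the function names, the coefficient values, the choice of the first category as reference, and the `phi` precision parameterization are all introduced here for illustration.

```r
# Softmax with the first category as reference (its linear predictor fixed at 0).
softmax_ref <- function(eta) {
  z <- c(0, eta)
  z <- z - max(z)            # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

# Illustrative coefficients for categories 2 and 3: intercept and slope on x.
beta <- rbind(
  c(0.5, -1.0),
  c(1.0,  0.5)
)

# Assumed parameterization: concentration = mean vector * precision.
dirichlet_alpha <- function(x, beta, phi) {
  eta <- beta[, 1] + beta[, 2] * x   # one linear predictor per non-reference category
  mu <- softmax_ref(eta)             # mean vector, sums to 1
  mu * phi                           # concentration parameters
}

alpha <- dirichlet_alpha(x = 0.3, beta = beta, phi = 20)
alpha
sum(alpha)          # equals phi, up to floating-point error
alpha / sum(alpha)  # recovers the mean vector
```

Fixing one category's predictor at zero is the same identification device used in categorical (multinomial logit) regression, which is why the two models can share most of their structure.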