Closed inti closed 7 years ago
"It does not do anything" is not very specific. What does brms complain about?
sorry, I did not mean to be unhelpful. it literally does not complain. this a full example I have brms Version: 1.1.0 Date: 2016-10-11
> library(RCurl)
> library(bitops)
> url <- 'https://gist.githubusercontent.com/inti/02eaf3916908b3307e369b7fe5684ad3/raw/f183c49ca5ecd362ab49597d4b32c3859ae74048/counts.txt'
> df <- getURL(url)
> counts = read.table(text = df,header = T)
> library(brms)
> counts$n = counts$alleleOne + counts$alleleTwo
> mod1 = brm(alleleOne/n ~ 1 + (1|featureName),data=counts,family = Beta())
>
I was using RStudio, if I do it on the shell it does complain Error: The beta distribution requires responses between 0 and 1.
should the total sample size be provided as an offset? or is there another way to specify it?
If there are outcomes that are exactly zero or one, brms has to throw this error as zeros and ones are not working with the beta distribution in Stan (only everything in between works). I wonder however, if a simple binomial model might be appropriate for your data. You could try
mod1 = brm(alleleOne | trials(n) ~ 1 + (1|featureName),
data=counts, family = binomial())
ok. Actually the full hierarchical model is: observations A/(A+B) are binomial and within each featureName they are modeled using a Beta, so it is beta binomial distribution. Makes sense? I have coded it before in Stan and I am trying to build it with brms.
Each featureName_{j}
has its own Beta(alpha_{j},beta_{j})
, p_{ij} ~Beta(alpha_{j},beta_{j}
and #alleleOne ~Binomial(n_i,p_{ij})
Is there a way to hack this beta-binomial in brms?
thanks again!
Unfortunately not. This model does not fit nicely in the usual regression framework, where the mean (p
for binomial models) is modeled via inv_link(linear_predictor)
. One could think of adding a family beta_binomial
that allows to estimate alpha
and beta
, but I am not sure how much sense that makes.
After furthing thinking about this, I am convincent that the beta-binomial family can be reasonably implemented in brms. However, I wouldn't use the shape1
, shape2
parameterization of the beta distribution but the mean
, precision
parameterization, instead. This fits much more nicely into the existing framework. I put the beta-binomial family on the list in #110 so that I don't forget about it.
I will close this issue here to keep the issue tracker clear. If you feel that there is more to discuss regarding the issue feel free to reopen it.
The beta-binomial is often chosen when overdispersion is an issue. Formally, overdispersion can be thought of as unexplained variance, and is therefore the rule rather than exception. Thinking about it that way, there is a rather straight-forward solution: adding an observation-level random effect. This "pulls" the unexplained variance into the linear predictor of the binomial (=logistic) model, where it is assumed to be normally distributed. In practice, the beta-binomial and the logit-normal distributions are very similar. It also allows to compare degree of variations, as they are all on the same scale.
You do it by adding an identifier for observations to the data set, say ObsID and then specify the model as:
mod1 = brm(alleleOne | trials(n) ~ 1 + (1|featureName) + (1|ObsID), data=counts, family = binomial())
Dunno whether this is valid in the context of genetics, though ...
Thanks for the comment, I guess it would need to be mod1 = brm(alleleOne | trials(n) ~ 1 + (1|featureName/ObsID), data=counts, family = binomial())
since ObsID are nested within featureName.
I think you are right in terms of how to handle the extra variation. The models are not fully equivalent but it might do.
Cheers, Inti
On Nov 22, 2016, at 08:34, schmettow notifications@github.com wrote:
The beta-binomial is often chosen when overdispersion is an issue. Formally, overdispersion can be thought of as unexplained variance, and is therefore the rule rather than exception. Thinking about it that way, there is a rather straight-forward solution: adding an observation-level random effect. This "pulls" the unexplained variance into the linear predictor of the binomial (=logistic) model, where it is assumed to be normally distributed. In practice, the beta-binomial and the logit-normal distributions are very similar. It also allows to compare degree of variations, as they are all on the same scale.
You do it by adding an identifier for observations to the data set, say ObsID and then specify the model as:
mod1 = brm(alleleOne | trials(n) ~ 1 + (1|featureName) + (1|ObsID), data=counts, family = binomial())
Dunno whether this is valid in the context of genetics, though ...
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/144#issuecomment-262218495, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXApXSFFojun3Zmni_XKtTwclLzYtqOks5rAtNigaJpZM4KxyRf.
In your case, (1|featureName/ObsID)
and (1|featureName) + (1|ObsID)
will be equivalent since ObsID
is already unique across all feature names.
Cheers, Inti
On Nov 22, 2016, at 08:47, Paul-Christian Bürkner notifications@github.com wrote:
In your case, (1|featureName/ObsID) and (1|featureName) + (1|ObsID) will be equivalent since ObsID is already unique across all feature names.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/paul-buerkner/brms/issues/144#issuecomment-262220785, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXApczBnuH40Gkm3hQ0zR_MSPlHZyyEks5rAtY2gaJpZM4KxyRf.
Hi, I have some count data where I observe counts: for each level of groupID I obtain multiple pairs of observations #A and #B nested within groupID. I want to model average prob of observing A #A/(#A+#B) as a beta distribution and model the groupID as a random effect. I see there is a Beta() family on
brms
. I am writing my model asbut it does not do anything :(
mod1 = brm(alleleOne/n ~ 1 + (1|featureName),data=counts)
runs but it is quite what I want since it does not model the observation sample size (n=A+B)the data looks like this https://gist.github.com/inti/02eaf3916908b3307e369b7fe5684ad3
how do I give it the total number of observations for each pair A and B? why does it not run when I set the
beta()
familymany thanks in advance