ghost opened this issue 3 years ago
Thanks for the feedback. The last one will be added soon. For the others, would you be interested in doing a PR? Can you give more context on why these are particularly useful compared to what we have now? Perhaps some examples where they naturally crop up.
> Thanks for the feedback. The last one will be added soon. For the others, would you be interested in doing a PR?
Unfortunately I'm pretty busy, but they should both be very easy. The scaled inverse chi-squared can be implemented either as a scaling parameter times the reciprocal of a chi-squared-distributed variable, or as an inverse gamma with parameters ν/2 and ν·τ²/2 (where ν is the degrees of freedom).
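As a sanity check on the identity above, here is a small SciPy sketch (the helper name `scaled_inv_chi2` is just illustrative, not a proposed PyMC3 API) that builds Scale-inv-χ²(ν, τ²) as InverseGamma(ν/2, ν·τ²/2) and compares it against the one-over-chi-squared construction:

```python
import numpy as np
from scipy import stats

def scaled_inv_chi2(nu, tau2):
    """Scaled inverse chi-squared expressed as an inverse gamma.

    Scale-inv-chi2(nu, tau2) == InverseGamma(a=nu/2, scale=nu*tau2/2).
    (Helper name is illustrative only.)
    """
    return stats.invgamma(a=nu / 2, scale=nu * tau2 / 2)

nu, tau2 = 6.0, 2.0
dist = scaled_inv_chi2(nu, tau2)

# Analytic mean of Scale-inv-chi2 is nu*tau2/(nu - 2) for nu > 2,
# here 12/4 = 3.
print(dist.mean())  # 3.0

# Equivalent construction: nu*tau2 divided by a chi-squared draw.
rng = np.random.default_rng(0)
draws = nu * tau2 / stats.chi2(nu).rvs(size=200_000, random_state=rng)
print(draws.mean())  # close to 3.0
```

Both routes give the same distribution, so wrapping the existing inverse gamma is enough to implement it.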
Hey! I can try to send an update for adding a mean and sample size parameterization for the Beta distribution. I'll look at the scaled inverse chi-square after that.
For the scaled inverse chi-squared, should we create a new `continuous.ScaledInverseChiSquared`, or provide some kind of option in the current definition of `continuous.InverseGamma`?
Is the alternative beta parametrization described here common?
> Is the alternative beta parametrization described here common?

I find it to be useful in placing priors over parameters that are directly interpretable as probabilities, and am currently working on models that use it. That way, setting `n=10` and `mu=0.9` is really easy to interpret as a belief, with 10 pseudocounts' worth of strength, that the value is 0.9.
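For concreteness, the mean/sample-size parametrization maps back to the standard one as α = μ·n and β = (1−μ)·n, so α+β = n and α/(α+β) = μ. A minimal sketch (the helper name is hypothetical, not the PyMC3 signature):

```python
def beta_from_mean_size(mu, n):
    """Convert a (mean, pseudocount) pair to standard Beta(alpha, beta).

    alpha = mu * n and beta = (1 - mu) * n, so alpha + beta = n and
    alpha / (alpha + beta) = mu.  (Name is illustrative, not PyMC3 API.)
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if n <= 0.0:
        raise ValueError("n must be positive")
    return mu * n, (1.0 - mu) * n

alpha, beta = beta_from_mean_size(mu=0.9, n=10)
print(alpha, beta)  # alpha ~ 9, beta ~ 1 (up to floating point), mean 0.9
```

The example in the comment above then reads directly as Beta(9, 1): nine pseudo-successes and one pseudo-failure.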
Yep, I love this interpretation of it. Another PyMC3-related advantage is that this doesn't have the funky problem where, if you try to set a prior on `sd`, it's almost impossible, because `sd` is bounded in a way that depends on `mu`. Instead you can let `n` do the job of handling how dispersed you want your distribution to be and set a prior on that, while `mu` can be set using e.g. a logit regression.
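To see the coupling, note that with mean μ and pseudocount n = α + β, the Beta's variance is μ(1−μ)/(n+1), so the attainable `sd` is capped at √(μ(1−μ)), a ceiling that moves with μ. A small sketch (assuming the n = α + β convention from above):

```python
import math

def beta_sd(mu, n):
    """Standard deviation of a Beta with mean mu and pseudocount n = alpha + beta.

    Var = mu * (1 - mu) / (n + 1), so sd is capped at sqrt(mu * (1 - mu)),
    a ceiling that depends on mu -- the coupling described above.
    """
    return math.sqrt(mu * (1.0 - mu) / (n + 1.0))

for mu in (0.5, 0.9, 0.99):
    # attainable ceiling vs. the sd actually reached at n = 10
    print(mu, math.sqrt(mu * (1.0 - mu)), beta_sd(mu, n=10))
```

A prior on `n` sidesteps the mu-dependent bound entirely, since any n > 0 is valid for every μ.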
If this is extended to the Beta Binomial, it also helps with underdispersed data. If you reparametrize the Beta Binomial as a function of `n`, γ = (α+β)⁻¹ (one over the pseudocount, which we were calling `n` earlier), and p = mu/n = α/(α+β), you can model underdispersed data by dropping the constraint that γ must be positive. I would have found this useful today while I was trying to model an underdispersed count in my data.
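Assuming γ = 1/(α+β), the beta-binomial's intraclass correlation is ρ = γ/(1+γ) and its variance is n·p·(1−p)·(1+(n−1)ρ), which makes the over/underdispersion behaviour explicit. A sketch (function name illustrative, not an existing API):

```python
def beta_binomial_var(n, p, gamma):
    """Variance of a Beta-Binomial in the (n, p, gamma) parametrization.

    With gamma = 1/(alpha + beta), the intraclass correlation is
    rho = gamma / (1 + gamma) and Var = n*p*(1-p)*(1 + (n-1)*rho).
    gamma = 0 recovers the plain binomial, gamma > 0 gives overdispersion,
    and a mildly negative gamma gives underdispersion, as discussed above.
    """
    rho = gamma / (1.0 + gamma)
    return n * p * (1.0 - p) * (1.0 + (n - 1.0) * rho)

n, p = 20, 0.3
print(beta_binomial_var(n, p, gamma=0.0))   # equals the binomial variance
print(beta_binomial_var(n, p, gamma=0.5))   # larger: overdispersed
print(beta_binomial_var(n, p, gamma=-0.04)) # smaller: underdispersed
```

Negative γ is only admissible while 1 + (n−1)ρ stays positive, which is the practical reason for gating it behind a flag rather than allowing it unconditionally.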
Worth mentioning: if we want to add parametrizations using a dispersion parameter that can be set to a negative value to model underdispersion, an `allow_negative` flag that's `False` by default might be a good idea. The underdispersed beta-binomial loses the physical interpretation that it's a binomial whose success probability is drawn from a beta. Given that underdispersion is rare in most data sets and most people using a beta-binomial are looking to model overdispersion, the default should be to disallow negative values. (It just happens that in some cases you're adding anti-correlated trials, so it's not actually wrong.)
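Purely as an illustration of the proposed default (this is not PyMC3's actual API, and the class name is made up), such a flag might gate the constructor like this:

```python
class BetaBinomialGamma:
    """Illustrative sketch (not PyMC3's API) of an (n, p, gamma)
    beta-binomial wrapper that rejects negative dispersion by default."""

    def __init__(self, n, p, gamma, allow_negative=False):
        if gamma < 0 and not allow_negative:
            raise ValueError(
                "gamma < 0 models underdispersion and drops the "
                "beta-mixture interpretation; pass allow_negative=True "
                "to opt in."
            )
        self.n, self.p, self.gamma = n, p, gamma

BetaBinomialGamma(n=20, p=0.3, gamma=0.5)                         # fine
BetaBinomialGamma(n=20, p=0.3, gamma=-0.04, allow_negative=True)  # opt in
try:
    BetaBinomialGamma(n=20, p=0.3, gamma=-0.04)  # default: rejected
except ValueError as err:
    print(err)
```

Users modelling anti-correlated trials opt in explicitly, while everyone else keeps the safe beta-mixture semantics.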
> Is the alternative beta parametrization described here common?
I just spotted the reparametrization of the Beta distribution in terms of γ and p out in the wild on the Discourse, where it was used to improve inference dramatically (~300 divergences out of 1000 down to 0). I think it's a good idea to add it as another parametrization for the beta and beta-binomial (and allow negative values of γ for the beta-binomial by setting a flag).
> For the scaled inverse chi-squared, should we create a new `continuous.ScaledInverseChiSquared`, or provide some kind of option in the current definition of `continuous.InverseGamma`?
Sorry I missed this. I feel like ScaledInverseChiSquared is quite a mouthful/hard to type. Since InverseGamma is the more popular name, I would stick with that.
I frequently find myself missing one parametrization or another for a distribution. Examples of this include: