diaorch opened this issue 9 months ago
I expect that you'd want this at initialization, but I don't have the background to know when else a different parametrization would be useful. Can you help me understand such a scenario?
Thank you for getting back to me!
Here's a quick summary of the parameterizations we are discussing: the current implementation of the negative binomial distribution uses `r` and `p`, which have specific meanings in terms of Bernoulli trials. The alternative parameterization I am interested in uses the mean and dispersion as parameters, which is described in more detail in the last two paragraphs of the "Alternative formulations" section of the negative binomial distribution's Wikipedia page. Also, I appreciate the warning in the current documentation to note carefully the meaning of the parameters.
The mean-and-dispersion parameterization is more widely used in regression analysis, because the explanatory variables can be linked to the mean, similar to linear regression. This is also, in my opinion, more intuitive when it comes to interpreting the effects of explanatory variables on the negative binomial counts. (Personally, I don't think I have seen negative binomial regression done with the `r`-and-`p` parameterization.) In my work, having the mean as a parameter also helps me compare a negative binomial distribution with a Poisson distribution of the same mean, since I can more quickly evaluate the dispersion level in the negative binomial distribution. I would also imagine that the mean-and-dispersion parameterization is more useful outside of a frequentist statistics context.
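To make that comparison concrete, here is a minimal sketch (the function name is illustrative, not statrs API) of how the dispersion `r` separates a negative binomial from a Poisson of the same mean:

```rust
/// Variance of a negative binomial with mean `mu` and dispersion `r`,
/// in the mean-and-dispersion parameterization: var = mu + mu^2 / r.
/// A Poisson with the same mean has variance `mu`, so the extra term
/// mu^2 / r (vanishing as r -> infinity) quantifies the overdispersion.
fn nb_variance(mu: f64, r: f64) -> f64 {
    mu + mu * mu / r
}
```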
The parameters, or statistics like the mean and variance, can be derived from the current parameterization. In my own code, I wrote a separate struct that wraps the calls to `statrs::distribution::NegativeBinomial` but adds the calculations to convert between the `r`-and-`p` and mean-and-dispersion parameterizations. I imagine something similar could be done natively within statrs, which would make the use of the negative binomial distribution more consistent, and it shouldn't break anything. I also understand that it may not be a priority, since conversion on the user end, such as the one I have done, is reasonably doable.
This "alternative parameterization" is usually called the "method of moments estimator". This parameterization exists for many distributions, so in my opinion it is better modeled as a trait.
Hmm, were it a trait, how would you define that, i.e. what methods and generics would go with it?
I don't really think such a feature is one of the highest priorities, in any case I think the implementation should look like the following:
```rust
trait StandardizedMoments {
    fn mean(&self) -> Option<f64>;
    fn variance(&self) -> Option<f64>;
    fn skewness(&self) -> Option<f64>;
    fn kurtosis(&self) -> Option<f64>;
}

// `Sized` bound needed because `from_moments` returns `Self` by value
trait MethodOfMomentsEstimator: Sized {
    fn from_moments<M: StandardizedMoments>(moments: M) -> Result<Self>;
}
```
Skewness and kurtosis are included here because the negative binomial distribution (and not only this distribution) can be fitted not just to the first 2 moments, but to the first 4. There are multiple ways to fit data to distributions.
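For the two-moment case, the negative binomial fit is closed-form. A minimal sketch, assuming the failures-before-`r`-successes convention (mean = r(1-p)/p, variance = r(1-p)/p²); the function name is illustrative:

```rust
/// Method-of-moments fit for a negative binomial: returns (r, p), or
/// `None` when the moments are infeasible (requires variance > mean > 0,
/// i.e. overdispersed count data).
fn nb_from_moments(mean: f64, variance: f64) -> Option<(f64, f64)> {
    if !(variance > mean && mean > 0.0) {
        return None;
    }
    // p = mean / variance, and r = mean^2 / (variance - mean)
    let p = mean / variance;
    let r = mean * mean / (variance - mean);
    Some((r, p))
}
```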
I see this as a way to refer to distributions or estimators with constraints by their moments instead of other parameters. I was expecting something narrower: something that would specify a distribution given its family and the number of moments that fully determine its parametrization.
I see usefulness in thinking in terms of moments by specifying mean [and variance [and skewness...]] (notation is bash-style optionals), but I think these are a little far off from where the crate is now.
I'm not opposed to them, but I think they would take a significant amount of work.
@diaorch how close is this to what you were wanting?
The discussion brings up several good points. Firstly, I agree that it's likely not of the highest priority for the crate right now.
Secondly, I wasn't considering generalizing to the method of moments estimator, but I agree that if we are going with the method of moments estimator in general, a trait would be a better choice than, say, a method specific to a distribution struct. As for how exactly the trait should work, I have to admit it's a bit beyond my ability to confidently conclude the discussion.
Hope this is still helpful.
@YeungOnion I don't have a ready-to-implement API design. Maybe instead of a trait it could be a builder pattern; that way the set of parameters can be effectively constrained if, for instance, setting the variance returns an "extended" builder.
As an additional example, the Gamma distribution can be parametrized by:
- Mean only, which makes it identical to an exponential distribution.
- Mean and variance, using the method of moments.
- Mean and variance with skewness and/or kurtosis, using a numerical optimizer.
@diaorch that's alright by me, but I think as a user you have some great input for how you'd like it to work as a library, regardless of implementation. Could you share the struct you wrapped the negative binomial distribution in?
@tessob yeah, I think it will take some digging, perhaps as its own feature request. I also think a builder pattern would work well, since it seems like there are a few things that can fall out depending on what info is specified, assuming the family of distribution has been specified. Thoughts?
@YeungOnion hey, sorry for the very late reply, and thank you for the appreciation for input from a user like me. Here is the code I used as a wrapper to use the alternative parameterization for its probability mass function. Some of the ideas are taken from the R source code. Hope it is useful for showing what I was describing as the alternative parameterization and how I was using it.
```rust
use statrs::distribution::{Discrete, NegativeBinomial};
use statrs::function::gamma;
use statrs::StatsError;

/// Negative binomial in the mean (`mu`) and dispersion (`r`) parameterization.
pub struct NegativeBinomialAlt {
    mu: f64,
    r: f64,
}

impl NegativeBinomialAlt {
    pub fn new(mu: f64, r: f64) -> Result<NegativeBinomialAlt, StatsError> {
        if mu.is_nan() || r.is_nan() || mu < 0. || r <= 0. {
            Err(StatsError::BadParams)
        } else {
            Ok(NegativeBinomialAlt { mu, r })
        }
    }

    // probability mass function; returns ln(pmf) when `ln_trans` is true
    pub fn pmf(&self, x: usize, ln_trans: bool) -> f64 {
        let ln_res: f64;
        if x == 0 {
            // handling zero counts, to stay accurate for r << mu and r >> mu,
            // based on the R dnbinom() source
            if self.r < self.mu {
                ln_res = (self.r / (self.r + self.mu)).ln() * self.r;
            } else {
                ln_res = (-self.mu / (self.r + self.mu)).ln_1p() * self.r;
            }
        } else {
            // parameterization: prob. p = r / (r + mu)
            let p = self.r / (self.r + self.mu);
            // When p is 1, the distribution is a point mass at zero.
            // Comparing floats with == is normally not trustworthy, but here
            // the goal is precisely to catch the case where p as an f64 is
            // exactly 1 and would otherwise produce NaN.
            if p == 1. {
                // x > 0 in this branch, so the mass is 0 (ln: -inf)
                ln_res = f64::NEG_INFINITY;
            } else if (x as f64) < 1e-10 * self.r {
                // if x << r, use the Martin Maechler (June 2008) formula
                // cited in the R dnbinom() source
                let q = if self.r < self.mu {
                    (self.r / (1. + self.r / self.mu)).ln()
                } else {
                    (self.mu / (1. + self.mu / self.r)).ln()
                };
                let x_f = x as f64;
                ln_res = x_f * q - self.mu - gamma::ln_gamma(x_f + 1.)
                    + (x_f * (x_f + 1.) / (2. * self.r)).ln_1p();
            } else {
                let nb = NegativeBinomial::new(self.r, p).unwrap();
                ln_res = nb.ln_pmf(x as u64);
            }
        }
        if ln_trans {
            ln_res
        } else {
            ln_res.exp()
        }
    }
}
```
Let me know if there is anything else I can do to help and I will do my best to be less than months late this time. :)
One idea on how to implement this pattern would be with generics. So e.g. `NegativeBinomial` currently looks like this:

```rust
struct NegativeBinomial {
    r: f64,
    p: f64,
}
```
We could move the parametrization out into a wrapper struct:
```rust
struct NegativeBinomial<P> {
    param: P,
}

// naming TBD
struct SuccessProbability {
    r: f64,
    p: f64,
}

struct MeanVariance {
    m: f64, // μ
    v: f64, // σ²
}

impl NegativeBinomial<SuccessProbability> {
    pub fn new_success_probability(r: f64, p: f64) -> Result<Self, Error>;
    pub const fn p(&self) -> f64 { self.param.p }
    pub const fn r(&self) -> f64 { self.param.r }
    // maybe return Result, if conversion can fail
    pub fn into_mean_variance(self) -> NegativeBinomial<MeanVariance>;
}

impl NegativeBinomial<MeanVariance> {
    pub fn new_mean_variance(m: f64, v: f64) -> Result<Self, Error>;
    pub const fn m(&self) -> f64 { self.param.m }
    pub const fn v(&self) -> f64 { self.param.v }
    pub fn into_success_probability(self) -> NegativeBinomial<SuccessProbability>;
}

impl DiscreteCDF<u64, f64> for NegativeBinomial<SuccessProbability> {
    // implement functions for r, p
}

impl DiscreteCDF<u64, f64> for NegativeBinomial<MeanVariance> {
    // implement functions for m, v
}

// etc.
```
This would mean that we need to implement traits for both `NegativeBinomial<SP>` and `NegativeBinomial<MV>` (and other potential parametrizations) separately. But I think it's fair to assume that different parametrizations will usually use different formulas[^1]? Not 100% sure, but that's easily solved by a shared function which is called by both implementations.
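As a self-contained toy version of that idea (independent of statrs; all names are illustrative), the type parameter selects which accessors are available, and conversion just moves between marker types:

```rust
struct SuccessProbability { r: f64, p: f64 }
struct MeanVariance { m: f64, v: f64 }

// the parametrization lives in the type parameter
struct NegBin<P> { param: P }

impl NegBin<SuccessProbability> {
    // failures-before-r-successes convention: mean = r(1-p)/p
    fn mean(&self) -> f64 {
        self.param.r * (1.0 - self.param.p) / self.param.p
    }
    fn into_mean_variance(self) -> NegBin<MeanVariance> {
        let m = self.mean();
        // variance = r(1-p)/p^2 = mean / p
        let v = m / self.param.p;
        NegBin { param: MeanVariance { m, v } }
    }
}

impl NegBin<MeanVariance> {
    fn mean(&self) -> f64 { self.param.m }
    fn variance(&self) -> f64 { self.param.v }
}
```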
Anyways, from a user's point of view this should mean that when you type `NegativeBinomial::new` into your IDE, it should recommend the `new` function for both parametrizations. Most of the API would stay the same because of the implemented traits (min, max, mean, variance, mode, cdf, sf, etc.), with some obvious differences (I can't call `nb.p()` on a `NegativeBinomial<MeanVariance>`).
The documentation would need to explain this well. And it would add a generic parameter to the type, which makes things more complicated (often you don't need to explicitly type your variables, this is Rust after all, but for beginners it can get confusing).
As an additional example, the Gamma distribution can be parametrized by:

- Mean only, which makes it identical to an exponential distribution.
- Mean and variance, using the method of moments from Wikipedia.
- Mean and variance with skewness and/or kurtosis, using a numerical optimizer.
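The first two bullets are closed-form. A minimal sketch of the underlying moment matching (names are illustrative; shape/rate convention, where mean = k/λ and variance = k/λ²):

```rust
/// Method-of-moments fit for Gamma(shape, rate): shape = m^2/v, rate = m/v.
/// The mean-only/exponential case is the special case shape == 1
/// (i.e. v == m^2), which gives rate = 1/mean.
fn gamma_from_moments(mean: f64, variance: f64) -> (f64, f64) {
    (mean * mean / variance, mean / variance)
}
```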
I think a builder would be great for this. You should even be able to handle different types.
```rust
pub struct GammaBuilder<P> {
    p: P,
}

struct ExpBuilderParams {
    mean: f64,
}

struct GammaBuilderParams {
    mean: f64,
    variance: f64,
}

impl GammaBuilder<ExpBuilderParams> {
    // constructing from the mean alone starts in the "exponential" state;
    // placing `new` on the concrete impl avoids an ambiguous generic constructor
    pub fn new(mean: f64) -> GammaBuilder<ExpBuilderParams> {
        GammaBuilder {
            p: ExpBuilderParams { mean },
        }
    }

    // adding the variance transitions to the full Gamma state
    pub fn variance(self, variance: f64) -> GammaBuilder<GammaBuilderParams> {
        GammaBuilder {
            p: GammaBuilderParams {
                mean: self.p.mean,
                variance,
            },
        }
    }

    pub fn build(self) -> Exp {
        // ...
    }
}

impl GammaBuilder<GammaBuilderParams> {
    pub fn build(self) -> Gamma {
        // ...
    }
}
```
Needs a bit of polish, and I can't sketch an API for the numerical optimizer part since I don't know too much about that, but it should generally work, I think.

[^1]: i.e. `f(m, v)` is not just "convert to `r`, `p`, then call `g(r, p)`" but a different formula altogether.
I think supporting construction by specifying moments would be useful, and since general fitting/optimizing of distribution functions should be a feature of statrs, we should aim for a helpful, unsurprising API that works with the popular optimizer crates.
I also think the builder is a good choice. It even extends well for fitting to data.
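For the data-fitting case, the entry point could be the sample moments feeding the same builder; a minimal sketch (illustrative, not an existing statrs API):

```rust
/// Sample mean and (biased, 1/n) sample variance: the inputs a
/// moment-based builder would consume when fitting to data.
fn sample_moments(data: &[f64]) -> (f64, f64) {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let variance = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, variance)
}
```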
I've got other thoughts, but while writing it I realized we haven't defined if this project aims to be more for those writing single analyses or more as a library for applications / other library-like crates.
The notion of a Gamma distribution shouldn't rely on its parametrization, but numerically speaking a caller could opt into tradeoffs by choosing a parametrization that's closer to their use case. Cf. when running your own potentially exploratory analysis: you might just expect numerical stability across parameters, so we would select at runtime from a suite of methods to account for broad parameter ranges.
Currently, the `NegativeBinomial` struct uses the parameters `r` and `p`, as in the interpretation of the negative binomial distribution as the distribution of the number of failures in a sequence of Bernoulli trials, with probability of success `p`, that continue until `r` successes occur.

Another way to describe a negative binomial distribution is through the mean and the size/dispersion parameters. This is worth implementing, especially given that many statistical questions center around the mean of the distribution. `r` and `p` can be calculated from the mean and the dispersion parameter.

Perhaps the alternative parameterization of negative binomial distributions would be a useful addition to the crate's functionality?
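For reference, that conversion is closed-form; a minimal sketch using the same p = r/(r + μ) convention as R's `dnbinom(x, size, mu)` (the function name is illustrative):

```rust
/// Convert mean `mu` and dispersion `r` to the (r, p) parameters,
/// where p = r / (r + mu); the dispersion carries over unchanged.
fn to_r_p(mu: f64, r: f64) -> (f64, f64) {
    (r, r / (r + mu))
}
```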