[FR] Alternative parametrisations for numerical stability (e.g. for NegativeBinomial distribution)

tillahoffmann commented 3 years ago

Some of the distributions provide multiple different parametrisation options. For example, the MultivariateNormal distribution can be parametrised in terms of covariance, precision, or Cholesky decomposition of the covariance matrix. However, irrespective of the parametrisation used, the distribution is ultimately parametrised using the Cholesky decomposition.

This approach works well in principle but can suffer from numerical issues. For example, consider the NegativeBinomial distribution with concentration a and rate b. In the limit a / b ** 2 -> 0, the NegativeBinomial distribution should reduce to the Poisson distribution with rate a / b. Of course, this limit is problematic in practice because of the limits of floating point arithmetic. Stan has implemented multiple different parametrisations for the negative-binomial distribution (see here for details).

Is there interest in implementing different versions of distributions that could benefit from alternative parametrisations? If yes, should they reuse the interface already used by the MultivariateNormal distribution? Using a similar approach to Stan's neg_binom_2 is also an option but seems clunky.

Note: this may be a discussion for torch rather than pyro to get alignment on interface (if this is at all desirable).

fritzo commented 3 years ago

👍 Hi @tillahoffmann, I think it's a great idea for Pyro and NumPyro to offer multiple parameterizations of distributions for purposes of numerical stability and interpretability. As you note we already do this for e.g. Categorical (probs vs logits) and MultivariateNormal (covariance_matrix vs scale_tril vs precision_matrix). We also provide an an alternative for NegativeBinomial, but it is called GammaPoisson.

Here are some design opinions :smile:

We've found in NumPyro and Funsor that it is cleaner to offer multiple distribution classes, rather than to try to offer multiple alternative kwargs in a single distribution's .__init__() method. This is because distribution classes also provide metadata about their parameters, e.g. in the .arg_constraints class attribute, and because we sometimes want to pattern match on distributions to transform them by deconstructing and reconstructing them (e.g. in Pyro or in Funsor).
It would be great if, among a set of equivalent parameterizations, each class's docstring linked to each of the other parameterizations' class. E.g. we should cross-link NegativeBinomial and GammaPoisson's docstrings. We might even note special cases, e.g. Gamma(1, rate) is equivalent to Exponential(rate).

I've tagged this issue help wanted to encourage contributions to this issue, either feature requests or PRs.

fehiepsi commented 3 years ago

For context, @dirmeier implemented BetaProportion, NegativeBinomial2 in NumPyro as alternative parameterizations of Beta and NegativeBinomial.

tillahoffmann commented 3 years ago

Nice to see those distributions in numpyro already!

In terms of an interface for the end user, having a single entrypoint with different keyword arguments seems preferable over having to know the different distribution names (e.g. NegativeBinomial2 is a hard term to search for unless you know what you're looking for).

Could a possible approach be to have different classes for different distributions (as @fritzo suggested) but still offer a single interface to users, extending this idea?

def NegativeBinomial(*, total_count=None, probs=None, logits=None, conc=None, rate=None,
                     mean=None, variance=None, validate_args=None):
    """
    TODO: docstring explaining the different parametrisations.
    """
    # TODO: Validate that the arguments are not conflicting...
    # Pick the desired parametrisation
    if total_count is not None and probs is not None:
        return NegativeBinomialProbs(total_count, probs, validate_args=validate_args)
    elif total_count is not None logits is not None:
        return NegativeBinomialLogits(total_count, logits, validate_args=validate_args)
    elif conc is not None and rate is not None:
        return GammaPoisson(conc, rate, validate_args=validate_args)
    elif mean is not None and rate is not None:
        return NegativeBinomial2(mean, rate, validate_args=validate_args)
    else:
        raise ValueError("One of `probs` or `logits` must be specified.")

fehiepsi commented 3 years ago

I can see that the checks would be complicated. I would prefer using a single name for particular parameterization and cross-link to other parameterizations as @fritzo suggested. In NumPyro, we create a common interface just for backward compatibility.

NegativeBinomial2 is a hard term to search for unless you know what you're looking for

Agreed. I think we should add a line to NegativeBinomial: Also see ... for an alternative parameterization withmeanand ....

pyro-ppl / pyro

[FR] Alternative parametrisations for numerical stability (e.g. for NegativeBinomial distribution) #2915