ENH: special.nbdtr: consider `n=0` to be in the domain

pbrod commented 1 year ago

Describe your issue.

The negative binomial distribution with parameters n and p fails to calculate the pmf or cdf for the corner case when n=0. The pmf is defined recursively as

$P(k+1) = \frac{(1-p) P(k) (n+k)}{k+1}$ where $P(0) = p^n$

Thus if n=0 the pmf should give $P(0)=1$ and $P(k)=0$ for $k>0$ whereas the scipy.stats implementation only returns NaNs.

Reproducing Code Example

import scipy.stats as ss
import numpy as np

ss.nbinom(n=0, p=1).pmf(np.arange(3))
ss.nbinom(n=0, p=0.5).pmf(np.arange(3))
ss.nbinom(n=0, p=0.1).cdf(np.arange(3))

Error message

array([nan, nan, nan])

SciPy/NumPy/Python version and system information

1.10.0 1.23.5 sys.version_info(major=3, minor=10, micro=9, releaselevel='final', serial=0)
lapack_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\include']
blas_mkl_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\include']
blas_opt_info:
    libraries = ['mkl_rt']
    library_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pab/AppData/Local/anaconda3_2023\\Library\\include']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL

mdhaber commented 1 year ago

I looked at a few references. Do you have one in mind?

If we think about the definition in terms of successes and failures, we can get into trouble. For instance, MathWorld's definition, which leads to the formula given in SciPy's documentation, is: You can see why it is problematic to talk about the probability of $-1$ successes in $0$ trials and success on the $0^\mbox{th}$ trial.

Since this is an edge case and others have thought carefully about it, I think we should decide which precedent to follow rather than going out on a limb.

R dbinom allows $n=0$.

dnbinom(0, 0, .5)
# 1
dnbinom(1, 0, .5)
# 0

Matlab defines it only for number of successes $R>0$. https://www.mathworks.com/help/stats/prob.negativebinomialdistribution.html It gives an error if you try to define a binomial distribution with $R=0$.

Mathematica's documentation suggests that the definition is only for strictly positive $n$, but it is not totally consistent about it.

You can see that the PDF defined below has $P(X = x) = 0$ for $x \geq 0$, yet the plot shows that the value is $1$ at $x=0$. ![image](https://github.com/scipy/scipy/assets/6570539/ff05ed2f-cba5-4f49-9734-71920bc5fc81) https://www.wolframalpha.com/input?i=NegativeBinomialDistribution%5B0%2C+0.5%5D Also from their documentation of `NegativeBinomialDistribution`, they include $n=0$ in the domain: ![image](https://github.com/scipy/scipy/assets/6570539/aea7e551-d180-4180-8534-ae36532ad62f)

SciPy's nbinom relies on Boost, which (apparently) defines the distribution only for strictly positive $n$.

My guess is that Boost chose not to provide a definition for $n = 0$. We can ask them to reconsider it, but it's not immediately obvious that it is incorrect not to define this distribution for $n=0$, so I'd consider this an enhancement (and maybe documentation) issue. I'll also rephrase the issue title to reflect that this issue is more nuanced.

pbrod commented 1 year ago

Most references seems to define the negative binomial distribution for $n>0$. However, the moment generating function (MGF) for the negative binomial distribution is $M_X(t) = \left(\frac{p}{1-(1-p) e^t}\right)^n$ for $t< -\log(1-p) $ and the MGF for the Poisson distribution is $MY(t) = e^{\lambda(e^{t}-1)}$ Now since $\lim{n \to 0} MX(t) = \lim{n \to 0} \left(\frac{p}{1-(1-p) e^t}\right)^n = 1$ is equal to $MY(t)|{\lambda=0} = e^{0}=1$ for all $t$, shows that the Poisson distribution with $\lambda=0$ is the limiting distribution of the negative binomial distribution as $n$ approaches zero. The pmf for the Poisson distribution with $\lambda=0$ is $P(0)=1$ and $P(k)=0$ for $k>0$.

mdhaber commented 1 year ago

@rkern do you have an opinion about whether $n=0$ should be considered to be in the domain of the negative binomial/Polya distribution?

pbrod commented 11 months ago

If we think about the definition in terms of successes and failures, we can get into trouble. For instance, MathWorld's definition, which leads to the formula given in SciPy's documentation, is: You can see why it is problematic to talk about the probability of −1 successes in 0 trials and success on the 0th trial.

I don't see this as problematic. If you have zero ($x+r=0$) trials, you have zero ($x=0$) failures and zero successes , so logically the negative binomial distribution should give $P(0)=1$ and $P(x)=0$ for $x>0$.

josef-pkt commented 11 months ago

does scipy already have other distributions that allow for parameters that result in degenerate (single masspoint) distribution? binom ?

For discrete distributions it might be ok because we already have masspoints. However, var -> 0 and skew and kurtosis divide by zero.

same if p -> 1

mdhaber commented 11 months ago

@pbrod The situation of 0 success and 0 failures in 0 trials makes sense. That corresponds with $r=1$. If we're talking about the negative binomial distribution, there has to be a success on the last trial, and in this case the last trial is the first trial. That's not what my comment was about.

If you plug in $r=0$ and $x=0$, this corresponds with -1 successes in -1 trials, and success on the zeroth trial (in a 1-indexed convention), that is, the trial before any trials have taken place. That does not make immediate sense to me.

@josef-pkt That can be tested.

import numpy as np
from scipy.stats import binom
dist = binom(n=0, p=0.5)
k = np.arange(5)
dist.pmf(k)  # array([1., 0., 0., 0., 0.])
dist.stats('mvsk')  # (0.0, 0.0, nan, -inf)

Apparently it is not a problem.

josef-pkt commented 11 months ago

nbinom is defined by using gamma function and not choose. We have an extension to continuous parameters.

n -> 0 works for n > 0, just plugging into current code (I'm currently still on scipy 1.7.3)

mdhaber commented 11 months ago

Yes. The statement is about interpretation in terms of integer successes in failures. The extension may be defined for $r=0$. If so, there should be a reference. I'm just saying that interpretation in terms of successes and failures becomes problematic.

josef-pkt commented 11 months ago

that interpretation also doesn't work if we have e.g. n=1.5 or n = 10.8750674865375

mdhaber commented 11 months ago

But those are within the domain according to a references. $r=0$ is not, or if it is, I haven't seen it. In lieu of that, I was just checking to see if it made sense in terms of the usual integer interpretation, and it does not. But all we need is a reference that shows the domain being for $r\geq0$, and this can be added, or we can ask Boost to add it.

josef-pkt commented 11 months ago

I just did a google search and did not find anything for the degenerate case.

to the interpretation wikipedia https://en.wikipedia.org/wiki/Negative_binomial_distribution#Alternative_formulations has several interpretations

from nbinom docstring "Negative binomial distribution describes a sequence of i.i.d. Bernoulli trials, repeated until a predefined, non-random number of successes occurs."

n is the required number of successes if n=0, we stop after finding zero successes, i.e. we stop immediately, and then the number of failures k is also 0, i.e. single masspoint at zero.

the problem is that if both n->0 and p->0, then they go in opposite direction and it is not clear whether it collapses to the masspoint at zero, but p > 0 is required in code and docstring.

mdhaber commented 11 months ago

OK, I can see it making some sense according to that wording, although I would usually trust what I find on Mathworld more than SciPy documentation. Perhaps @ev-br has an opinion based on https://github.com/scipy/scipy/pull/6562? I don't mean to get in the way of others submitting/reviewing/merging PRs on this topic.

ev-br commented 11 months ago

I don't think I've anything useful to say here, sorry.

pbrod commented 10 months ago

Experimenting with the nbinom also indicates that this is the correct limiting behavior of the negative binomial distribution as shown here:

>>> import scipy.stats as ss
>>> import numpy as np

>>> ss.nbinom(n=0.1, p=0.5).pmf(np.arange(3))
array([0.93303299, 0.04665165, 0.0128292 ])
>>> ss.nbinom(n=1e-10, p=0.5).pmf(np.arange(3))
array([1.00e+00, 5.00e-11, 1.25e-11])
>>> ss.nbinom(n=1e-250, p=0.5).pmf(np.arange(3))
array([1.00e+000, 5.00e-251, 1.25e-251])

>>> ss.nbinom(n=0.1, p=0.01).pmf(np.arange(3))
array([0.63095734, 0.06246478, 0.03401207])
>>> ss.nbinom(n=1e-10, p=0.01).pmf(np.arange(3))
array([1.0000e+00, 9.9000e-11, 4.9005e-11])
>>> ss.nbinom(n=1e-250, p=0.01).pmf(np.arange(3))
array([1.0000e+000, 9.9000e-251, 4.9005e-251])

>>> ss.nbinom(n=1e-250, p=1e-250).pmf(np.arange(3))
array([1.e+000, 1.e-250, 5.e-251])

>>> ss.nbinom(n=0.1, p=1).pmf(np.arange(3))
array([1., 0., 0.])
>>> ss.nbinom(n=1e-250, p=1).pmf(np.arange(3))
array([1., 0., 0.])

pbrod commented 10 months ago

The negative binomial distribution is a special case of the Katz distribution. Commonly also known as the Panjer or (a,b,0) class of distributions.

Katz studied the system of discrete distributions that satisfy the relationship

$P_K(k+1) / P_K(k) = \frac{\alpha + \beta k}{k+1} \quad \text{for} \quad k = 0, 1, 2, \ldots \qquad (1)$

where $P_K(0)=(1-\beta)^\frac{\alpha}{\beta}$, $\alpha \ge 0$ and $\beta<1$ and showed that there are only three solutions to Eq.1:

$\text{NegativeBinomial}(n=\alpha/\beta, p = 1-\beta) \quad \text{for} \quad 0< \beta < 1$, $\text{Poisson}(\lambda=\alpha) \quad \text{for} \quad \beta = 0 $ or $\text{Binomial}(n=-\alpha/\beta, p = \beta/(\beta-1)) \quad \text{for} \quad \beta < 0 $.

Katz originally truncated $P_K(k+j)$ to zero whenever $\alpha + \beta k<0$. However Washburn (1996, 2006)) recognized that $-\alpha/\beta$ actually must be an integer greater or equal to zero for the Binomial case.

The moment generating function (MGF) for the Katz distribution is $MX(t) = \left(\frac{1-\beta}{1-\beta e^t}\right)^{\alpha/\beta}$ for $t< -\log(\max(\beta, 0))$. For $\beta \to 0$ it is interpreted as $\lim{\beta \to 0}M_X(t) = e^{\alpha(e^t-1)}$

Now for $\beta \ne 0$ the limit of the MGF as $\alpha$ goes to zero is $\lim_{\alpha \to 0} MX(t) = \lim{\alpha \to 0} \left(\frac{1-\beta}{1-\beta e^t}\right)^{\alpha/\beta} = 1$ For $\beta = 0$ the limit of the MGF as $\alpha$ goes to zero is $\lim_{\alpha \to 0, \beta=0} MX(t) = \lim{\alpha \to 0} e^{\alpha(e^t-1)}=e^0= 1$ This result proves that the Katz distribution as $\alpha \xrightarrow[]{} 0$ is the same regardless of what the value of $\beta$ is.

Thus the pmf of the Katz distribution with $\alpha=0$ is $P_K(0)=1$ and $P_K(k)=0$ for $k>0$. Consequently the negative binomial distribution for $n=0$ should have the same pmf, which supports the conclusion in https://github.com/scipy/scipy/issues/18974#issuecomment-1655357122.

mdhaber commented 3 months ago

I thought I'd take another look here. The trouble before was the difference in the English definition of the distribution beween Mathematica, which I was citing, and SciPy.

Mathematica insists that the last trial is a success explicilty, and it phrases everything in terms of the number of trials $n-1$ before the last, so with $n = 0$ we start having to think about negative trials.
In SciPy's definition, the last trial is implicitly a success if trials are performed, but I don't think there is a problem if zero trials are performed.

I think SciPy's definition is valid, too, so I suppose I should roll with it rather than referring to Mathematica's.

With SciPy's wording, I think it makes sense that the number of failures before zero trials are performed is always zero and never a positive number; i.e. the PMF is a point mass with probability 1 at zero. Also, the limit of the PMF (written in terms of the gamma function) as n -> 0 seems to be defined and agree, and there is precedent for this behavior in R nbinom as mentioned in https://github.com/scipy/scipy/issues/18974#issuecomment-1653969196.

So I would be happpy to change the requirement that n > 0 to n >= 0 in the argcheck function so n = 0 would be included in the domain.

That said, there would be no immediate change in behavior because our PMF relies on Boost, and Boost's negative binomial PDF returns NaN at n=0. So if we wanted to see a change in behavior now, we'd need to make exceptions. We can do that within stats using _lazywhere, but that would introduce additional overhead in the calculation.

@steppi This issue proposes to define the PMF of the negative binomial distribtion with $n = 0$ successes as a point mass with probability 1 at $k = 0$ failures. I think this holds even for probability of success $p = 0$. Do you think this is something we would want to adjust in the wrappers, would you ask Boost for the change, or neither?

steppi commented 3 months ago

@steppi This issue proposes to define the PMF of the negative binomial distribtion with n = 0 successes as a point mass with probability 1 at k = 0 failures. I think this holds even for probability of success p = 0 . Do you think this is something we would want to adjust in the wrappers, would you ask Boost for the change, or neither?

If this is worth doing, my vote is for adjusting it in the wrappers. There's nothing wrong with Boost's convention, and it matches their documentation.

"For k + r Bernoulli trials each with success fraction p, the negative_binomial distribution gives the probability of observing k failures and r successes with success on the last trial. The negative_binomial distribution assumes that success_fraction p is fixed for all (k + r) trials."

It's also a lot easier in general for Boost users to write their own wrappers if they care about the r = 0 case, compared to SciPy users adjusting a scipy distribution.

mdhaber commented 3 months ago

Right, I know that their documentation currently mentions that the trial ends with a success, and so the change would not be consistent with it as written. Of course, that part of documentation does not mention non-integer n, either. They mention using the gamma function to extend the domain, so it would not be unreasonable to suggest that it be extended a little further.

But OK, if we should not trouble Boost with this, that's fine. I can say that I do not think it's worth changing in stats because it would add overhead in all cases to support an edge case that users can assess for themselves. Part of my question, then, was whether it is worth doing in special.

scipy / scipy