scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License
12.89k stars 5.12k forks source link

stats.distributions moment() test failing intermittently #2758

Closed rgommers closed 11 years ago

rgommers commented 11 years ago

Can't reproduce it, seen it on Travis only on Python 2.6 so far:

======================================================================
ERROR: test_continuous_basic.test_cont_basic(<scipy.stats.distributions.alpha_gen object at 0x3726dd0>, 0.5, (3.5704770516650459,), (0, 1), [<bound method alpha_gen.pdf of <scipy.stats.distributions.alpha_gen object at 0x3726dd0>>, <bound method alpha_gen.logpdf of <scipy.stats.distributions.alpha_gen object at 0x3726dd0>>, <bound method alpha_gen.cdf of <scipy.stats.distributions.alpha_gen object at 0x3726dd0>>, <bound method alpha_gen.logcdf of <scipy.stats.distributions.alpha_gen object at 0x3726dd0>>, <bound method alpha_gen.logsf of <scipy.stats.distributions.alpha_gen object at 0x3726dd0>>])
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/travis/build/scipy/scipy/build/testenv/lib/python2.6/site-packages/scipy/stats/tests/test_continuous_basic.py", line 454, in check_named_args
npt.assert_equal(distfn.moment(3, *a, **k),
File "/home/travis/build/scipy/scipy/build/testenv/lib/python2.6/site-packages/scipy/stats/distributions.py", line 1702, in moment
val = _moment_from_stats(n, mu, mu2, g1, g2, self._munp, args)
File "/home/travis/build/scipy/scipy/build/testenv/lib/python2.6/site-packages/scipy/stats/distributions.py", line 387, in _moment_from_stats
mu3 = g1*(mu2**1.5) # 3rd central moment
OverflowError: (34, 'Numerical result out of range')
======================================================================

ERROR: test_continuous_basic.test_cont_basic(<scipy.stats.distributions.foldcauchy_gen object at 0x373e890>, 0.5, (4.7164673455831894,), (0, 1), [<bound method foldcauchy_gen.pdf of <scipy.stats.distributions.foldcauchy_gen object at 0x373e890>>, <bound method foldcauchy_gen.logpdf of <scipy.stats.distributions.foldcauchy_gen object at 0x373e890>>, <bound method foldcauchy_gen.cdf of <scipy.stats.distributions.foldcauchy_gen object at 0x373e890>>, <bound method foldcauchy_gen.logcdf of <scipy.stats.distributions.foldcauchy_gen object at 0x373e890>>, <bound method foldcauchy_gen.logsf of <scipy.stats.distributions.foldcauchy_gen object at 0x373e890>>])
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/home/travis/build/scipy/scipy/build/testenv/lib/python2.6/site-packages/scipy/stats/tests/test_continuous_basic.py", line 454, in check_named_args
npt.assert_equal(distfn.moment(3, *a, **k),
File "/home/travis/build/scipy/scipy/build/testenv/lib/python2.6/site-packages/scipy/stats/distributions.py", line 1702, in moment
val = _moment_from_stats(n, mu, mu2, g1, g2, self._munp, args)
File "/home/travis/build/scipy/scipy/build/testenv/lib/python2.6/site-packages/scipy/stats/distributions.py", line 387, in _moment_from_stats
mu3 = g1*(mu2**1.5) # 3rd central moment
OverflowError: (34, 'Numerical result out of range')
ev-br commented 11 years ago

A shot in the dark: these are all discrete distributions; all tracebacks quote mu2**1.5; continuous ones have pow(mu2, 1.5) instead.

pv commented 11 years ago

Random seed not set in the tests, and the moment estimation fails in some cases?

josef-pkt commented 11 years ago

It might be because both don't have finite moments. I don't know what's going on, why this is calculated and which example. besides mixing numpy and python operations as bburlo points out.

I don't find any references on the moments for folded cauchy on the internet. cauchy itself doesn't have finite moments. From the code of alph_gen._stats the first two moments of alpha are inf and skew and kurtosis are nan

The tests should be skipped for these cases AFAICS and remember

josef-pkt commented 11 years ago

Note this is from the latest merge https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L454 In the old tests these cases where skipped, or higher moment tests were not run.

However, I think there is an underlying bug that .stats or more likely _moment_from_stats shouldn't do any calculations if _stats are either nan or inf.

foldcauchy also has _stats return inf, inf, nan, nan same as alpha

josef-pkt commented 11 years ago

One part I don't understand is that the exception doesn't always show up.There is no random in there. Maybe it depends on the numpy version

Actually @bburlo 's initial suggestion will most likely work. At several places in distributions, I switched from python operators to using numpy np.power, np.multiply, ... so we get correct nan and inf handling. As far as I remember, the nan and inf handling has also changed in python in recent versions.

proof python 2.6

>>> import scipy
>>> scipy.__version__
'0.9.0'
>>> np.__version__
'1.5.1'
>>> stats.foldcauchy.moment(3, 4.7164673455831894)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python26\lib\site-packages\scipy\stats\distributions.py", line 1575, in moment
    mu3 = g1*(mu2**1.5) # 3rd central moment
OverflowError: (34, 'Result too large')

python 3.3 newest releases of scipy and numpy

In [16]:
from scipy import stats
stats.foldcauchy.moment(3, 4.7164673455831894)

Out[16]:
nan

Question is still whether _moment_from_stats should short circuit in the case of nan and inf instead of relying on floating point propagation

And current test suite doesn't have tests for moment > 2 (bug since scipy 0.9)

josef-pkt commented 11 years ago

replacing ** by np.power all the relevant parts fixes the problem on scipy 0.9 (which actually predates _moment_from_stats and means this bug was there forever, just never tested)

ev-br commented 11 years ago

If it is nan/inf handling indeed, then a quick fix might be to cherry-pick https://github.com/bburlo/scipy/commit/91c4aa2043eb14bf466b26be837233bd70ce934d --- where the stats method is moved to rv_generic, so that np.power is used for both discrete and continuous distributions. (just copied the whole stats method from rv_continuous to rv_generic as is, without changing anything in it).

josef-pkt commented 11 years ago

I don't see how that would fix _moment_from_stats (besides there are still ** in that PR that might also mean some cases of stats for ks are broken.

I'd rather just change the ** here and review the refactoring pull request separately (at the cost of a small merge conflict)

Note: moments > 2, and skew kurtosis have still bugs (wrong calculations) and are not tested. The test for the args refactoring looks like a good smoke test.

rgommers commented 11 years ago

Fixed in gh-2760, thanks all.