VonMisesMixtureBinned does not work with statsmodels from git

rc commented 11 years ago

Traceback (most recent call last):
  File "./fit_von_mises.py", line 128, in <module>
    main()
  File "./fit_von_mises.py", line 118, in main
    logs, alog = analyze(source, psets, options)
  File "./analyses/fit_mixture.py", line 110, in analyze
    pl.plot_histogram_comparison(pset.output_dir, res, source, ii)
  File "./analyses/plots.py", line 116, in plot_histogram_comparison
    ret_sizes=True)
  File "./aorta/dist_mixtures/mixture_von_mises.py", line 380, in rvs_mix
    size=sizes[ii]))
  File "~/software/usr/local/lib/python/dist-packages/scipy/stats/distributions.py", line 627, in rvs
    vals = self._rvs(*args)
  File "~/software/usr/local/lib/python/dist-packages/scipy/stats/distributions.py", line 5748, in _rvs
    return mtrand.vonmises(0.0, b, size=self._size)
  File "mtrand.pyx", line 2240, in mtrand.RandomState.vonmises (numpy/random/mtrand/mtrand.c:10433)
  File "mtrand.pyx", line 201, in mtrand.cont2_array_sc (numpy/random/mtrand/mtrand.c:1867)
ValueError: negative dimensions are not allowed

rc commented 11 years ago

The error is caused by stats.vonmises._cdf() returning NaNs when b is too large (the value below comes from the first BFGS iteration):

x = [-3.14159265 -3.10668607 -3.07177948 ...,  3.07177948  3.10668607
  3.14159265]
b = 2143.11596994
loc = -1.7748961779
ipdb> stats.vonmises._cdf(x-loc, b)
array([ nan,  nan,  nan, ...,  nan,  nan,  nan])

rc commented 11 years ago

The largest value of b that does not give NaN is 709:

ipdb> stats.vonmises._cdf(x-loc, 709)
array([  4.52609671e-248,   1.40117001e-237,   3.53853284e-227, ...,
         1.00000000e+000,   1.00000000e+000,   1.00000000e+000])
ipdb> stats.vonmises._cdf(x-loc, 710)
array([ nan,  nan,  nan, ...,  nan,  nan,  nan])
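The 709/710 boundary matches the float64 overflow limit of exp(): log of the largest double is about 709.78. The sketch below only illustrates that limit and how an inf/inf ratio turns into NaN. Whether vonmises._cdf() computes exactly such a ratio is an assumption, not something confirmed in this thread, and exp_or_inf() is a hypothetical helper that mimics numpy's saturating behaviour.

```python
import math
import sys

def exp_or_inf(x):
    """exp(x) that saturates to inf on overflow instead of raising (hypothetical helper)."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf

# Largest argument for which exp() is still a finite float64.
threshold = math.log(sys.float_info.max)

# Assumed form of the failure: a ratio of two exponentially growing terms.
ratio_709 = exp_or_inf(709.0) / exp_or_inf(709.0)  # finite / finite
ratio_710 = exp_or_inf(710.0) / exp_or_inf(710.0)  # inf / inf

print(threshold, ratio_709, ratio_710)
```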

josef-pkt commented 11 years ago

Can you check what _size is in the traceback? It sounds like integer overflow.

What scipy version are you using? Did you upgrade scipy in the last few months?

I don't remember any recent changes in statsmodels master that would affect the optimization for this.

rc commented 11 years ago

Yes, it looks like that. I am using scipy from git to have basinhopping() available. The traceback above is caused by the NaN-producing params left over from an earlier call to VonMisesMixtureBinned.fit(). The output of that call and the debugger state in rvs_mix():

starting parameters: [ 2.  0.  3.  0.  0.]
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: nan
         Iterations: 1
         Function evaluations: 42
         Gradient evaluations: 42
~/software/usr/local/lib/python/dist-packages/statsmodels/base/model.py:343: Warning: Inverting hessian failed, no bse or cov_params available
  warn(warndoc, Warning)

Estimated distributions (2 components)
dist0: shape=2196596.7532, loc=1.4885, prob=   nan
dist1: shape=484521.1196, loc=2.1939, prob=0.0000
> ./aorta/dist_mixtures/mixture_von_mises.py(377)rvs_mix()
    376         rvs = []
--> 377         for ii in range(k_dist):
    378             try:

ipdb> sizes
array([-9223372036854735711,                    0])
ipdb> params
array([  2.19659675e+06,   1.48846026e+00,   4.84521120e+05,
         2.19386410e+00,   5.71582442e+05])
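A negative entry in sizes is what one would expect if a NaN probability is converted to an integer sample count. As a minimal sketch (component_sizes() is a hypothetical helper, not the actual code in rvs_mix(), whose size computation is not shown here), validating the probabilities first turns the obscure "negative dimensions" error into an explicit one:

```python
import math

def component_sizes(probs, total):
    """Split `total` samples among components in proportion to `probs` (hypothetical helper)."""
    # Reject NaN, inf and negative probabilities before any integer conversion.
    if not all(math.isfinite(p) and p >= 0.0 for p in probs):
        raise ValueError("mixture probabilities must be finite and non-negative: %r" % (probs,))
    norm = sum(probs)
    if norm <= 0.0:
        raise ValueError("mixture probabilities sum to zero")
    sizes = [int(total * p / norm) for p in probs]
    # Give any samples lost to truncation to the first component.
    sizes[0] += total - sum(sizes)
    return sizes

print(component_sizes([0.25, 0.75], 1000))

try:
    component_sizes([math.nan, 0.0], 1000)
except ValueError as err:
    print("rejected:", err)
```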

BTW, I have hacked VonMisesMixtureBinned.loglikeobs() (see current master) as a workaround. It seems to work OK.

rc commented 11 years ago

More info: the basin-hopping solver automatically rejects steps that produce NaNs, so the above hack is not needed with it. I therefore think this is not a statsmodels issue, but an issue of some scipy optimization solvers not detecting NaNs.
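For solvers that do not detect NaNs, one generic guard is to wrap the objective so that NaN is reported as +inf. This is only a sketch of the idea, not the hack in loglikeobs(): nan_guard() and toy_objective() are hypothetical names, and the toy function merely imitates an objective that turns NaN for large b.

```python
import math

def nan_guard(objective, penalty=math.inf):
    """Wrap `objective` so that a NaN result is reported as `penalty` (hypothetical helper)."""
    def guarded(params):
        value = objective(params)
        return penalty if math.isnan(value) else value
    return guarded

def toy_objective(b):
    # Hypothetical stand-in for a negative log-likelihood: NaN once b is too large.
    return math.nan if b >= 710.0 else (b - 3.0) ** 2

safe_objective = nan_guard(toy_objective)

# NaN compares False with everything, so a naive "is the new point worse?" test misses it.
naive_worse = toy_objective(2000.0) > toy_objective(2.0)
# The guarded value is +inf, which a comparison-based step test does see as worse.
guarded_worse = safe_objective(2000.0) > safe_objective(2.0)

print(naive_worse, guarded_worse)
```

Whether a particular scipy solver backs off correctly from an infinite objective value is not verified here and would need to be checked per method.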

rc commented 11 years ago

... so let us close this.

rc / dist_mixtures

VonMisesMixtureBinned does not work with statsmodels from git #17