Closed mckib2 closed 2 years ago
I've converted the "demos" from the stats-boost-deliverables repo to "tests". They may or may not stay in this form, it's mainly a way to keep track of which issues are resolved using boost. Run like this:
pytest scipy/stats/boost/tests/test_boost_stats.py
Cool, so this is building in SciPy now?
@mdhaber I built from clean just now. If you have a minute, could you pull and see if it builds for you?
Looks easy:
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "C:\Users\matth\scipy\scipy\stats\boost\include\func_defs.hpp" namespace "" nogil:
^
------------------------------------------------------------
func_defs.pxd:1:20: Invalid unicode escape '\U'
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "C:\Users\matth\scipy\scipy\stats\boost\include\func_defs.hpp" namespace "" nogil:
^
------------------------------------------------------------
func_defs.pxd:1:20: Invalid unicode escape '\U'
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "C:\Users\matth\scipy\scipy\stats\boost\include\func_defs.hpp" namespace "" nogil:
^
------------------------------------------------------------
func_defs.pxd:1:20: Invalid unicode escape '\U'
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "C:\Users\matth\scipy\scipy\stats\boost\include\func_defs.hpp" namespace "" nogil:
^
------------------------------------------------------------
func_defs.pxd:1:20: Invalid unicode escape '\U'
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "C:\Users\matth\scipy\scipy\stats\boost\include\func_defs.hpp" namespace "" nogil:
^
------------------------------------------------------------
func_defs.pxd:1:20: Invalid unicode escape '\U'
Traceback (most recent call last):
File "C:\Users\matth\scipy\tools\cythonize.py", line 311, in <module>
main()
File "C:\Users\matth\scipy\tools\cythonize.py", line 307, in main
find_process_files(root_dir)
File "C:\Users\matth\scipy\tools\cythonize.py", line 296, in find_process_files
for result in pool.imap_unordered(lambda args: process(*args), jobs):
File "C:\Users\matth\Anaconda3\envs\scipydev\lib\multiprocessing\pool.py", line 748, in next
raise value
File "C:\Users\matth\Anaconda3\envs\scipydev\lib\multiprocessing\pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "C:\Users\matth\scipy\tools\cythonize.py", line 296, in <lambda>
for result in pool.imap_unordered(lambda args: process(*args), jobs):
File "C:\Users\matth\scipy\tools\cythonize.py", line 230, in process
processor_function(fromfile, tofile, cwd=path)
File "C:\Users\matth\scipy\tools\cythonize.py", line 95, in process_pyx
raise Exception('Cython failed')
Exception: Cython failed
Traceback (most recent call last):
File "setup.py", line 588, in <module>
setup_package()
File "setup.py", line 572, in setup_package
generate_cython()
File "setup.py", line 334, in generate_cython
raise RuntimeError("Running cythonize failed!")
RuntimeError: Running cythonize failed!
Windows, my old nemesis... It does look like an easy fix. Will look at a little later or tomorrow, please feel free to patch before then if you can't wait
seems ok on mac. i'll work with that.
Built and the boost tests passed!
This is awesome. Can you see if these pass the appropriate tests in test_continuous_basic.py
, test_distributions.py
(and test_discrete_basic.py
/ test_discrete_distns.py
)?
Windows build should be working (fingers crossed). Progress on continuous distributions, haven't touched tests for discrete distributions yet.
test_continuous_basic.py
tests are looking alright, there's a couple of edge cases that need to be examined where I excluded the boost distributions fromtest_distributions.py
working except for some logpdf business with boost.ncx2. @mdhaber do you mind taking a look at the failing test here?Sorry but same error : /
Good news on times, though. Boost is faster than SciPy for several tests.
1.74s call scipy/stats/tests/test_continuous_basic.py::test_cont_basic[ncx2-arg75]
0.99s call scipy/stats/tests/test_continuous_basic.py::test_cont_basic[boost.ncx2-arg105]
1.37s call scipy/stats/tests/test_continuous_basic.py::test_cont_basic[boost.beta-arg104]
0.67s call scipy/stats/tests/test_continuous_basic.py::test_cont_basic[beta-arg4]
1.22s call scipy/stats/tests/test_continuous_basic.py::test_moments[ncx2-arg76-True-True-False]
0.39s call scipy/stats/tests/test_continuous_basic.py::test_moments[boost.ncx2-arg107-True-True-False]
0.51s call scipy/stats/tests/test_continuous_basic.py::test_moments[beta-arg4-True-True-False]
0.48s call scipy/stats/tests/test_continuous_basic.py::test_moments[boost.beta-arg106-True-True-False]
I'm asking @swallan to make it easier to add new distributions to the benchmarks so we can get a more accurate assessment.
@mdhaber Thanks for taking a look! I'm working towards conference submission deadline, but after this coming Wednesday I should have some time to resume work on this
Sorry but same error : /
I've tried converting to PosixPath
then calling str()
. Hopefully that fixes it -- I don't have scipy building on Windows right now, but if the issue persists I may try to set it up again
RE https://github.com/scipy/scipy/issues/7406: boost.binom.ppf
does the right thing at q=0, 0.5, and 1
Thanks. I'll try compiling again tonight. Can you patch the wrapper for Beta so it works at x=1?
Now working on Windows! W00t! Next I'll take a closer look at the "interesting" Python code, like tests. I'll also look out for other issues this might solve. @swallan even though your SciPy stats benchmarking PR isn't merged yet, can you:
@WarrenWeckesser do you have thoughts on the lower level things?
Can you patch the wrapper for Beta so it works at x=1?
Now patched to return inf when x=1, beta<1 (I think that's the right answer, happy to change if not) EDIT: returns 0, not inf
I have created a new branch called benchmark_dist_compare to work with discrete distributions and to show the times for the scipy and boost methods:
Thanks @swallan! I'll review that PR shortly but these results look great.
@mckib2 I think there's one more thing we want to do before we post this PR on SciPy's GitHub. I checked with @WarrenWeckesser, and ultimately, we're not going to want to have separate scipy.stats
and scipy.stats.boost
versions of these distributions; we'll be replacing SciPy's implementations with the underlying Boost code. So I think the thing to do is to "drop in" your replacements from beta.py
and binom.py
/nbinom.py
into _continuous_distns.py
and _discrete_distns.py
, respectively.
For beta
, we should preserve the code that is not upgraded by Boost, like _fitstart
and _fit
.
So I think the thing to do is to "drop in" your replacements from beta.py and binom.py/nbinom.py into _continuous_distns.py and _discrete_distns.py, respectively.
They should be "dropped in" now, all the tests are passing on my machine and I'm curious to know if that holds on other machines as well.
For beta, we should preserve the code that is not upgraded by Boost, like _fitstart and _fit.
I kept all the fitting code as well as logpdf implementations (see reviews). I'm not sure whether it's better to use the existing logpdf implementations or take the log of the boost pdf implementation -- I'll let the council decide
Thanks @mckib2. On my mac, there's one failure that appears to be "real"; the other three seem to be due to a bad merge (and easily fixed).
=================================== FAILURES ===================================
___________________________ TestBinom.test_warns_p0 ____________________________
scipy/stats/tests/test_distributions.py:233: in test_warns_p0
assert_equal(stats.binom(n=2, p=0).mean(), 0)
self = <scipy.stats.tests.test_distributions.TestBinom object at 0x11d0cb390>
scipy/stats/_distn_infrastructure.py:488: in mean
return self.dist.mean(*self.args, **self.kwds)
self = <scipy.stats._distn_infrastructure.rv_frozen object at 0x11d0e63d0>
scipy/stats/_distn_infrastructure.py:1321: in mean
res = self.stats(*args, **kwds)
args = ()
kwds = {'moments': 'm', 'n': 2, 'p': 0}
self = <scipy.stats._discrete_distns.binom_gen object at 0x11d0e6090>
scipy/stats/_distn_infrastructure.py:1121: in stats
mu, mu2, g1, g2 = self._stats(*goodargs)
args = (array(2), array(0))
cond = True
default = array(nan)
goodargs = [array([2]), array([0])]
kwds = {'moments': 'm', 'n': 2, 'p': 0}
loc = array([0])
moments = 'm'
output = []
scale = array([1])
self = <scipy.stats._discrete_distns.binom_gen object at 0x11d0e6090>
scipy/stats/_discrete_distns.py:80: in _stats
_boost._binom_skewness(n, p),
E RuntimeWarning: divide by zero encountered in _binom_skewness
n = array([2])
p = array([0])
self = <scipy.stats._discrete_distns.binom_gen object at 0x11d0e6090>
It's not building on Windows anymore:
building 'boost_test_build' library
compiling C sources
creating build\temp.win-amd64-3.7\scipy\_build_utils
creating build\temp.win-amd64-3.7\scipy\_build_utils\boostinator
creating build\temp.win-amd64-3.7\scipy\_build_utils\boostinator\boost_test_build
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\matth\scipy\scipy\_build_utils\boostinator -IC:\Users\matth\Anaconda3\envs\scipydev\lib\site-packages\numpy\core\include -IC:\Users\matth\Anaconda3\envs\scipydev\include -IC:\Users\matth\Anaconda3\envs\scipydev\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt /EHsc /Tpscipy\_build_utils\boostinator\boost_test_build\boost_test_build.cpp /Fobuild\temp.win-amd64-3.7\scipy\_build_utils\boostinator\boost_test_build\boost_test_build.obj
boost_test_build.cpp
scipy\_build_utils\boostinator\boost_test_build\boost_test_build.cpp(4): fatal error C1083: Cannot open include file: 'boost/math/distributions.hpp': No such file or directory
error: Command "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\matth\scipy\scipy\_build_utils\boostinator -IC:\Users\matth\Anaconda3\envs\scipydev\lib\site-packages\numpy\core\include -IC:\Users\matth\Anaconda3\envs\scipydev\include -IC:\Users\matth\Anaconda3\envs\scipydev\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt /EHsc /Tpscipy\_build_utils\boostinator\boost_test_build\boost_test_build.cpp /Fobuild\temp.win-amd64-3.7\scipy\_build_utils\boostinator\boost_test_build\boost_test_build.obj" failed with exit status 2
It's not building on Windows anymore
Well at least we know the boost build test works...
I'll check out these and other issues either today or this weekend.
EDIT: merge and test errors should be fixed, not sure what happened to the Windows build
@mdhaber I used this guide to build on Windows and I'm not getting the error you reported. Can you try building again from clean to see if it's a stale build file breaking something?
I does build when I clone from this repo. Not sure why removing the build folder and git clean -fdx isn't enough.
Update: I just tried git clean -fdx
again. Worked this time. Whatever.
@WarrenWeckesser The Python code looks good to me. There are no errors on my machine. I helped write the test code, and I just double-checked that this closes the issues that it claims to close, so I think I'm done. Did you want to look over the lower-level stuff before this PR is opened on the main repo?
Now patched to return inf when x=1, beta<1 (I think that's the right answer, happy to change if not) EDIT: returns 0, not inf
Are you sure about that? I think inf
is right for beta < 1
. In SciPy currently:
from scipy.stats import beta
beta(a = 2, b = 0.5).pdf(1)
Also see the plots on Wikipedia. And from the definition, (1- x)^(beta - 1) is 0 to a negative power, which is divide by zero.
Were any other fixes like this needed? Also, it doesn't look like it, but were there any modifications (besides the additions) to the tests?
Did you want to look over the lower-level stuff before this PR is opened on the main repo?
I'm making some of the cython generation code a little nicer, but other than that it should be good to go on my end. EDIT: done
Are you sure about that? I think inf is right for beta < 1
I can change back to inf, I thought the Boost-side upstream fix used 0 but I may be mistaken. EDIT: changed to INF
Were any other fixes like this needed?
Beta was the only distribution we needed to patch, everything else should be using "normal" wrappers.
Also, it doesn't look like it, but were there any modifications (besides the additions) to the tests?
We only added tests, we didn't modify any. I think there may be some modifications I made to make it easier to include boost as another module that are no longer necessary since we're replacing the distributions, so I can scan through and remove those. EDIT: done
@mdhaber, I'll get back to this tomorrow morning.
I can change back to inf, I thought the Boost-side upstream fix used 0 but I may be mistaken.
Well, you're right, it does appear to use 0. I will look closer.
It seems to be a matter of whether the support is (0, 1) or [0, 1]. Wikipedia shows both possibilities; Mathematica (referenced by the Boost dev) uses (0, 1). But SciPy says:
and the current implementation does use that formula at the endpoints, so I think we would need to return infinity for backwards compatibility, unless we consider this to be a mistake, in which case we'd also need to change the docs. I would actually prefer the latter, because then we wouldn't have to modify the behavior of the boost function.
Wikipedia says:
This definition includes both ends x = 0 and x = 1, which is consistent with definitions for other continuous distributions supported on a bounded interval which are special cases of the beta distribution, for example the arcsine distribution, and consistent with several authors, like N. L. Johnson and S. Kotz.[1][2][3][4] However, the inclusion of x = 0 and x = 1 does not work for α, β < 1; accordingly, several other authors, including W. Feller,[5][6][7] choose to exclude the ends x = 0 and x = 1, (so that the two ends are not actually part of the domain of the density function) and consider instead 0 < x < 1.
@WarrenWeckesser?
Also, seems to me that beta
should implement _get_support
, but it doesn't (even in SciPy master).
@mdhaber @WarrenWeckesser Is this ready to open in scipy?
I thought so here. I think we're waiting for the OK from @WarrenWeckesser.
Is this ready to open in scipy?
Yes, go ahead.
@mdhaber @WarrenWeckesser
Reference issue
#11026 (does not close)Beta Distribution:
12796 (does not close for
special
, but resolves the problem forstats
)Binomial Distribution / Negative Binomial Distribution
5122 (fixes the problem for 0 < x < 1, but at x = 1 it does the same thing as SciPy - and so does R).
5503 (Does this close it? Boost is not perfect for arbitrarily large numbers. We might consider going to a normal approximation when it starts to lose accuracy.)
2414 (Does not fix this. Boost does not seem to check for n being an integer, nor does it round.)
11465 (does not addresses the
nbinom
portion of this issue. The user is looking for much more precision.)adding more as we find them...
What does this implement/fix?
scipy._build_utils.boostinator
(adapted from https://github.com/mckib2/boostinator), includes Boost 1.74Additional information
Boost distributions can be used like this:
Function the same as their
scipy.stats
counterparts.