Closed AgilentGCMS closed 2 months ago
Thanks for the report @AgilentGCMS. That ARPACK code has had recurring issues with some BLAS functions (`cdotc` & co.) that are sensitive to Fortran compiler ABI conventions. MKL is built with the opposite convention to the default one (gfortran's).
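For context on the ABI issue: the conventions differ in how a Fortran function returning a COMPLEX value hands the result back. gfortran-style ABIs return it by value, while f2c/g77-style ABIs pass a hidden pointer for the result as an extra argument; mixing the two corrupts the call. A Python sketch of the two call shapes (illustrative only; `np.vdot` conjugates its first argument, which is the same operation `cdotc`/`zdotc` compute):

```python
import numpy as np

def zdotc_by_value(zx, zy):
    # Function-style ZDOTC: the complex result is the return value.
    # This is the shape whose ABI differs between Fortran compilers.
    return np.vdot(zx, zy)  # conjugates zx, like ZDOTC

def zdotc_by_reference(out, zx, zy):
    # Subroutine-style wrapper: the result is written through an output
    # argument, so no complex value ever crosses the call boundary by value.
    out[0] = np.vdot(zx, zy)

zx = np.array([1 + 1j, 2 - 1j])
zy = np.array([3 + 0j, 1 + 2j])
out = np.empty(1, dtype=complex)
zdotc_by_reference(out, zx, zy)
print(out[0], zdotc_by_value(zx, zy))  # identical results either way
```

In Python both shapes trivially agree; in compiled Fortran/C interop, only the subroutine-style shape is immune to the complex-return ABI mismatch.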
Maybe this particular crash was always an issue with MKL from OneAPI, or maybe it changed with https://github.com/scipy/scipy/pull/20232 (that was included in 1.13.0, which you're testing here). Do you know if this is a regression from 1.12.x or not for your setup?
I can reproduce this on Windows with scipy-1.13.1 and OneAPI MKL 2024.1.
Thanks @cgohlke for confirming. So it's a problem that's not platform-dependent.
I'll add a bit of context, since it may not be worth starting to debug here - the maintenance state of our ARPACK copy is very bad:
It's also still possible that this is an MKL bug. We do test with Accelerate, which uses the same ABI conventions as MKL, and things work as expected there.
Seems like I am able to reproduce the segmentation fault with `FC=ifx` and C/C++ compilers from conda. The `ifx` + MKL and `ifx` + OpenBLAS combinations fail with the same segmentation fault. However, the `gfortran` + MKL combination works just fine. I will try with `ifort` tomorrow.
My conclusion is that `ifx` usage gives problems, MKL doesn't.
Commands

- `python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp` (`sparse` tests succeed)
- `FC=ifx python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp` (`sparse` tests fail with segmentation fault)
- `FC=ifx python dev.py build` (`sparse` tests fail with segmentation fault)
Logs
SciPy from development installed path at: /home/czgdp18079/Quansight/scipy/build-install/lib/python3.12/site-packages
Running tests for scipy version:1.15.0.dev0+1043.bbe993d, installed at:/home/czgdp18079/Quansight/scipy/build-install/lib/python3.12/site-packages/scipy
============================= test session starts ==============================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/czgdp18079/Quansight/scipy
configfile: pytest.ini
plugins: anyio-4.4.0, xdist-3.6.1, cov-5.0.0, hypothesis-6.104.2, timeout-2.3.1
collected 12224 items / 348 deselected / 11876 selected
scipy/sparse/csgraph/tests/test_connected_components.py ........ [ 0%]
scipy/sparse/csgraph/tests/test_conversions.py ... [ 0%]
scipy/sparse/csgraph/tests/test_flow.py ................................ [ 0%]
........... [ 0%]
scipy/sparse/csgraph/tests/test_graph_laplacian.py ..................... [ 0%]
........................................................................ [ 1%]
........................................................................ [ 1%]
........................................................................ [ 2%]
........................................................................ [ 3%]
........................................................................ [ 3%]
........................................................................ [ 4%]
........................................................................ [ 4%]
........................................................................ [ 5%]
........................................................................ [ 6%]
........................................................................ [ 6%]
........................................................................ [ 7%]
........................................................................ [ 7%]
........................................................................ [ 8%]
........................................................................ [ 9%]
........................................................................ [ 9%]
......................... [ 9%]
scipy/sparse/csgraph/tests/test_matching.py ............................ [ 10%]
.... [ 10%]
scipy/sparse/csgraph/tests/test_pydata_sparse.py sssssssssssssssssssssss [ 10%]
sssssssssssssssssssssssssssssssssss [ 10%]
scipy/sparse/csgraph/tests/test_reordering.py ... [ 10%]
scipy/sparse/csgraph/tests/test_shortest_path.py ....................... [ 10%]
................................................. [ 11%]
scipy/sparse/csgraph/tests/test_spanning_tree.py . [ 11%]
scipy/sparse/csgraph/tests/test_traversal.py ........ [ 11%]
scipy/sparse/linalg/_dsolve/tests/test_linsolve.py .s.s.s.s.ss.......... [ 11%]
s............................ [ 11%]
scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py .Fatal Python error: Segmentation fault
Current thread 0x00007efc3801b740 (most recent call first):
...
OK, I bumped the ARPACK version in #21110. Maybe it can solve this issue too. No idea what the changes are, as it still has unresolved issues upstream too.
Simultaneously, I started rewriting ARPACK. The code is exactly what anyone would expect from an F77 monolith.
With your PR, the segmentation fault disappears from `_arpack` but shows up in `test_svds`.
Commands

- `python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp` (succeeds with `sparse` module)
- `FC=ifx python dev.py build` (`sparse` tests fail with segmentation fault)
- `FC=ifx python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp` (`sparse` tests fail with segmentation fault)
- `FC="ifort -diag-disable=10448" python dev.py build` (`sparse` tests fail with segmentation fault)
- `FC="ifort -diag-disable=10448" python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp` (`sparse` tests fail with segmentation fault)
Logs
scipy/sparse/linalg/_eigen/tests/test_svds.py .......................... [ 14%]
................................................s.ss..s.ssss....s.ss..s. [ 15%]
ssss....s.ss..s.ssss....s.ss..s.ssss....s.ss..s.ssss....s.ss..s.ssss.... [ 16%]
............ssssssss.................................................... [ 16%]
............................................s.ss....s.ss....s.ss....s.ss [ 17%]
....s.ss....s.ss....s.ss....s.ss....s.ss....s.ss....s.ss....s.ss........ [ 17%]
........................................................................ [ 18%]
..............................s.Fatal Python error: Fatal Python error: Segmentation fault
Sigh... Language of scientific computing.
I think the problem is with `ifx` and not MKL/OpenBLAS. With `gfortran`, both MKL and OpenBLAS pass all the tests in the `sparse` module. With `ifx`, both MKL and OpenBLAS give a segmentation fault at the same point.
Could you run the test suite with the `-v` option so that we know which test failed?
Yes. Now I will debug the tests. I wanted to see which combinations work and which fail. I have started with the `ifx` + MKL combination.
One weird observation with the following code (basically an MRE of the failing test):
import numpy as np
from scipy.sparse.linalg import svds

def test_svd_simple(A, k, real, transpose, lo_type):
    A = np.asarray(A)
    A = np.real(A) if real else A
    A = A.T if transpose else A
    A2 = lo_type(A)
    u, s, vh = svds(A2, k, solver='propack', random_state=0)
    print(u, s, vh)

A1 = [[1, 2, 3], [3, 4, 3], [1 + 1j, 0, 2], [0, 0, 1]]
test_svd_simple(A1, 1, False, False, np.asarray)
Randomised behaviour
Results produced are correct, but it seems the segmentation fault is random: sometimes it happens, sometimes it doesn't. It seems to come from the LLVM-generated code produced by `ifx`.
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
[[0.39378471+0.32618658j]
[0.63317198+0.51091137j]
[0.15324317+0.21149959j]
[0.06920502+0.05994875j]] [7.06729714] [[0.37610466-0.27127444j 0.46980582-0.38147804j 0.48909242-0.42367566j]]
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
[[0.39378471+0.32618658j]
[0.63317198+0.51091137j]
[0.15324317+0.21149959j]
[0.06920502+0.05994875j]] [7.06729714] [[0.37610466-0.27127444j 0.46980582-0.38147804j 0.48909242-0.42367566j]]
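Intermittent faults like the runs above can be quantified by re-running the reproducer in a fresh interpreter each time and counting abnormal exits (on POSIX, a segfault surfaces as a negative return code). A small harness sketch; `crash_count` is a hypothetical helper, and the `svds` MRE source from above would be passed in place of the trivial snippet used here:

```python
import subprocess
import sys

def crash_count(code, trials):
    # Run `code` in a fresh Python process `trials` times and count
    # abnormal exits; a segfault appears as returncode -SIGSEGV on POSIX.
    failures = 0
    for _ in range(trials):
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True)
        if proc.returncode != 0:
            failures += 1
    return failures

# Pass the svds reproducer source here to estimate its failure rate;
# a trivial snippet just demonstrates the harness.
print(crash_count("import sys; sys.exit(0)", trials=3))  # -> 0
```

Running in subprocesses also avoids the heisenbug effect seen under `pdb`, since each trial starts from a clean process state.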
Notes
With `pdb`, no segmentation fault comes up.
Small update: the problem is with the call to `wzdotc` from `complex16/zblasext.F` in PROPACK.
When I inline `zdotc` from LAPACK (renamed to `pzdotc`, nothing but a wrapper to `wzdotc`) in `complex16/zblasext.F`, the segmentation fault goes away and all the tests in the `sparse` module pass. Probably it's some ABI incompatibility between `ifx` and MKL/OpenBLAS.
See https://github.com/scipy/scipy/commit/3f61bc15474a1da27e7fe5e4be01a6322a380c86 and https://github.com/scipy/scipy/commit/ea11308ef3d6b0bb7939a6630fae77511c5688c5 in my branch of SciPy.
Now I merged the new version of ARPACK, 3.9.1, which vendors its own `zzdotc` instead. So that should potentially solve the problem.
Sure. I will try it in the morning.
If replacing the `wzdotc` wrapper by a vendored `zzdotc` function does fix the problem, then it'd also be good to know whether that is because we're using the dummy version of the wrapper dependency when we shouldn't (MKL should require the non-dummy version), or because the wrapper code isn't actually doing what it's supposed to do.
The segmentation fault goes away, but the following error shows up with the `zzdotc`, `ifx` and MKL combination. With the `gfortran` + MKL combination, all tests pass.
----------------------------- Captured stdout call -----------------------------
WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence.
Try increasing KMAX.
neig = 0
=========================== short test summary info ============================
FAILED scipy/sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_linop - numpy.linalg.LinAlgError: k=3 singular triplets did not converge within kma...
= 1 failed, 10204 passed, 1626 skipped, 348 deselected, 43 xfailed, 2 xpassed in 136.25s (0:02:16) =
Captured stdout is not important, that's expected. The PROPACK test can be adjusted too. Happy that you don't encounter any segfaults yet.
With increased `kmax`, it gives segmentation faults:
scipy/sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_linop Fatal Python error: Segmentation fault
After https://github.com/czgdp1807/PROPACK/commit/b59104e306f2164deca19f6a25f98c599d4c22e8, all the segmentation faults are fixed in `python dev.py test -s sparse`.
With the `ifx` + MKL combination, there are no segmentation faults with https://github.com/czgdp1807/PROPACK/commit/b59104e306f2164deca19f6a25f98c599d4c22e8 in the whole test suite of SciPy, but the following tests fail:
FAILED scipy/interpolate/tests/test_fitpack2.py::TestUnivariateSpline::test_out_of_range_regression - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-<lambda>-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-<lambda>-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-Bounds-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-Bounds-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-<lambda>-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-<lambda>-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-Bounds-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-Bounds-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-<lambda>-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-<lambda>-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-Bounds-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-Bounds-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-<lambda>-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-<lambda>-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-<lambda>-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-Bounds-kwds0-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-Bounds-kwds1-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-Bounds-kwds2-SLSQP] - AssertionError:
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-Bounds-kwds2-SLSQP] - assert np.False_
= 81 failed, 50065 passed, 2775 skipped, 11739 deselected, 162 xfailed, 18 xpassed in 521.45s (0:08:41) =
`ifx` + MKL combination on Linux
SLSQP {'fun': <function setup_test_equal_bounds.<locals>.func at 0x7f263d199a80>, 'jac': False} <function setup_test_equal_bounds.<locals>.<lambda> at 0x7f2630fc6520> (None, None) None
True (1289.4752551349443, np.float64(1490.4970902896914)) (array([ 1.41371438, 2. , 0.87977266, -1. ]), array([ 0.79393209, 2. , 0.74248698, -1. ]))
Traceback (most recent call last):
File "/home/czgdp18079/Quansight/scipy/../debug.py", line 188, in <module>
test_equal_bounds(method, kwds, bound_type, constraints, callback)
File "/home/czgdp18079/Quansight/scipy/../debug.py", line 147, in test_equal_bounds
assert_allclose(res.fun, expected.fun, rtol=1.5e-6)
File "/home/czgdp18079/miniforge3/envs/scipy-dev/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 1684, in assert_allclose
assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
File "/home/czgdp18079/miniforge3/envs/scipy-dev/lib/python3.12/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/home/czgdp18079/miniforge3/envs/scipy-dev/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 885, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1.5e-06, atol=0
Mismatched elements: 1 / 1 (100%)
Max absolute difference among violations: 201.02183515
Max relative difference among violations: 0.13486899
ACTUAL: array(1289.475255)
DESIRED: array(1490.49709)
On my Apple MacBook Pro:
SLSQP {'fun': <function setup_test_equal_bounds.<locals>.func at 0x104bcbb00>, 'jac': False} <function setup_test_equal_bounds.<locals>.<lambda> at 0x13a607b00> (None, None) None
True (1289.475255134854, 1289.4752550852413) (array([ 1.41371438, 2. , 0.87977267, -1. ]), array([ 1.41371252, 2. , 0.87977062, -1. ]))
4 4
(array([nan, nan]), array([nan, nan]))
(1289.475255134854, 1289.475255134854)
(array([1.41371438, 0.87977267]), array([1.41371438, 0.87977267]))
Seems like the problem is with `expected.x`. The `res.x` values on Linux and macOS are the same; the `expected.x` values differ. It seems that `jac=optimize.rosen_der` is causing problems in `optimize.minimize`.
Actually, there is no very significant problem with `ifx`. It's just that the code in `slsqp_optmz.F` is very rigid. I was debugging one example. There is this condition, `IF(wmax.LE.ZERO) GOTO 280` (inside the `nnls` subroutine). Its intent is to make sure that all elements in `w` are non-positive. However, `ifx`-compiled code results in `wmax` being `1.6e-21`: not `ZERO`, but very close to `ZERO`. That causes chaotic computation, because the above condition fails, leading to further iterations which corrupt values (correct values inside an array become arbitrarily large). If I change the above condition to `IF(ABS(wmax).LE.1e-20) GOTO 280`, then `ifx` also gives the same results for the same example. :D
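The failure mode of an exact-zero test is easy to show outside Fortran: a quantity that is exactly zero in real arithmetic usually comes out as a tiny residual in floating point, so a comparison like `wmax.LE.ZERO` takes the wrong branch while a tolerance-based test behaves as intended. A minimal Python illustration (the tolerance value is arbitrary):

```python
# Exactly zero in real arithmetic, but a tiny positive residue in IEEE doubles:
wmax = 0.1 + 0.2 - 0.3
print(wmax)              # ~5.55e-17, not 0.0

# Exact test, as in IF(wmax.LE.ZERO) GOTO 280: takes the wrong branch
print(wmax <= 0.0)       # False

# Tolerance test, as in IF(ABS(wmax).LE.tol) GOTO 280: behaves as intended
tol = 1e-12
print(abs(wmax) <= tol)  # True
```

The NNLS situation is the same shape: `wmax` lands at `1.6e-21` instead of zero, the exact test fails, and the iteration keeps running on garbage.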
Also, the branch I am working on is - https://github.com/czgdp1807/scipy/tree/ifx_bug_2
Do you have #21021 already on your branch which we recently merged? That should remove the need for the fortran compilation.
I merged `main` into my branch. The tests still fail. I think the `slsqp` method still calls the `nnls` written in `scipy/optimize/slsqp/slsqp_optmz.f`.
Ah yes, my bad. Apologies, I didn't push my SLSQP changes to SciPy. I'm confusing myself from reading too much Fortran.
In this comment, https://github.com/scipy/scipy/issues/20728#issuecomment-2223838292, I changed `1e-20` to `1e-19` in `IF(ABS(wmax).LE.1e-20) GOTO 280`, and `test_equal_bounds` passes fully in the `optimize` tests.
Seems like `ifx` has noisy behaviour with floating-point computations. However, floating-point comparisons should also always be done with some tolerance, and not rigidly, like with `.EQ.` or directly against `ZERO`.
With https://github.com/scipy/scipy/pull/21173, things work with `ifx` + MKL on my Linux machine. I will address reviews and run the tests on CI next week.
However, also floating point comparisons should always be done with some tolerances and not rigidly like, .EQ. or directly with ZERO
This is not always true; it needs some context, and there are always some problems that come very close to zero. The old foxes were not so careless as to make such simple errors in these algorithms.
But here I think it won't matter, since we are going to translate it anyway.
`ifx` and MKL are probably too aggressive with FMA optimizations, and we are seeing substantial deviations.
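As a pure-Python illustration of why fusing and reordering operations changes results: floating-point addition is not associative, so any optimization that reassociates an accumulation can shift the result by a few ulps, which is enough to push a value like `wmax` slightly off zero.

```python
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one evaluation order
right = a + (b + c)   # a reassociated order, as an optimizer might emit
print(left == right)  # False
print(left - right)   # about 1.1e-16: a last-bit difference at this magnitude
```

FMA has the same character: `a*b + c` computed with a single rounding differs in the last bits from the two-rounding version, so FMA-heavy code produces slightly different residuals than code compiled without it.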
Old foxes are not that careless about these algorithms doing such simple errors.
True. Maybe, as you said, `ifx` is doing FMA optimisations, which leads to these issues. I will try the `ifort` + MKL combination. `ifort` doesn't do optimisations by default, right? Is there any way we can silence optimisations with `ifx`?
I checked https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-ifort-to-ifx.html. Seems like `ifx` optimisations can be disabled via `-O0` and `-fpmodel=precise`. Worth trying, I think.
I must disappoint you: I really don't know much about Fortran compilers. But indeed, if you have the capacity to try it out, that would be great.
Note that the floating-point precision flags for Intel compilers should already be set correctly, by this code: https://github.com/scipy/scipy/blob/8ccaf44b3644b9e3ad33afea7c6ef03780053fab/meson.build#L101-L129.
Please verify that that's the case (the `strict` flag should appear in `build/build.ninja`). If so, the problem is a different one.
Actually, `FC="ifx -O0"` and `FC="ifx -O0 -fpmodel=precise"` don't make a difference. `test_equal_bounds` still fails because of the issues I mentioned in https://github.com/scipy/scipy/issues/20728#issuecomment-2225878433. So I think the problem is with the code generated by `ifx`. Also note that `scipy.optimize` doesn't use anything from MKL as far as I can tell (let me know if I am missing anything).
Update: I am starting my work on adding a Linux CI job in https://github.com/scipy/scipy/pull/21173.
@czgdp1807, `wmax` is computed from the `w` values, which are computed by a dot product. Dieter Kraft was using a manually unrolled version called `ddot_sl`.
Could you try replacing it with a different version?
C     STEP TWO (COMPUTE DUAL VARIABLES)
C     .....ENTRY LOOP A
C     ( ... omitted ... )
      DO 120 iz=iz1,iz2
         j=INDEX(iz)
  120    w(j)=ddot_sl(m-nsetp,a(npp1,j),1,b(npp1),1)
C     STEP THREE (TEST DUAL VARIABLES)
  130 wmax=ZERO
      DO 140 iz=iz1,iz2
         j=INDEX(iz)
         IF(w(j).LE.wmax) GOTO 140
         wmax=w(j)
         izmax=iz
  140 CONTINUE
C     .....EXIT LOOP A
      IF(wmax.LE.ZERO) GOTO 280
Here is a test program with a few ways it can be done:
C     test.f
C     compile with: ifx -O1 -fp-model=precise test.f ddot_sl.f -qmkl
      program test
      implicit none
      integer, parameter :: n = 100000000
      double precision a(n), b(n)
C     Version in SLSQP (adapted by Dieter Kraft)
      external ddot_sl
      double precision ddot_sl
C     BLAS (MKL) version
      external ddot
      double precision ddot
      call random_number(a)
      call random_number(b)
      print *, ddot_sl(n,a,1,b,1)
      print *, ddot(n,a,1,b,1)
      print *, dot_product(a,b)
      print *, sum(a*b)
      end
Sure, @ivan-pi. Let me do it. I will update this comment if it works. `gfortran` works fine with the unrolled version. Maybe `ifx` will give different results if I use yours. Let me see.
The problem is not with the loop unrolling in `ddot_sl`. In fact, LAPACK uses exactly the same implementation for `ddot`; see https://netlib.org/lapack/explore-html/d1/dcc/group__dot_ga2a42ecc597403b22ad786715c739196b.html. The problem is that `ifx` is not precise enough with `DOUBLE PRECISION`. I used `REAL*16` inside `ddot_sl` (see the diff of `slsqp_optimize.f` in #21173) and I was able to remove `IF(ABS(wmax).LE.1e-20) GOTO 280`. The only thing left is `IF(rnorm.LT.ZERO) GOTO 50` in line 1094. It's safe, and no tests fail with this (both `ifx` and `gfortran` work).
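The effect of the wider accumulator can be mimicked in Python: the wider the precision carried through the accumulation, the closer the dot product lands to a correctly rounded reference. A sketch (sizes and seed are arbitrary; `math.fsum` serves as the reference, and the narrow accumulator stands in for lower-precision arithmetic, not for anything `ifx` specifically emits):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
terms = rng.random(10_000) * rng.random(10_000)  # elementwise products of a dot product

reference = math.fsum(terms)                     # correctly rounded sum
sum64 = float(np.sum(terms))                     # double-precision accumulation
sum32 = float(np.sum(terms.astype(np.float32),
                     dtype=np.float32))          # narrow accumulator

# The wider accumulator is at least as close to the true value:
print(abs(sum64 - reference) <= abs(sum32 - reference))  # True
```

This is the same reasoning behind accumulating `ddot_sl` in `REAL*16`: the final result, rounded back to double, absorbs the intermediate rounding noise that otherwise leaves residues like the `1.6e-21` in `wmax`.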
Why don't we just mark this SLSQP test as `xfail` and move on? The code will be rewritten in C or C++ anyway, so the problem will then be solved.
You can determine in the test suite whether Intel Fortran was used for the build from:
scipy.show_config(mode='dicts')['Compilers']['fortran']['name']
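A sketch of how that check could be wired into an `xfail` condition. The helper name is hypothetical, and the compiler id strings are assumptions based on Meson's compiler naming (`intel` for ifort, `intel-llvm` for ifx):

```python
def is_intel_fortran(config):
    # `config` is the dict returned by scipy.show_config(mode='dicts').
    name = config['Compilers']['fortran']['name']
    return name in ('intel', 'intel-cl', 'intel-llvm', 'intel-llvm-cl')

# In a test module one might then write (sketch):
#   import scipy, pytest
#   intel_build = is_intel_fortran(scipy.show_config(mode='dicts'))
#   @pytest.mark.xfail(intel_build, reason='ifx SLSQP tolerance issue, gh-20728')

print(is_intel_fortran({'Compilers': {'fortran': {'name': 'intel-llvm'}}}))  # True
print(is_intel_fortran({'Compilers': {'fortran': {'name': 'gcc'}}}))         # False
```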
We now have a OneAPI CI job (all Intel compilers + MKL), so this should hopefully not regress again.
Describe your issue.
I am installing SciPy from source on an HPC system, on top of Python 3.11.6. The code builds without errors, but `scipy.test()` segfaults at `scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py`. We have Intel MKL on the HPC, so I built SciPy with
Reproducing Code Example
Error message
SciPy/NumPy/Python version and system information