scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

BUG: sparse.linalg: Segfault in `arpack` with `ifx` #20728

AgilentGCMS closed this issue 2 months ago

AgilentGCMS commented 4 months ago

Describe your issue.

I am installing SciPy from source on an HPC system on top of Python 3.11.6. The code builds without errors, but scipy.test() segfaults at scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py. We have Intel MKL on the HPC system, so I built SciPy with:

CC=icx CXX=icpx FC=ifx python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp

Reproducing Code Example

python dev.py test -s sparse

Error message

ninja: Entering directory `/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build'
[4/4] Generating scipy/generate-version with a custom command
Build OK
💻  meson install -C build --only-changed
Installing, see meson-install.log...
Installation OK
SciPy from development installed path at: /work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages
Running tests for scipy version:1.13.0, installed at:/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0
rootdir: /work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0
configfile: pytest.ini
plugins: hypothesis-6.102.4
collected 11638 items / 61 deselected / 11577 selected

scipy/sparse/csgraph/tests/test_connected_components.py ........                                                                                                             [  0%]
scipy/sparse/csgraph/tests/test_conversions.py ...                                                                                                                           [  0%]
scipy/sparse/csgraph/tests/test_flow.py ...........................................                                                                                          [  0%]
scipy/sparse/csgraph/tests/test_graph_laplacian.py ......................................................................................................................... [  1%]
............................................................................................................................................................................ [  2%]
............................................................................................................................................................................ [  4%]
............................................................................................................................................................................ [  5%]
............................................................................................................................................................................ [  7%]
............................................................................................................................................................................ [  8%]
.................................................................................................................................................                            [ 10%]
scipy/sparse/csgraph/tests/test_matching.py ................................                                                                                                 [ 10%]
scipy/sparse/csgraph/tests/test_pydata_sparse.py ssssssssssssssssssssssssssssssssss                                                                                          [ 10%]
scipy/sparse/csgraph/tests/test_reordering.py ...                                                                                                                            [ 10%]
scipy/sparse/csgraph/tests/test_shortest_path.py ...................................                                                                                         [ 11%]
scipy/sparse/csgraph/tests/test_spanning_tree.py .                                                                                                                           [ 11%]
scipy/sparse/csgraph/tests/test_traversal.py ........                                                                                                                        [ 11%]
scipy/sparse/linalg/_dsolve/tests/test_linsolve.py .s.s.s.s.ss..........s.....................                                                                               [ 11%]
scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py .Fatal Python error: Segmentation fault

Current thread 0x0000146b7118b740 (most recent call first):
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 883 in extract
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1357 in eigs
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1577 in eigsh
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py", line 248 in eval_evec
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py", line 425 in test_hermitian_modes
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/python.py", line 1792 in runtest
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 169 in pytest_runtest_call
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 262 in <lambda>
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 261 in call_runtest_hook
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 222 in call_and_report
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 133 in runtestprotocol
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/runner.py", line 114 in pytest_runtest_protocol
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/main.py", line 350 in pytest_runtestloop
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/main.py", line 325 in _main
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/main.py", line 271 in wrap_session
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/main.py", line 318 in pytest_cmdline_main
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_callers.py", line 77 in _multicall
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_manager.py", line 115 in _hookexec
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pluggy/_hooks.py", line 493 in __call__
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/_pytest/config/__init__.py", line 169 in main
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/_lib/_testutils.py", line 106 in __call__
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/dev.py", line 745 in scipy_tests
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/dev.py", line 761 in run
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/action.py", line 461 in execute
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/task.py", line 473 in execute
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/runner.py", line 176 in execute_task
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/runner.py", line 220 in run_tasks
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/runner.py", line 254 in run_all
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/cmd_run.py", line 265 in _execute
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/cmd_base.py", line 570 in execute
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/cmd_base.py", line 150 in parse_execute
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/doit/api.py", line 50 in run_tasks
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/dev.py", line 251 in run_doit_task
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/pydevtool/cli.py", line 267 in callback
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/click/core.py", line 783 in invoke
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/click/core.py", line 1434 in invoke
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/click/core.py", line 1688 in invoke
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/rich_click/rich_command.py", line 126 in main
  File "/work2/noaa/co2/sbasu/packages/python/3.11.6/lib/python3.11/site-packages/click/core.py", line 1157 in __call__
  File "/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/dev.py", line 1486 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy._lib._ccallback_c, scipy._lib._fpumode, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2 (total: 50)
Segmentation fault (core dumped)

SciPy/NumPy/Python version and system information

Python 3.11.6 (main, May 15 2024, 11:36:54) [GCC Intel(R) C++ gcc 11.3.1 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info); scipy.show_config()
1.13.0 1.26.4 sys.version_info(major=3, minor=11, micro=6, releaselevel='final', serial=0)
/work2/noaa/co2/sbasu/packages/sources/scipy-1.13.0/build-install/lib/python3.11/site-packages/scipy/__config__.py:154: UserWarning: Install `pyyaml` for better output
  warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
  "Compilers": {
    "c": {
      "name": "intel-llvm",
      "linker": "ld.bfd",
      "version": "11.3.1",
      "commands": "icx"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.10",
      "commands": "cython"
    },
    "c++": {
      "name": "intel-llvm",
      "linker": "ld.bfd",
      "version": "11.3.1",
      "commands": "icpx"
    },
    "fortran": {
      "name": "intel-llvm",
      "linker": "ld.bfd",
      "version": "2023.1.0",
      "commands": "ifx"
    },
    "pythran": {
      "version": "0.14.0",
      "include directory": "../../../python/3.11.6/lib/python3.11/site-packages/pythran"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "linux"
    },
    "build": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "linux"
    },
    "cross-compiled": true
  },
  "Build Dependencies": {
    "blas": {
      "name": "mkl-dynamic-lp64-iomp",
      "found": true,
      "version": "2023.1",
      "detection method": "pkgconfig",
      "include directory": "/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/latest/lib/pkgconfig/../../include",
      "lib directory": "/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/latest/lib/pkgconfig/../../lib/intel64",
      "openblas configuration": "unknown",
      "pc file directory": "/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/2023.1.0/lib/pkgconfig"
    },
    "lapack": {
      "name": "mkl-dynamic-lp64-iomp",
      "found": true,
      "version": "2023.1",
      "detection method": "pkgconfig",
      "include directory": "/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/latest/lib/pkgconfig/../../include",
      "lib directory": "/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/latest/lib/pkgconfig/../../lib/intel64",
      "openblas configuration": "unknown",
      "pc file directory": "/apps/spack-managed/gcc-11.3.1/intel-oneapi-mkl-2023.1.0-4cujjco7etbwl34hwrtw3ree7dwhxnci/mkl/2023.1.0/lib/pkgconfig"
    },
    "pybind11": {
      "name": "pybind11",
      "version": "2.12.0",
      "detection method": "config-tool",
      "include directory": "unknown"
    }
  },
  "Python Information": {
    "path": "/work2/noaa/co2/sbasu/packages/python/3.11.6/bin/python3.11",
    "version": "3.11"
  }
}

rgommers commented 4 months ago

Thanks for the report @AgilentGCMS. That ARPACK code has had recurring issues with some BLAS functions (cdotc & co.) that are sensitive to Fortran compiler ABI conventions. MKL is built with the opposite convention from the default one (gfortran).

Maybe this particular crash was always an issue with MKL from OneAPI, or maybe it changed with https://github.com/scipy/scipy/pull/20232 (that was included in 1.13.0, which you're testing here). Do you know if this is a regression from 1.12.x or not for your setup?

cgohlke commented 4 months ago

I can reproduce this on Windows with scipy-1.13.1 and OneAPI MKL 2024.1.

rgommers commented 4 months ago

Thanks @cgohlke for confirming. So it's a problem that's not platform-dependent.

I'll add a bit of context, since it may not be worth starting to debug here - the maintenance state of our ARPACK copy is very bad:

It's also still possible that this is an MKL bug. We do test with Accelerate, which uses the same ABI conventions as MKL, and things work as expected there.

czgdp1807 commented 3 months ago

I am able to reproduce the segmentation fault with FC=ifx and the C/C++ compilers from conda. Both the ifx + MKL and ifx + OpenBLAS combinations fail with the same segmentation fault, whereas the gfortran + MKL combination works just fine. I will try with ifort tomorrow.

My conclusion is that ifx causes the problem, not MKL.

Commands

python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp (sparse tests succeed)

FC=ifx python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp (sparse tests fail with a segmentation fault)

FC=ifx python dev.py build (sparse tests fail with a segmentation fault)

Logs

SciPy from development installed path at: /home/czgdp18079/Quansight/scipy/build-install/lib/python3.12/site-packages
Running tests for scipy version:1.15.0.dev0+1043.bbe993d, installed at:/home/czgdp18079/Quansight/scipy/build-install/lib/python3.12/site-packages/scipy
============================= test session starts ==============================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/czgdp18079/Quansight/scipy
configfile: pytest.ini
plugins: anyio-4.4.0, xdist-3.6.1, cov-5.0.0, hypothesis-6.104.2, timeout-2.3.1
collected 12224 items / 348 deselected / 11876 selected                        

scipy/sparse/csgraph/tests/test_connected_components.py ........         [  0%]
scipy/sparse/csgraph/tests/test_conversions.py ...                       [  0%]
scipy/sparse/csgraph/tests/test_flow.py ................................ [  0%]
...........                                                              [  0%]
scipy/sparse/csgraph/tests/test_graph_laplacian.py ..................... [  0%]
........................................................................ [  1%]
........................................................................ [  1%]
........................................................................ [  2%]
........................................................................ [  3%]
........................................................................ [  3%]
........................................................................ [  4%]
........................................................................ [  4%]
........................................................................ [  5%]
........................................................................ [  6%]
........................................................................ [  6%]
........................................................................ [  7%]
........................................................................ [  7%]
........................................................................ [  8%]
........................................................................ [  9%]
........................................................................ [  9%]
.........................                                                [  9%]
scipy/sparse/csgraph/tests/test_matching.py ............................ [ 10%]
....                                                                     [ 10%]
scipy/sparse/csgraph/tests/test_pydata_sparse.py sssssssssssssssssssssss [ 10%]
sssssssssssssssssssssssssssssssssss                                      [ 10%]
scipy/sparse/csgraph/tests/test_reordering.py ...                        [ 10%]
scipy/sparse/csgraph/tests/test_shortest_path.py ....................... [ 10%]
.................................................                        [ 11%]
scipy/sparse/csgraph/tests/test_spanning_tree.py .                       [ 11%]
scipy/sparse/csgraph/tests/test_traversal.py ........                    [ 11%]
scipy/sparse/linalg/_dsolve/tests/test_linsolve.py .s.s.s.s.ss.......... [ 11%]
s............................                                            [ 11%]
scipy/sparse/linalg/_eigen/arpack/tests/test_arpack.py .Fatal Python error: Segmentation fault

Current thread 0x00007efc3801b740 (most recent call first):
  ...

ilayn commented 3 months ago

OK, I bumped the ARPACK version in #21110. Maybe it can solve this issue too. I have no idea what the changes are, and it still has unresolved issues upstream.

At the same time, I have started work on rewriting ARPACK. The code is exactly what anyone would expect from an F77 monolith.

czgdp1807 commented 3 months ago

With your PR, the segmentation fault disappears from _arpack but now appears in test_svds:

Commands

python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp (sparse tests succeed)

FC=ifx python dev.py build (sparse tests fail with a segmentation fault)

FC=ifx python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp (sparse tests fail with a segmentation fault)

FC="ifort -diag-disable=10448" python dev.py build (sparse tests fail with a segmentation fault)

FC="ifort -diag-disable=10448" python dev.py build -C-Dblas=mkl-dynamic-lp64-iomp -C-Dlapack=mkl-dynamic-lp64-iomp (sparse tests fail with a segmentation fault)

Logs

scipy/sparse/linalg/_eigen/tests/test_svds.py .......................... [ 14%]
................................................s.ss..s.ssss....s.ss..s. [ 15%]
ssss....s.ss..s.ssss....s.ss..s.ssss....s.ss..s.ssss....s.ss..s.ssss.... [ 16%]
............ssssssss.................................................... [ 16%]
............................................s.ss....s.ss....s.ss....s.ss [ 17%]
....s.ss....s.ss....s.ss....s.ss....s.ss....s.ss....s.ss....s.ss........ [ 17%]
........................................................................ [ 18%]
..............................s.Fatal Python error: Fatal Python error: Segmentation fault

ilayn commented 3 months ago

Sigh... Language of scientific computing.

czgdp1807 commented 3 months ago

I think the problem is with ifx and not MKL/OpenBLAS. With gfortran, both MKL and OpenBLAS pass all the tests in the sparse module. With ifx, both MKL and OpenBLAS give a segmentation fault at the same point.
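
The compiler and BLAS combinations tried in this thread can be enumerated with a short script. This is only a sketch: the FC values and the -C-Dblas/-C-Dlapack flags are copied from the commands above, the helper name build_commands is hypothetical, and it assumes that a build without those flags picks up OpenBLAS.

```python
from itertools import product

# Fortran compiler settings tried in this thread; None means the default (gfortran).
FORTRAN_COMPILERS = [None, "ifx", "ifort -diag-disable=10448"]

# BLAS/LAPACK selections; None means no -Dblas/-Dlapack flags
# (assumed here to resolve to OpenBLAS).
BLAS_CHOICES = [None, "mkl-dynamic-lp64-iomp"]


def build_commands():
    """Return one shell command string per (compiler, BLAS) combination."""
    commands = []
    for fc, blas in product(FORTRAN_COMPILERS, BLAS_CHOICES):
        parts = []
        if fc is not None:
            # Quote values containing spaces so the shell sees a single FC value.
            parts.append(f'FC="{fc}"' if " " in fc else f"FC={fc}")
        parts += ["python", "dev.py", "build"]
        if blas is not None:
            parts += [f"-C-Dblas={blas}", f"-C-Dlapack={blas}"]
        commands.append(" ".join(parts))
    return commands


for command in build_commands():
    print(command)
```

Each printed command would be followed by python dev.py test -s sparse to fill in one cell of the pass/fail matrix.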

ilayn commented 3 months ago

Could you run the test suite with the -v option so that we know which test failed?

czgdp1807 commented 3 months ago

Yes. Now I will debug the tests. I wanted to see which combinations work and which fail. I have started with the ifx + MKL combination.

One weird observation with the following code (basically an MRE of the failing test):

import numpy as np
from scipy.sparse.linalg import svds

def test_svd_simple(A, k, real, transpose, lo_type):
    A = np.asarray(A)
    A = np.real(A) if real else A
    A = A.T if transpose else A
    A2 = lo_type(A)

    u, s, vh = svds(A2, k, solver='propack', random_state=0)
    print(u, s, vh)

A1 = [[1, 2, 3], [3, 4, 3], [1 + 1j, 0, 2], [0, 0, 1]]
test_svd_simple(A1, 1, False, False, np.asarray)

Randomised behaviour

The results produced are correct, but the segmentation fault seems to be random: sometimes it happens, sometimes it doesn't. It seems to come from the LLVM code generated by ifx.

(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
[[0.39378471+0.32618658j]
 [0.63317198+0.51091137j]
 [0.15324317+0.21149959j]
 [0.06920502+0.05994875j]] [7.06729714] [[0.37610466-0.27127444j 0.46980582-0.38147804j 0.48909242-0.42367566j]]
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
Segmentation fault (core dumped)
(scipy-dev) czgdp18079@scipy-linux-9:~/Quansight/scipy$ python ../debug.py
[[0.39378471+0.32618658j]
 [0.63317198+0.51091137j]
 [0.15324317+0.21149959j]
 [0.06920502+0.05994875j]] [7.06729714] [[0.37610466-0.27127444j 0.46980582-0.38147804j 0.48909242-0.42367566j]]

Notes

Under pdb, the segmentation fault does not occur.
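
Since the crash is intermittent, it can help to measure how often it happens rather than rerunning by hand. The following is a minimal sketch using only the standard library; the helper name count_segfaults is hypothetical, and it relies on the POSIX behaviour that subprocess reports a child killed by a signal as a negative return code, so it would need adapting on Windows.

```python
import signal
import subprocess
import sys


def count_segfaults(cmd, runs=10):
    """Run cmd `runs` times and return how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means the child died from signal N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes


def count_script_segfaults(path, runs=10):
    """Convenience wrapper: run a Python script with the current interpreter."""
    return count_segfaults([sys.executable, path], runs=runs)
```

For example, count_script_segfaults("../debug.py", runs=20) would report how many of 20 runs of the script above crashed.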

czgdp1807 commented 3 months ago

Small update:

The problem is in the call to wzdotc from complex16/zblasext.F in PROPACK.

When I inline zdotc from LAPACK in complex16/zblasext.F (renamed to pzdotc, which is nothing but a wrapper around wzdotc), the segmentation fault goes away and all the tests in the sparse module pass. It is probably some ABI incompatibility between ifx and MKL/OpenBLAS.

See https://github.com/scipy/scipy/commit/3f61bc15474a1da27e7fe5e4be01a6322a380c86 and https://github.com/scipy/scipy/commit/ea11308ef3d6b0bb7939a6630fae77511c5688c5 in my branch of SciPy.
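
For reference, zdotc is the BLAS routine for the conjugated dot product of two double-complex vectors. In Fortran it is a function that returns a complex value, and compilers differ in how such a return value is handed back to the caller, which is the kind of ABI mismatch suspected here. The pure-Python sketch below shows only the arithmetic; it does not call BLAS, and the name zdotc_reference is hypothetical.

```python
def zdotc_reference(x, y):
    """Conjugated dot product: sum of conj(x[i]) * y[i] over all i."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0j
    for xi, yi in zip(x, y):
        # Conjugate the first operand only, as zdotc does.
        total += complex(xi).conjugate() * complex(yi)
    return total


result = zdotc_reference([1 + 1j, 2], [3, 1j])
print(result)  # (1-1j)*3 + 2*1j = (3-1j)
```

A build that returns the wrong value, or crashes, for inputs this small points at the calling convention rather than at the numerics.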

ilayn commented 3 months ago

I have now merged the new version of ARPACK, 3.9.1, which vendors its own zzdotc instead. That should potentially solve the problem.

czgdp1807 commented 3 months ago

Sure. I will try it in the morning.

rgommers commented 3 months ago

If replacing the wzdotc wrapper by a vendored zzdotc function does fix the problem, then it'd also be good to know whether that is because we're using the dummy version of the wrapper dependency when we shouldn't (MKL should require the non-dummy version), or the wrapper code isn't actually doing what it's supposed to do.

czgdp1807 commented 3 months ago

The segmentation fault goes away, but the following error shows up with the zzdotc + ifx + MKL combination. With the gfortran + MKL combination, all tests pass.

----------------------------- Captured stdout call -----------------------------
 WARNING: Maximum dimension of Krylov subspace exceeded prior to convergence.
  Try increasing KMAX.
 neig =            0
=========================== short test summary info ============================
FAILED scipy/sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_linop - numpy.linalg.LinAlgError: k=3 singular triplets did not converge within kma...
= 1 failed, 10204 passed, 1626 skipped, 348 deselected, 43 xfailed, 2 xpassed in 136.25s (0:02:16) =

ilayn commented 3 months ago

Captured stdout is not important, that's expected. The PROPACK test can be adjusted too. Happy that you don't encounter any segfaults yet.

czgdp1807 commented 3 months ago

scipy/sparse/linalg/_eigen/tests/test_svds.py::Test_SVDS_PROPACK::test_svd_linop gives a segmentation fault (Fatal Python error: Segmentation fault) with increased kmax.

czgdp1807 commented 2 months ago

After https://github.com/czgdp1807/PROPACK/commit/b59104e306f2164deca19f6a25f98c599d4c22e8 all the segmentation faults are fixed in python dev.py test -s sparse.

czgdp1807 commented 2 months ago

With the ifx + MKL combination and https://github.com/czgdp1807/PROPACK/commit/b59104e306f2164deca19f6a25f98c599d4c22e8, there are no segmentation faults in the whole SciPy test suite, but the following tests fail:

FAILED scipy/interpolate/tests/test_fitpack2.py::TestUnivariateSpline::test_out_of_range_regression - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints0-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints1-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-<lambda>-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-<lambda>-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-Bounds-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints2-Bounds-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints3-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints4-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-<lambda>-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-<lambda>-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-Bounds-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints5-Bounds-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[None-constraints8-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints0-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints1-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-<lambda>-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-<lambda>-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-Bounds-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints2-Bounds-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints3-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints4-Bounds-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-<lambda>-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-<lambda>-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-<lambda>-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-Bounds-kwds0-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-Bounds-kwds1-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints5-Bounds-kwds2-SLSQP] - AssertionError: 
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-<lambda>-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-<lambda>-kwds2-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-Bounds-kwds1-SLSQP] - assert np.False_
FAILED scipy/optimize/tests/test_optimize.py::test_equal_bounds[callback-constraints8-Bounds-kwds2-SLSQP] - assert np.False_
= 81 failed, 50065 passed, 2775 skipped, 11739 deselected, 162 xfailed, 18 xpassed in 521.45s (0:08:41) =

czgdp1807 commented 2 months ago

ifx + MKL combination on Linux

SLSQP {'fun': <function setup_test_equal_bounds.<locals>.func at 0x7f263d199a80>, 'jac': False} <function setup_test_equal_bounds.<locals>.<lambda> at 0x7f2630fc6520> (None, None) None

True (1289.4752551349443, np.float64(1490.4970902896914)) (array([ 1.41371438,  2.        ,  0.87977266, -1.        ]), array([ 0.79393209,  2.        ,  0.74248698, -1.        ]))
Traceback (most recent call last):
  File "/home/czgdp18079/Quansight/scipy/../debug.py", line 188, in <module>
    test_equal_bounds(method, kwds, bound_type, constraints, callback)
  File "/home/czgdp18079/Quansight/scipy/../debug.py", line 147, in test_equal_bounds
    assert_allclose(res.fun, expected.fun, rtol=1.5e-6)
  File "/home/czgdp18079/miniforge3/envs/scipy-dev/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 1684, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/czgdp18079/miniforge3/envs/scipy-dev/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/czgdp18079/miniforge3/envs/scipy-dev/lib/python3.12/site-packages/numpy/testing/_private/utils.py", line 885, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1.5e-06, atol=0

Mismatched elements: 1 / 1 (100%)
Max absolute difference among violations: 201.02183515
Max relative difference among violations: 0.13486899
 ACTUAL: array(1289.475255)
 DESIRED: array(1490.49709)

On my Apple Macbook Pro

SLSQP {'fun': <function setup_test_equal_bounds.<locals>.func at 0x104bcbb00>, 'jac': False} <function setup_test_equal_bounds.<locals>.<lambda> at 0x13a607b00> (None, None) None

True (1289.475255134854, 1289.4752550852413) (array([ 1.41371438,  2.        ,  0.87977267, -1.        ]), array([ 1.41371252,  2.        ,  0.87977062, -1.        ]))
4 4
(array([nan, nan]), array([nan, nan]))
(1289.475255134854, 1289.475255134854)
(array([1.41371438, 0.87977267]), array([1.41371438, 0.87977267]))
Python Code

```python
import itertools
import platform
import numpy as np
from numpy.testing import (assert_allclose, assert_equal,
                           assert_almost_equal,
                           assert_no_warnings, assert_warns,
                           assert_array_less, suppress_warnings)
import pytest
from pytest import raises as assert_raises

from scipy import optimize
from scipy.optimize._minimize import Bounds, NonlinearConstraint
from scipy.optimize._minimize import (MINIMIZE_METHODS,
                                      MINIMIZE_METHODS_NEW_CB,
                                      MINIMIZE_SCALAR_METHODS)
from scipy.optimize._linprog import LINPROG_METHODS
from scipy.optimize._root import ROOT_METHODS
from scipy.optimize._root_scalar import ROOT_SCALAR_METHODS
from scipy.optimize._qap import QUADRATIC_ASSIGNMENT_METHODS
from scipy.optimize._differentiable_functions import ScalarFunction, FD_METHODS
from scipy.optimize._optimize import MemoizeJac, show_options, OptimizeResult
from scipy.optimize import rosen, rosen_der, rosen_hess

from scipy.sparse import (coo_matrix, csc_matrix, csr_matrix, coo_array,
                          csr_array, csc_array)


def setup_test_equal_bounds():

    np.random.seed(0)
    x0 = np.random.rand(4)
    lb = np.array([0, 2, -1, -1.0])
    ub = np.array([3, 2, 2, -1.0])
    i_eb = (lb == ub)

    def check_x(x, check_size=True, check_values=True):
        if check_size:
            assert x.size == 4
        if check_values:
            assert_allclose(x[i_eb], lb[i_eb])

    def func(x):
        check_x(x)
        return optimize.rosen(x)

    def grad(x):
        check_x(x)
        return optimize.rosen_der(x)

    def callback(x, *args):
        check_x(x)

    def constraint1(x):
        check_x(x, check_values=False)
        return x[0:1] - 1

    def jacobian1(x):
        check_x(x, check_values=False)
        dc = np.zeros_like(x)
        dc[0] = 1
        return dc

    def constraint2(x):
        check_x(x, check_values=False)
        return x[2:3] - 0.5

    def jacobian2(x):
        check_x(x, check_values=False)
        dc = np.zeros_like(x)
        dc[2] = 1
        return dc

    c1a = NonlinearConstraint(constraint1, -np.inf, 0)
    c1b = NonlinearConstraint(constraint1, -np.inf, 0, jacobian1)
    c2a = NonlinearConstraint(constraint2, -np.inf, 0)
    c2b = NonlinearConstraint(constraint2, -np.inf, 0, jacobian2)

    # test using the three methods that accept bounds, use derivatives, and
    # have some trouble when bounds fix variables
    methods = ('L-BFGS-B', 'SLSQP', 'TNC')

    # test w/out gradient, w/ gradient, and w/ combined objective/gradient
    kwds = ({"fun": func, "jac": False},
            {"fun": func, "jac": grad},
            {"fun": (lambda x: (func(x), grad(x))),
             "jac": True})

    # test with both old- and new-style bounds
    bound_types = (lambda lb, ub: list(zip(lb, ub)),
                   Bounds)

    # Test for many combinations of constraints w/ and w/out jacobian
    # Pairs in format: (test constraints, reference constraints)
    # (always use analytical jacobian in reference)
    constraints = ((None, None), ([], []),
                   (c1a, c1b), (c2b, c2b),
                   ([c1b], [c1b]), ([c2a], [c2b]),
                   ([c1a, c2a], [c1b, c2b]),
                   ([c1a, c2b], [c1b, c2b]),
                   ([c1b, c2b], [c1b, c2b]))

    # test with and without callback function
    callbacks = (None, callback)

    data = {"methods": methods, "kwds": kwds, "bound_types": bound_types,
            "constraints": constraints, "callbacks": callbacks,
            "lb": lb, "ub": ub, "x0": x0, "i_eb": i_eb}

    return data


eb_data = setup_test_equal_bounds()


def test_equal_bounds(method, kwds, bound_type, constraints, callback):
    """
    Tests that minimizers still work if (bounds.lb == bounds.ub).any()
    gh12502 - Divide by zero in Jacobian numerical differentiation when
    equality bounds constraints are used
    """
    # GH-15051; slightly more skips than necessary; hopefully fixed by GH-14882
    if (platform.machine() == 'aarch64' and method == "TNC"
            and kwds["jac"] is False and callback is not None):
        return 'Tolerance violation on aarch'

    lb, ub = eb_data["lb"], eb_data["ub"]
    x0, i_eb = eb_data["x0"], eb_data["i_eb"]

    test_constraints, reference_constraints = constraints
    if test_constraints and not method == 'SLSQP':
        return 'Only SLSQP supports nonlinear constraints'
    # reference constraints always have analytical jacobian
    # if test constraints are not the same, we'll need finite differences
    fd_needed = (test_constraints != reference_constraints)

    bounds = bound_type(lb, ub)  # old- or new-style

    kwds.update({"x0": x0, "method": method, "bounds": bounds,
                 "constraints": test_constraints, "callback": callback})
    res = optimize.minimize(**kwds)

    expected = optimize.minimize(optimize.rosen, x0, method=method,
                                 jac=optimize.rosen_der, bounds=bounds,
                                 constraints=reference_constraints)

    # compare the output of a solution with FD vs that of an analytic grad
    print(res.success, (res.fun, expected.fun), (res.x, expected.x))
    assert res.success
    assert_allclose(res.fun, expected.fun, rtol=1.5e-6)
    assert_allclose(res.x, expected.x, rtol=5e-4)

    if fd_needed or kwds['jac'] is False:
        expected.jac[i_eb] = np.nan
    print(res.jac.shape[0], 4)
    print((res.jac[i_eb], expected.jac[i_eb]))
    assert res.jac.shape[0] == 4
    assert_allclose(res.jac[i_eb], expected.jac[i_eb], rtol=1e-6)

    if not (kwds['jac'] or test_constraints or isinstance(bounds, Bounds)):
        # compare the output to an equivalent FD minimization that doesn't
        # need factorization
        def fun(x):
            new_x = np.array([np.nan, 2, np.nan, -1])
            new_x[[0, 2]] = x
            return optimize.rosen(new_x)

        fd_res = optimize.minimize(fun,
                                   x0[[0, 2]],
                                   method=method,
                                   bounds=bounds[::2])
        print((res.fun, fd_res.fun))
        print((res.x[[0, 2]], fd_res.x))
        assert_allclose(res.fun, fd_res.fun)
        # TODO this test should really be equivalent to factorized version
        # above, down to res.nfev. However, testing found that when TNC is
        # called with or without a callback the output is different. The two
        # should be the same! This indicates that the TNC callback may be
        # mutating something when it shouldn't.
        assert_allclose(res.x[[0, 2]], fd_res.x, rtol=2e-6)


i = 0
for method in eb_data["methods"]:
    for kwds in eb_data["kwds"]:
        for bound_type in eb_data["bound_types"]:
            for constraints in eb_data["constraints"]:
                for callback in eb_data["callbacks"]:
                    if i == 108:
                        print(method, kwds, bound_type, constraints, callback)
                        print()
                        test_equal_bounds(method, kwds, bound_type, constraints, callback)
                    i = i + 1
```

The problem seems to be with expected.x: res.x is the same on Linux and macOS, but expected.x differs. It looks like passing jac=optimize.rosen_der is causing problems in optimize.minimize.

czgdp1807 commented 2 months ago

Actually, there is no very significant problem with ifx itself. It's just that the code in slsqp_optmz.f is very rigid. I was debugging one example, and there is this condition inside the nnls subroutine: IF(wmax.LE.ZERO) GOTO 280. Its intent is to check that all elements in w are non-positive. However, the ifx-compiled code produces wmax = 1.6e-21: not ZERO, but very close to it. The condition therefore fails, which leads to further iterations that corrupt values (for example, correct values inside an array become arbitrarily large). If I change the condition to IF(ABS(wmax).LE.1e-20) GOTO 280, then ifx gives the same results on the same example. :D

Also, the branch I am working on is - https://github.com/czgdp1807/scipy/tree/ifx_bug_2
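To make the failure mode concrete, here is a minimal Python sketch of the exit test on wmax described in this comment. The function names and the `tol` default are illustrative assumptions and are not taken from SLSQP; the value 1.6e-21 is the wmax reported above for the ifx build.

```python
def should_exit_exact(w):
    """Mirror of IF(wmax.LE.ZERO): exit only if no entry of w is positive."""
    wmax = max(0.0, max(w, default=0.0))
    return wmax <= 0.0


def should_exit_tolerant(w, tol=1e-19):
    """Hypothetical tolerant variant: entries up to tol count as zero."""
    wmax = max(0.0, max(w, default=0.0))
    return wmax <= tol


# Dual variables that are non-positive except for rounding noise.
w_noisy = [-0.5, 1.6e-21, -2.0]

print(should_exit_exact(w_noisy))     # False: the loop would keep iterating
print(should_exit_tolerant(w_noisy))  # True: the noise is treated as zero
```

The tolerant variant trades one risk for another: it also stops on genuinely small positive dual variables, so the threshold would need to be justified for the problem scale.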

ilayn commented 2 months ago

Do you have #21021 already on your branch which we recently merged? That should remove the need for the fortran compilation.

czgdp1807 commented 2 months ago

I merged main into my branch. The tests still fail. I think the slsqp method still calls the nnls written in scipy/optimize/slsqp/slsqp_optmz.f.

ilayn commented 2 months ago

Ah yes, my bad. Apologies, I didn't push my slsqp to SciPy. I'm finally confusing myself by reading too much Fortran.

czgdp1807 commented 2 months ago

Following up on https://github.com/scipy/scipy/issues/20728#issuecomment-2223838292: I changed 1e-20 to 1e-19 in IF(ABS(wmax).LE.1e-20) GOTO 280, and test_equal_bounds now passes fully in the optimize tests.

Seems like ifx has noisy behaviour with floating point computations. However, also floating point comparisons should always be done with some tolerances and not rigidly like, .EQ. or directly with ZERO.

czgdp1807 commented 2 months ago

With this https://github.com/scipy/scipy/pull/21173, things work with ifx + MKL on my Linux machine. I will address reviews and run the tests on CI, next week.

ilayn commented 2 months ago

However, also floating point comparisons should always be done with some tolerances and not rigidly like, .EQ. or directly with ZERO

This is not always true, it needs some context, and there are always some problems that come very close to zero. Old foxes are not that careless about these algorithms doing such simple errors.

But here I think it won't matter since we are going to translate it anyways.

ifx and MKL are probably too aggressive with FMA optimizations and we are seeing substantial deviations.
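As a rough illustration of how a fused multiply-add can change a result near zero, here is a small Python sketch. It emulates the single rounding of an FMA with exact rational arithmetic; the function names are made up for this example, and it says nothing about what ifx actually emits for SLSQP.

```python
from fractions import Fraction


def multiply_add_unfused(a, b, c):
    """Round a*b to double first, then add c (two roundings)."""
    return a * b + c


def multiply_add_fused(a, b, c):
    """Emulate an FMA: form a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 0.1
c = -(a * a)  # negative of the already-rounded product

unfused = multiply_add_unfused(a, a, c)
fused = multiply_add_fused(a, a, c)

print(unfused)  # exactly 0.0
print(fused)    # tiny nonzero value: the rounding error of a*a
```

An exact comparison against zero would therefore take different branches depending on whether the compiler fused the two operations.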

czgdp1807 commented 2 months ago

Old foxes are not that careless about these algorithms doing such simple errors.

True. Maybe, as you said, ifx is doing FMA optimisations, which is leading to these issues. I will try the ifort + MKL combination. ifort doesn't do optimisations by default, right? Is there any way we can disable optimisations with ifx?

czgdp1807 commented 2 months ago

I checked https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-ifort-to-ifx.html. It seems that ifx optimisations can be disabled via -O0 and -fp-model=precise. Worth trying, I think.

ilayn commented 2 months ago

I must disappoint you that I really don't know much about fortran compilers. But indeed if you have capacity to try it out, that would be great.

rgommers commented 2 months ago

Note that the floating-point precision flags for Intel compilers should already be set correctly, by this code: https://github.com/scipy/scipy/blob/8ccaf44b3644b9e3ad33afea7c6ef03780053fab/meson.build#L101-L129

Please verify that that's the case (the strict flag should appear in build/build.ninja). If so, the problem is a different one.
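One way to do that check without reading the file by eye is sketched below. The flag spellings in the regular expression (`-fp-model=<mode>`, `-fp-model <mode>` and `/fp:<mode>`) and the sample line are assumptions made for this example; the helper name is hypothetical.

```python
import re

# Assumed spellings: "-fp-model=<mode>" or "-fp-model <mode>" on Linux,
# "/fp:<mode>" on Windows.
_FP_FLAG = re.compile(r"(?:-fp-model[= ]|/fp:)(\w+)")


def fp_models_in(ninja_text):
    """Return the set of floating-point model names found in the text."""
    return set(_FP_FLAG.findall(ninja_text))


# Made-up excerpt standing in for the contents of build/build.ninja.
sample = (
    "rule fortran_COMPILER\n"
    " command = ifx $ARGS -fp-model=strict -o $out -c $in\n"
)

print(fp_models_in(sample))
```

For a real build, the text would come from something like `pathlib.Path("build/build.ninja").read_text()` instead of `sample`.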

czgdp1807 commented 2 months ago

Actually, FC="ifx -O0" and FC="ifx -O0 -fpmodel=precise" don't make a difference. test_equal_bounds still fails because of the issues I mentioned in https://github.com/scipy/scipy/issues/20728#issuecomment-2225878433. So, I think the problem is with the code generated by ifx. Also note that scipy.optimize doesn't use anything from MKL as far as I can tell (let me know if I am missing anything).

Update - I am starting work on adding a Linux CI job in https://github.com/scipy/scipy/pull/21173.

ivan-pi commented 2 months ago

@czgdp1807, the wmax is computed from the w values, which are computed by a dot-product. Dieter Kraft was using a manually unrolled version called ddot_sl.

Could you try replacing it with a different version?

C STEP TWO (COMPUTE DUAL VARIABLES)
C .....ENTRY LOOP A

C ( ... omitted ... )

      DO 120 iz=iz1,iz2
         j=INDEX(iz)
  120    w(j)=ddot_sl(m-nsetp,a(npp1,j),1,b(npp1),1)

C STEP THREE (TEST DUAL VARIABLES)

  130 wmax=ZERO
      DO 140 iz=iz1,iz2
      j=INDEX(iz)
         IF(w(j).LE.wmax)             GOTO 140
         wmax=w(j)
         izmax=iz
  140 CONTINUE

C .....EXIT LOOP A

      IF(wmax.LE.ZERO)                GOTO 280

Here is a test program with a few ways it can be done:

C test.f
C compile with ifx -O1 -fp-model=precise test.f ddot_sl.f -qmkl
      program test
      implicit none
      integer, parameter :: n = 100000000
      double precision a(n), b(n)

C Version in SLSQP (adapted by Dieter Kraft)
      external ddot_sl
      double precision ddot_sl

C BLAS (MKL) version 
      external ddot 
      double precision ddot

      call random_number(a)
      call random_number(b)

      print *, ddot_sl(n,a,1,b,1)
      print *, ddot(n,a,1,b,1)
      print *, dot_product(a,b)
      print *, sum(a*b)

      end

czgdp1807 commented 2 months ago

Sure @ivan-pi. Let me do it. I will update this comment if it works. gfortran works fine with the unrolled version. Maybe ifx gives different results if I use yours. Let me see.

czgdp1807 commented 2 months ago

The problem is not the loop unrolling in ddot_sl. In fact, the reference ddot shipped with LAPACK uses exactly the same implementation. See https://netlib.org/lapack/explore-html/d1/dcc/group__dot_ga2a42ecc597403b22ad786715c739196b.html. The problem is that ifx is not precise enough with DOUBLE PRECISION. I used REAL*16 inside ddot_sl (see the diff of slsqp_optmz.f in #21173) and was able to remove IF(ABS(wmax).LE.1e-20) GOTO 280. The only thing left is IF(rnorm.LT.ZERO) GOTO 50 in line 1094. It's safe, and no tests fail with this (both ifx and gfortran work).
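The effect of a wider accumulator can be sketched in Python. Below, `ddot_unrolled` follows the unroll-by-5 pattern in plain double precision, and `ddot_wide` accumulates exactly with rationals before rounding once. Exact accumulation is only a stand-in for REAL*16 here, and both function names and the input data are invented for the example.

```python
from fractions import Fraction


def ddot_unrolled(x, y):
    """Dot product in double precision with the main loop unrolled by 5."""
    n = len(x)
    m = n % 5
    total = 0.0
    for i in range(m):  # clean-up loop for the leftover elements
        total = total + x[i] * y[i]
    for i in range(m, n, 5):
        total = (total + x[i] * y[i] + x[i + 1] * y[i + 1]
                 + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3]
                 + x[i + 4] * y[i + 4])
    return total


def ddot_wide(x, y):
    """Same dot product, accumulated exactly and rounded once at the end."""
    return float(sum(Fraction(a) * Fraction(b) for a, b in zip(x, y)))


# The exact dot product of these vectors is zero, but -2**53 - 1 is not
# representable in double precision, so the running sum loses a unit.
x = [-2.0**53, -1.0, 1.0, 2.0**53, 0.0]
y = [1.0, 1.0, 1.0, 1.0, 1.0]

print(ddot_unrolled(x, y))  # 1.0
print(ddot_wide(x, y))      # 0.0
```

With the double-precision accumulator the result is positive, so a test like IF(wmax.LE.ZERO) would not exit, while the wide accumulator gives exactly zero.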

rgommers commented 2 months ago

Why don't we just mark this SLSQP test as xfail and move on? The code will be rewritten in C or C++ anyway, so then the problem is solved.

You can determine in the test suite whether Intel Fortran was used to build from:

scipy.show_config(mode='dicts')['Compilers']['fortran']['name']
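A small helper around that lookup might look like the sketch below. It takes the config dictionary as an argument so it can be tried without a SciPy build; the helper name is hypothetical, and the assumption that Intel compiler names all start with "intel" (for example "intel-llvm" for ifx) should be checked against the actual output.

```python
def built_with_intel_fortran(config):
    """Return True if the build config names an Intel Fortran compiler."""
    fortran = config.get("Compilers", {}).get("fortran", {})
    # Assumption: Intel compiler names all start with "intel".
    return str(fortran.get("name", "")).startswith("intel")


# Made-up excerpt of what show_config(mode='dicts') might return.
fake_config = {"Compilers": {"fortran": {"name": "intel-llvm"}}}

print(built_with_intel_fortran(fake_config))
```

In the test suite, the result of `built_with_intel_fortran(scipy.show_config(mode='dicts'))` could serve as the condition for a `pytest.mark.xfail` marker on the SLSQP cases.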
rgommers commented 2 months ago

We now have a OneAPI CI job (all Intel compilers + MKL), so this should hopefully not regress again.