mpimd-csc / flexiblas

FlexiBLAS - A BLAS and LAPACK wrapper library with runtime exchangeable backends. This is only a mirror of https://gitlab.mpi-magdeburg.mpg.de/software/flexiblas-release
https://www.mpi-magdeburg.mpg.de/projects/flexiblas
GNU Lesser General Public License v3.0

segmentation fault with numpy on POWER9 (only) when using FlexiBLAS #17

Open boegel opened 3 years ago

boegel commented 3 years ago

I'm seeing a segmentation fault when running the numpy 1.20.3 tests with FlexiBLAS 3.0.4 on top of OpenBLAS 0.3.15, but not when numpy links to OpenBLAS 0.3.15 directly, which tells me FlexiBLAS is somehow involved in causing the segmentation fault...

I'm not seeing this problem on Intel (Haswell, Skylake X), AMD (Rome), or Arm (AWS Graviton2).

Here's a partial backtrace I obtained when running the numpy tests via gdb:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff4887530 in dnrm2_k () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
Missing separate debuginfos, use: yum debuginfo-install libxcrypt-4.1.1-4.el8.ppc64le
(gdb) bt
#0  0x00007ffff4887530 in dnrm2_k () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#1  0x00007ffff453d788 in dnrm2_ () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#2  0x00007ffff62cfd9c in dnrm2_ () from /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3
#3  0x00007ffff4d7816c in dgeev_ () from /home/centos/EasyBuild/software/OpenBLAS/0.3.15-GCC-10.3.0/lib/../lib64/libopenblas.so.0
#4  0x00007ffff639e8e4 in dgeev_ () from /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3
#5  0x00007fff7364b334 in call_dgeev (params=0x7ffffffe63b0) at numpy/linalg/umath_linalg.c.src:2292
#6  DOUBLE_eig_wrapper (JOBVL=JOBVL@entry=78 'N', JOBVR=JOBVR@entry=86 'V', args=0x7fff50dad120, dimensions=<optimized out>, steps=<optimized out>) at numpy/linalg/umath_linalg.c.src:2292
#7  0x00007fff7364c02c in DOUBLE_eig (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDfunc=<optimized out>) at numpy/linalg/umath_linalg.c.src:2336
#8  0x00007ffff6a5d294 in PyUFunc_GeneralizedFunction (op=0x7ffffffe8200, kwds=0x0, args=0x7fff50dad0f0, ufunc=0x0) at numpy/core/src/umath/ufunc_object.c:2986
#9  PyUFunc_GenericFunction_int (ufunc=<optimized out>, ufunc@entry=0x7fff736c1130, args=args@entry=0x7fff50f88820, kwds=kwds@entry=0x7fff50e79c00, op=op@entry=0x7ffffffe8200)
    at numpy/core/src/umath/ufunc_object.c:3119
#10 0x00007ffff6a5f740 in ufunc_generic_call (ufunc=0x7fff736c1130, args=0x7fff50f88820, kwds=0x7fff50e79c00) at numpy/core/src/umath/ufunc_object.c:4747
...

This only happens when numpy is linked with FlexiBLAS:

$ ldd $(python -c "import numpy; print(numpy.core._multiarray_umath.__file__)") | grep blas
    libflexiblas.so.3 => /home/centos/EasyBuild/software/FlexiBLAS/3.0.4-GCC-10.3.0/lib64/libflexiblas.so.3 (0x0000200000570000)

Any ideas on what may be causing this segmentation fault?

I tried ulimit -s unlimited (the default stack limit on that system is 8192 KB), but it made no difference.

With export FLEXIBLAS=netlib, which makes FlexiBLAS use its fallback netlib backend, the segmentation fault does not happen either...

grisuthedragon commented 3 years ago

Can you provide the backtrace with debug information? What does it look like in Valgrind?

boegel commented 3 years ago

Backtrace with debug info:

#0  dnrm2_k (n=2, x=<optimized out>, inc_x=1) at ../kernel/power/../arm/nrm2.c:69
#1  0x00007ffff453d788 in dnrm2_ (N=<optimized out>, x=<optimized out>, INCX=<optimized out>) at nrm2.c:61
#2  0x00007ffff62cf9fc in dnrm2_ (n=<optimized out>, x=<optimized out>, incx=<optimized out>) at /tmp/centos/FlexiBLAS/3.0.4/GCC-10.3.0/flexiblas-3.0.4/src/wrapper_blas_gnu.c:2899
#3  0x00007ffff4d788ec in dgeev (jobvl=..., jobvr=..., n=2, a=..., lda=<optimized out>, wr=..., wi=..., vl=..., ldvl=2, vr=..., ldvr=2, work=..., lwork=260, info=<optimized out>, _jobvl=140737323525740, _jobvr=8) at dgeev.f:490
#4  0x00007ffff639e594 in dgeev_ (jobvl=0x7ffffffe655c "NV", jobvr=0x7ffffffe655d "V", n=0x7ffffffe6548, a=0x7fff650fc3a0, lda=0x7ffffffe654c, wr=0x7fff650fc3c0, wi=0x7fff650fc3d0, vl=0x7fff650fc3e0, ldvl=0x7ffffffe6550, vr=0x7fff650fc3e0, ldvr=0x7ffffffe6554,
    work=0x7fff650458d0, lwork=0x7ffffffe6558, info=0x7ffffffe6560) at /tmp/centos/FlexiBLAS/3.0.4/GCC-10.3.0/flexiblas-3.0.4/src/lapack_interface/wrapper/dgeev.c:80
#5  0x00007fff7364b334 in call_dgeev (params=0x7ffffffe6500) at numpy/linalg/umath_linalg.c.src:2292
#6  DOUBLE_eig_wrapper (JOBVL=JOBVL@entry=78 'N', JOBVR=JOBVR@entry=86 'V', args=0x7fff5142d4a0, dimensions=<optimized out>, steps=<optimized out>) at numpy/linalg/umath_linalg.c.src:2292
#7  0x00007fff7364c02c in DOUBLE_eig (args=<optimized out>, dimensions=<optimized out>, steps=<optimized out>, __NPY_UNUSED_TAGGEDfunc=<optimized out>) at numpy/linalg/umath_linalg.c.src:2336
#8  0x00007ffff6a5d294 in PyUFunc_GeneralizedFunction (op=0x7ffffffe8270, kwds=0x0, args=0x7fff5142d470, ufunc=0x0) at numpy/core/src/umath/ufunc_object.c:2986
#9  PyUFunc_GenericFunction_int (ufunc=<optimized out>, ufunc@entry=0x7fff736c1130, args=args@entry=0x7fff5005aca0, kwds=kwds@entry=0x7fff50e7a700, op=op@entry=0x7ffffffe8270) at numpy/core/src/umath/ufunc_object.c:3119
#10 0x00007ffff6a5f740 in ufunc_generic_call (ufunc=0x7fff736c1130, args=0x7fff5005aca0, kwds=0x7fff50e7a700) at numpy/core/src/umath/ufunc_object.c:4747
...

I'll look into Valgrind too.

boegel commented 3 years ago

@grisuthedragon It seems there is no segmentation fault when running under Valgrind (though it reports a bunch of unrelated "Invalid read of size 4" errors in Python itself), so I'm afraid that's a dead end...

grisuthedragon commented 3 years ago

That's weird. I'll try to compile FlexiBLAS + numpy on my POWER system as soon as possible.

boegel commented 3 years ago

To quickly trigger the segfault, you can use python -c "import numpy as np; np.linalg.test()".

Flamefire commented 3 years ago

I tried this on a real ppc machine too, and the minimal reproducer for the "issues" I found is python -c "import numpy as np; np.linalg.test(verbose=3, extra_argv=['-k', 'TestEigvals and test_sq_cases'])", which either crashes with a double free or fails the test (it works with OpenBLAS directly).

I also see messages on stderr:

 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DORGHR parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DORGHR parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DGEHRD parameter number  8 had an illegal value
 ** On entry to DORGHR parameter number  8 had an illegal value
 ** On entry to ZGEHRD parameter number  5 had an illegal value
 ** On entry to ZHSEQR parameter number  7 had an illegal value

Those come from the numpy XERBLA error handler, and I guess they are a good hint towards the real problem.
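For reading these messages: their layout matches reference LAPACK's XERBLA, whose FORMAT is ' ** On entry to ', A, ' parameter number ', I2, ' had ', 'an illegal value', and the number is the 1-based position of the offending argument. For DGEHRD(N, ILO, IHI, A, LDA, TAU, WORK, LWORK, INFO) and DORGHR (same argument list), position 8 is LWORK. The C sketch below reproduces such a line; xerbla_stub_, xerbla_msg and report_illegal_argument are hypothetical names, not part of numpy, OpenBLAS or FlexiBLAS. It also shows that a C caller has to pass the routine name's length explicitly as a hidden trailing argument, because Fortran strings are blank-padded rather than NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer: the sketch formats the message here instead of
 * printing it, so the result can be inspected. */
static char xerbla_msg[128];

/* Hypothetical stand-in for a Fortran XERBLA(SRNAME, INFO) built with
 * gfortran >= 8: SRNAME is CHARACTER*(*), so a hidden size_t length
 * follows the explicit arguments. */
void xerbla_stub_(const char *srname, const int *info, size_t srname_len)
{
    /* Trim trailing blanks, as LEN_TRIM(SRNAME) does in reference XERBLA. */
    while (srname_len > 0 && srname[srname_len - 1] == ' ')
        srname_len--;
    /* Same layout as the reference FORMAT; Fortran's I2 corresponds to %2d. */
    snprintf(xerbla_msg, sizeof xerbla_msg,
             " ** On entry to %.*s parameter number %2d had an illegal value",
             (int)srname_len, srname, *info);
}

/* C-side caller: reports that argument `pos` of routine `name` is illegal,
 * passing strlen(name) as the hidden length argument. */
void report_illegal_argument(const char *name, int pos)
{
    assert(name != NULL);
    xerbla_stub_(name, &pos, strlen(name));
}
```

Calling report_illegal_argument("DGEHRD", 8) leaves exactly the first stderr line above in xerbla_msg.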

Flamefire commented 3 years ago

An even more minimal reproducer: python -c "from numpy import array, linalg; linalg.eigvals(array([[1., 2.], [3., 4.]]))"

I suspect stack corruption caused by GCC miscompiling OpenBLAS, which only becomes visible with FlexiBLAS because FlexiBLAS saves a register on the stack and that slot gets overwritten by the bug. I reported this as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

grisuthedragon commented 3 years ago

@Flamefire Thanks for the work and for identifying where this behaviour comes from. Let's wait until the GCC guys react and see how they assess this problem.

Flamefire commented 3 years ago

The IBM compiler guys are looking into this. It does indeed seem to be a compiler issue, present since GCC 7. So I'd say this can be closed, as there is nothing that can be done here short of providing a better error message.

boegel commented 2 years ago

@Flamefire Any updates on this?

boegel commented 1 year ago

Small update here from our side: we've side-stepped this problem by compiling OpenBLAS with -fstack-protector-strong on POWER; see https://github.com/easybuilders/easybuild-easyconfigs/pull/15885 for more information.

Flamefire commented 1 year ago

The GCC developers determined that this is a bug in how the routines are called, related to the Fortran calling convention:

As described at https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html, since the first parameter to DGEBAL is of type CHARACTER, there is an extra hidden argument. Change the call to DGEBAL from dgebal (the FlexiBLAS wrapper routine) to pass an extra argument. This causes the compiler to allocate a parameter save area in dgebal's frame, as there are now 9 parameters but only 8 parameter registers.
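A minimal C sketch of that convention, assuming gfortran 8 or newer, where the hidden length argument has type size_t. backend_dgebal_ is a hypothetical stand-in for the backend's Fortran DGEBAL and dgebal_wrapper_ a hypothetical forwarding wrapper; neither is FlexiBLAS's actual code. The point is the ninth, trailing size_t parameter that both sides declare. As we read the POWER ELFv2 ABI, only eight integer/pointer arguments are passed in registers (r3 to r10); with a ninth, the callee may assume the caller has allocated a parameter save area, so a caller that believes there are only eight arguments can have its own frame overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a backend routine compiled by gfortran >= 8 from
 *   SUBROUTINE DGEBAL(JOB, N, A, LDA, ILO, IHI, SCALE, INFO)
 * JOB is CHARACTER, so gfortran appends a hidden size_t length after the
 * eight explicit arguments: nine arguments in total. */
static void backend_dgebal_(const char *job, const int *n, double *a,
                            const int *lda, int *ilo, int *ihi,
                            double *scale, int *info, size_t job_len)
{
    (void)a;
    (void)lda;
    /* JOB is a single CHARACTER, so a correct caller passes length 1. */
    assert(job_len == 1);
    if (*job != 'N' && *job != 'P' && *job != 'S' && *job != 'B') {
        *info = -1; /* LAPACK convention: argument 1 has an illegal value */
        return;
    }
    /* Placeholder body: returns what DGEBAL returns for JOB = 'N'
     * (no permutation, unit scaling) whatever JOB is. */
    for (int i = 0; i < *n; i++)
        scale[i] = 1.0;
    *ilo = 1;
    *ihi = *n;
    *info = 0;
}

/* Hypothetical forwarding wrapper: it declares the hidden length itself and
 * passes it on, so caller and callee agree on nine arguments and the
 * compiler sets up the parameter save area the callee may rely on. */
void dgebal_wrapper_(const char *job, const int *n, double *a, const int *lda,
                     int *ilo, int *ihi, double *scale, int *info,
                     size_t job_len)
{
    backend_dgebal_(job, n, a, lda, ilo, ihi, scale, info, job_len);
}
```

From C, the trailing argument is the length of the JOB string, e.g. dgebal_wrapper_("N", &n, a, &lda, &ilo, &ihi, scale, &info, 1).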

grisuthedragon commented 1 year ago

@Flamefire I know about these extra arguments, but for compatibility reasons in the early days of FlexiBLAS, we neglected them. Even using CBLAS/LAPACKE from the reference implementation can lead to this issue, since they "forget" about these additional parameters as well.

I will do some tests with FlexiBLAS and, if they are successful, integrate the fix into the next release.