scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License

Find out which blas/lapack library is used #9466

Closed timokau closed 5 years ago

timokau commented 5 years ago

I'm getting a segmentation fault when computing the Schur decomposition of an empty (0x0) matrix. That call is supposed to fail, but it should not crash. I only get this error if I import rpy2 (the Python interface to R) before I import scipy. I do not get it if I use LD_PRELOAD to preload openblas. So importing (more precisely, initializing) R must change the blas/lapack library used. I don't know which other library is used or where it may come from (R is compiled with openblas as well). I'm getting this error on Arch Linux as well as NixOS.

Is there a way to determine which library is used at runtime?
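One way to approach this question on Linux is to inspect the shared objects mapped into the running process. The following is a minimal sketch in Python 3, not something proposed in this thread: the helper names `parse_maps` and `loaded_blas_libraries` and the name pattern are assumptions for illustration, and the pattern is not exhaustive.

```python
import re

# Substrings that commonly appear in BLAS/LAPACK shared-object names.
# This list is an assumption for illustration and is not exhaustive.
BLAS_PATTERN = re.compile(r"blas|lapack|atlas|mkl|blis", re.IGNORECASE)


def parse_maps(text):
    """Return sorted file paths from /proc/<pid>/maps content whose
    basename looks like a BLAS/LAPACK library."""
    paths = set()
    for line in text.splitlines():
        # Format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if not path.startswith("/"):
            continue  # pseudo-entries such as [heap] or [stack]
        basename = path.rsplit("/", 1)[-1]
        if BLAS_PATTERN.search(basename):
            paths.add(path)
    return sorted(paths)


def loaded_blas_libraries(maps_file="/proc/self/maps"):
    """List BLAS/LAPACK-like libraries mapped into this process (Linux only).

    Returns an empty list if the maps file cannot be read."""
    try:
        with open(maps_file) as f:
            return parse_maps(f.read())
    except OSError:
        return []


if __name__ == "__main__":
    for path in loaded_blas_libraries():
        print(path)
```

Calling `loaded_blas_libraries()` once after initializing R and again after importing scipy.linalg would show which libraries each step brings into the process. Note that this lists what is mapped, not which library a given symbol such as `dgees_` actually resolves to.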

Reproducing code example:

python2 -c 'from rpy2 import rinterface as ri; ri.initr(); import scipy.linalg; scipy.linalg.schur(scipy.empty(shape=[0,0]))'

The error does not occur without the rpy2 import or if scipy is imported first. The error also occurs if we instead import robjects from rpy2 (which then implicitly does the initialization).

Error message:

 ** On entry to DGEES parameter number  6 had an illegal value

Then python exits with SIGSEGV.

Scipy/Numpy/Python version information:

('1.1.0', '1.15.4', sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0))

ilayn commented 5 years ago

This should be fixed in 1.2, which will be released very soon. I receive:

error: ((lwork==-1)||(lwork >= MAX(1,2*n))) failed for 3rd keyword lwork: dgees:lwork=0

EDIT: Ah, never mind, I missed the import-order part.

timokau commented 5 years ago

Yes, that is the error I get when preloading openblas.

ilayn commented 5 years ago

If that is happening, it is a matter of a different scipy version being used, because whether this error appears depends on how scipy is installed.

Can you also add print scipy.__version__ somewhere when invoking with R? R is probably picking up a different version of scipy from somewhere.

timokau commented 5 years ago

I'm not sure I understand what you mean. The error appears to come from the blas library segfaulting. Without changing anything else, doing export LD_PRELOAD=/path/to/libopenblas.so "fixes" the issue.

The version doesn't change after initializing rpy:

$ python2 -c 'from rpy2 import rinterface as ri; ri.initr(); import scipy.linalg; print scipy.__version__; scipy.linalg.schur(scipy.empty(shape=[0,0]))'
1.1.0
/nix/store/5wc1ck1rbsxrrq841ji9d0q2q34l5lvv-python-2.7.15-env/lib/python2.7/site-packages/rpy2/rinterface/__init__.py:195: RRuntimeWarning: Error: BLAS/LAPACK routine 'DGEES ' gave error code -6

  warnings.warn(x, RRuntimeWarning)
/nix/store/5wc1ck1rbsxrrq841ji9d0q2q34l5lvv-python-2.7.15-env/lib/python2.7/site-packages/rpy2/rinterface/__init__.py:195: RRuntimeWarning: Fatal error: unable to initialize the JIT

  warnings.warn(x, RRuntimeWarning)
[1]    10167 segmentation fault  result/bin/python2 -c 

Or with LD_PRELOAD set:

$ python2 -c 'from rpy2 import rinterface as ri; ri.initr(); import scipy.linalg; print scipy.__version__; scipy.linalg.schur(scipy.empty(shape=[0,0]))'
1.1.0
 ** On entry to DGEES PMGLRH parameter number  6 had an illegal value
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/nix/store/5wc1ck1rbsxrrq841ji9d0q2q34l5lvv-python-2.7.15-env/lib/python2.7/site-packages/scipy/linalg/decomp_schur.py", line 162, in schur
    sort_t=sort_t)
_flapack.error: ((lwork==-1)||(lwork >= MAX(1,2*n))) failed for 3rd keyword lwork: dgees:lwork=0

ilayn commented 5 years ago

The lapack library always segfaults on invalid input. What we do is place checks so that invalid input never makes it into the low-level subroutines. The error you get with LD_PRELOAD comes from our trap: https://github.com/scipy/scipy/blob/master/scipy/linalg/flapack_gen.pyf.src#L1226

If straightforward scipy usage leads to an error, that means the check was successful and the invalid input was caught. But if a different setup lets invalid input pass through, that suggests you have another installation somewhere on your system in which this check is missing (or has been removed), and that is what trips.
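The trap described above can be sketched in plain Python. This only mirrors the condition quoted in the error message, `(lwork==-1)||(lwork >= MAX(1,2*n))`. The function name `check_lwork` is hypothetical, and the real check lives in the f2py wrapper, not in Python code like this.

```python
def check_lwork(lwork, n):
    """Validate the workspace size before it would reach dgees.

    Mirrors the condition from the error message in this thread:
    lwork must be -1 (workspace query) or at least max(1, 2*n).
    Hypothetical helper for illustration only.
    """
    if not (lwork == -1 or lwork >= max(1, 2 * n)):
        raise ValueError(
            "((lwork==-1)||(lwork >= MAX(1,2*n))) failed: lwork=%d" % lwork
        )
    return lwork


if __name__ == "__main__":
    print(check_lwork(-1, 0))  # workspace query is always accepted
    try:
        check_lwork(0, 0)      # the 0x0 case from this thread
    except ValueError as exc:
        print("rejected:", exc)
```

For a 0x0 matrix, n is 0 and the minimum accepted size is max(1, 0) = 1, so lwork=0 is rejected before the low-level routine is called, which matches the `dgees:lwork=0` message above.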

timokau commented 5 years ago

That sounds like a bad idea for the lapack library.

Since the error is reported as an RRuntimeWarning, maybe rpy2 has its own trap that somehow overrides scipy's? How does the trapping work? And why would LD_PRELOAD fix it in this case?

ilayn commented 5 years ago

Looks like Rpy2 is fiddling with the libraries to enable JIT. We don't have anything to do with this.

timokau commented 5 years ago

Alright, I'll report it there. Thanks for clearing it up so quickly.

timokau commented 5 years ago

For reference: https://bitbucket.org/rpy2/rpy2/issues/491/importing-rpy2-before-scipy-leads-to

ilayn commented 5 years ago

No problem. Thanks for taking the time to report it. Let us know if we can help further with it.

lgautier commented 5 years ago

> Looks like Rpy2 is fiddling with the libraries to enable JIT. We don't have anything to do with this.

rpy2 is doing enough weird things on its own.

If anything unorthodox is happening here, it is either R... or scipy.

rgommers commented 5 years ago

@timokau what happens when you import scipy.linalg first and rpy2 after that?

timokau commented 5 years ago

@rgommers then I don't get a segfault but the proper error message:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/scipy/linalg/decomp_schur.py", line 139, in schur
    result = gees(lambda x: None, a1, lwork=-1)
ValueError: On entry to DGEES parameter number 6 had an illegal value

timokau commented 5 years ago

I've logged the library usage here.

rgommers commented 5 years ago

> @rgommers then I don't get a segfault but the proper error message:

That's at least a workaround then. It does seem to suggest R is modifying something globally.
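One possible mechanism for such a global change, not confirmed anywhere in this thread, is that a LAPACK symbol becomes visible in the process-wide symbol namespace before scipy's extension modules are loaded, so their references resolve to it. A minimal Python 3 sketch for probing this on Linux follows. The helper name `globally_visible` is hypothetical, and `dgees_` is assumed to be the Fortran-mangled symbol name.

```python
import ctypes


def globally_visible(symbol):
    """Report whether `symbol` resolves in the process-wide namespace.

    Returns True or False on platforms where ctypes.CDLL(None) is
    supported (e.g. Linux), and None where it is not.
    """
    try:
        process = ctypes.CDLL(None)  # handle for the main program's global scope
    except (OSError, TypeError):
        return None
    # Attribute lookup raises AttributeError for unknown symbols,
    # which hasattr turns into False.
    return hasattr(process, symbol)


if __name__ == "__main__":
    # "dgees_" is the assumed mangled name of the LAPACK routine.
    print("dgees_ globally visible:", globally_visible("dgees_"))
```

Running this check before and after initializing R, and again after importing scipy.linalg, would indicate whether the import order changes what is globally visible. It does not say which library provides the symbol.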

timokau commented 5 years ago

Yes, that and preloading openblas both work as workarounds. It would be great to find the root problem, of course.