rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

BLAS compatibility issue #922

Open: sumny opened this issue 3 years ago

sumny commented 3 years ago

I recently stumbled across a BLAS compatibility issue. Although it may occur only rarely (depending on your system's and numpy's BLAS versions), I wanted to report and discuss it nevertheless.

First, some information about my setup:

Kernel: 5.10.7-arch1-1

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libopenblasp-r0.3.13.so
LAPACK: /usr/lib/liblapack.so.3.9.0

reticulate_1.18
library(reticulate)
py_config()
python:         /home/lps/.local/share/r-miniconda/envs/r-reticulate/bin/python
libpython:      /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
pythonhome:     /home/lps/.local/share/r-miniconda/envs/r-reticulate:/home/lps/.local/share/r-miniconda/envs/r-reticulate
version:        3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)  [GCC 7.3.0]
numpy:          /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
numpy_version:  1.19.4

Now, suppose I want to invert a matrix in R using numpy via reticulate:

py_run_string("import numpy as np")
py_run_string("from numpy.linalg import inv")
py_run_string("a = np.array([[1., 2., -3., -1.], [3., -4., 5., 1.], [1., 2., 3., -10.], [15., 2., -3., -1.]])")
py_run_string("inv(a)")

This gives me a segfault:

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: py_run_string_impl(code, local, convert)
 2: py_run_string("inv(a)")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Debugging via R -d gdb --vanilla yields:

py_run_string("inv(a)")
Thread 1 "R" received signal SIGSEGV, Segmentation fault.
0x00007ffff6e299a4 in drot_k_SANDYBRIDGE () from /usr/lib/libblas.so.3
(gdb) where
#0  0x00007ffff6e299a4 in drot_k_SANDYBRIDGE () from /usr/lib/libblas.so.3
#1  0x00007fffcbc84e4d in dgetf2_k ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
#2  0x00007fffcbc80b31 in dgetrf_parallel ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
#3  0x00007fffcba746a9 in dgesv_ ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
#4  0x00007fffca56f313 in DOUBLE_inv ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so
#5  0x00007fffcd9add6b in PyUFunc_GenericFunction_int ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so
#6  0x00007fffcd9ae276 in ufunc_generic_call ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so
.
.
.

Note that /usr/lib/libblas.so.3 is a symlink to

readlink -f /usr/lib/libblas.so.3
/usr/lib/libopenblasp-r0.3.13.so
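The same lookup can be done from Python with `os.path.realpath`, which, like `readlink -f`, follows every symlink in the path. A minimal sketch; `resolve_library` is a hypothetical helper name, and on the reporter's system it would be expected to resolve to the OpenBLAS file shown above.

```python
import os
import sys


def resolve_library(path: str) -> str:
    """Follow all symlinks in `path`, like `readlink -f` (hypothetical helper)."""
    return os.path.realpath(path)


if __name__ == "__main__":
    # Default to the path from the report; pass another path as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/usr/lib/libblas.so.3"
    print(f"{target} -> {resolve_library(target)}")
```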

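Notably, drot_k_SANDYBRIDGE() should not be called here at all, and frame #0 is resolved from the system library /usr/lib/libblas.so.3, while its caller dgetf2_k (frame #1) lives in numpy's bundled OpenBLAS. I believe this is because numpy ships its own OpenBLAS build (libopenblasp-r0-ae94cfde.3.9.dev.so), whereas R uses the newer system OpenBLAS (libopenblasp-r0.3.13.so). With both loaded into the same process, I suspect a kernel table lookup (or something similar) resolves into the wrong library, causing the segfault.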
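The library switch between frames #1 and #0 can be made explicit by grouping the gdb frames by the shared object they come from. A small sketch; the regex assumes gdb's `#N 0xADDR in func () from /path` layout as printed above (with `from` possibly on the next line), and `frames_by_library` is a hypothetical helper.

```python
import os
import re

# First frames of the gdb `where` output from the report.
BACKTRACE = """\
#0  0x00007ffff6e299a4 in drot_k_SANDYBRIDGE () from /usr/lib/libblas.so.3
#1  0x00007fffcbc84e4d in dgetf2_k ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
#2  0x00007fffcbc80b31 in dgetrf_parallel ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
#3  0x00007fffcba746a9 in dgesv_ ()
   from /home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so
"""

# `\s+` lets "from" sit on the same line or on the following line.
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+ in (\S+) \([^)]*\)\s+from (\S+)")


def frames_by_library(text):
    """Return (frame number, function, library basename) for each frame."""
    return [
        (int(num), func, os.path.basename(lib))
        for num, func, lib in FRAME_RE.findall(text)
    ]


frames = frames_by_library(BACKTRACE)
for num, func, lib in frames:
    print(f"#{num:<2} {func:<22} {lib}")

# Frame k is called by frame k+1; report every caller/callee pair that crosses libraries.
for (_, _, lib_callee), (n_caller, f_caller, lib_caller) in zip(frames, frames[1:]):
    if lib_callee != lib_caller:
        print(f"frame #{n_caller} ({f_caller} in {lib_caller}) calls into {lib_callee}")
```

Here the only crossing reported is dgetf2_k in numpy's bundled OpenBLAS calling into the system libblas.so.3.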

In Python, using the same interpreter (/home/lps/.local/share/r-miniconda/envs/r-reticulate/bin/python), everything works as expected:

>>> import numpy as np
>>> print(np.__file__)
/home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/__init__.py
>>> from numpy.linalg import inv
>>> a = np.array([[1., 2., -3., -1.], [3., -4., 5., 1.], [1., 2., 3., -10.], [15., 2., -3., -1.]])
>>> inv(a)
array([[-0.07142857,  0.        ,  0.        ,  0.07142857],
       [-2.12380952, -1.1       ,  0.06666667,  0.35714286],
       [-1.48095238, -0.6       ,  0.06666667,  0.21428571],
       [-0.87619048, -0.4       , -0.06666667,  0.14285714]])
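Because the matrix has small integer entries, the expected result can also be computed without any BLAS at all, which gives a reference to compare against when the numpy call crashes or misbehaves inside R. A minimal sketch using Gauss-Jordan elimination over `fractions.Fraction`; this is a swapped-in textbook method, not what numpy's `inv` does (the backtrace shows numpy going through LAPACK's `dgesv_`), and `exact_inverse` is a hypothetical helper.

```python
from fractions import Fraction


def exact_inverse(rows):
    """Invert a square matrix by Gauss-Jordan elimination over exact fractions (no BLAS)."""
    n = len(rows)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(rows)
    ]
    for col in range(n):
        # Any nonzero pivot works, since exact arithmetic has no rounding to control.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from all other rows.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


# Same matrix as in the report.
a = [[1., 2., -3., -1.], [3., -4., 5., 1.], [1., 2., 3., -10.], [15., 2., -3., -1.]]
for row in exact_inverse(a):
    print([round(float(x), 8) for x in row])
```

Rounded to 8 decimals, the rows agree with numpy's output above; the top-left entry, for instance, is exactly -1/14.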

If I downgrade my system's OpenBLAS to, e.g., 0.3.10 or 0.3.9, the matrix inversion also works in R using numpy via reticulate.

I am not sure what a good solution to this problem would look like. I guess you cannot expect different BLAS versions to be fully compatible with each other.

kevinushey commented 3 years ago

Thanks for the bug report and detailed investigation. My guess is that the right fix here is to ensure that R and Python both use the same BLAS library; presumably, when Python from Miniconda is used standalone, it uses its bundled copy of OpenBLAS instead.

cboettig commented 2 years ago

@kevinushey would it be possible for reticulate to compare the BLAS libraries used by numpy and by R, and to warn or raise an error when they differ?
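One possible shape for such a check, sketched in Python under the assumption that the list of loaded shared objects is already available (e.g. from `lsof` or `/proc/<pid>/maps`): warn when more than one distinct OpenBLAS build is mapped into the process, which is the situation in the backtrace above. `warn_on_multiple_openblas` and the file-name heuristic are hypothetical, not an existing reticulate feature.

```python
import os
import re
import warnings

# Heuristic (assumption): OpenBLAS builds are named libopenblas*.so / *.dylib,
# possibly versioned, e.g. libopenblasp-r0.3.13.so or libopenblas_vortexp-r0.3.18.dylib.
OPENBLAS_RE = re.compile(r"^libopenblas.*\.(so|dylib)(\.|$)")


def distinct_openblas(paths):
    """Return the sorted, de-duplicated OpenBLAS files among `paths` (symlinks resolved)."""
    resolved = {os.path.realpath(p) for p in paths}
    return sorted(p for p in resolved if OPENBLAS_RE.match(os.path.basename(p)))


def warn_on_multiple_openblas(paths):
    """Emit a RuntimeWarning if more than one OpenBLAS build is loaded; return the builds."""
    builds = distinct_openblas(paths)
    if len(builds) > 1:
        warnings.warn(
            "Multiple OpenBLAS builds are loaded in this process; mixing them can "
            "crash BLAS/LAPACK calls:\n  " + "\n  ".join(builds),
            RuntimeWarning,
        )
    return builds


# Libraries from the two setups in this thread.
arch_setup = [
    "/usr/lib/libopenblasp-r0.3.13.so",
    "/home/lps/.local/share/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy/core/../../numpy.libs/libopenblasp-r0-ae94cfde.3.9.dev.so",
]
macos_setup = [
    "/Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib",
    "/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/lib/libopenblas_vortexp-r0.3.18.dylib",
]

warn_on_multiple_openblas(arch_setup)   # warns: two different OpenBLAS builds
warn_on_multiple_openblas(macos_setup)  # silent: libRblas is not OpenBLAS by this heuristic
```

Matching on file names is deliberately crude; a real check would also need to handle a generic name such as libblas.so.3 pointing at OpenBLAS, which resolving symlinks only covers when the file exists on the machine running the check.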

I know this isn't really a reticulate issue per se: numpy should throw a helpful error instead of segfaulting when linked against another BLAS version. But reticulate users will feel the issue most keenly, since it is particularly difficult for them to debug, and re-installing numpy from source may be hard or impossible for them (e.g., I'm not sure that is possible with conda-based envs).

Meanwhile, if anyone stumbles on this: it can be resolved in a pip-based virtualenv by installing numpy from source, e.g. with

reticulate::py_install("numpy", pip_options="--no-binary='numpy'", ignore_installed=TRUE)

as @jwalton3141 notes above.
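Outside R, roughly the same request can be made of pip directly with the environment's Python. A sketch that only builds and prints the command, assuming the target environment is the one whose interpreter runs the script; `pip_source_install_cmd` is a hypothetical helper. Actually running the command compiles numpy from source, which needs a C compiler and, presumably, the system BLAS/LAPACK to link against, and can take a while.

```python
import shlex
import sys


def pip_source_install_cmd(package="numpy"):
    """Build a pip command that reinstalls `package` from source (no prebuilt wheel)."""
    return [
        sys.executable, "-m", "pip", "install",
        "--ignore-installed",        # reinstall even if the package is already installed
        f"--no-binary={package}",    # refuse binary wheels for this package
        package,
    ]


# Print instead of running; pass the list to subprocess.run(...) to actually install.
print(shlex.join(pip_source_install_cmd()))
```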

kevinushey commented 2 years ago

Good question. We could probably introspect, when numpy is loaded, which BLAS library it was configured to use, e.g.:

> np$show_config()
blas_info:
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/lib']
    include_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/include']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/lib']
    include_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/include']
    language = c
lapack_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas']
    library_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/include']
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP
    not found = ASIMDDP

And then cross-check that against the BLAS libraries actually loaded in the process:

> system(paste("lsof -p", Sys.getpid(), "| grep blas"))
rsession- 45809 kevin  txt       REG               1,18    193440            56767533 /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
rsession- 45809 kevin  txt       REG               1,18  10570608             6977756 /Users/kevin/Library/r-miniconda-arm64/envs/r-reticulate/lib/libopenblas_vortexp-r0.3.18.dylib
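On Linux, where `lsof` may not be installed, the same cross-check can be done from inside the process by reading `/proc/self/maps` (or `/proc/<pid>/maps` for another process). A minimal sketch; `loaded_blas_libraries` is a hypothetical helper, and its `blas` file-name filter mirrors the `grep blas` above.

```python
import os


def loaded_blas_libraries(maps_text):
    """Return the unique mapped files whose file name contains 'blas' (like `grep blas`)."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; anonymous maps have no pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and "blas" in os.path.basename(fields[5]).lower():
            paths.add(fields[5])
    return sorted(paths)


if __name__ == "__main__":
    maps_file = "/proc/self/maps"
    if os.path.exists(maps_file):
        with open(maps_file) as fh:
            for path in loaded_blas_libraries(fh.read()):
                print(path)
    else:
        print("no /proc/self/maps here (not Linux?); use lsof instead")
```

Run inside the Python embedded by reticulate (e.g. via `py_run_string`), this would list both the system OpenBLAS and numpy's bundled copy if both are mapped, as in the original report.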

cboettig commented 2 years ago

Might be fixed by https://github.com/numpy/numpy/pull/21717