pace-neutrons / Euphonic

Euphonic is a Python package for efficient simulation of phonon bandstructures, density of states and inelastic neutron scattering intensities from force constants.
GNU General Public License v3.0
28 stars 11 forks source link

Investigate segfault on MacOS test runners #191

Closed rebeccafair closed 2 years ago

rebeccafair commented 3 years ago

In very specific cases the euphonic-dispersion, euphonic-intensity-map and euphonic-powder-map tests have been found to fail on the Github actions MacOS runners. This was discovered in #189 , details in a94e159d06af0

Some commands that may cause a fault:

rebeccafair commented 3 years ago

Looks like you can avoid the segfaults by using an MKL-linked Numpy (available via the default Conda channel). Apparently you can also change the backend of the Numpy installed via the conda-forge channel to use MKL but I haven't been able to get this to work myself yet. What backend libraries are used with the Numpy in Mantid @ajjackson ?

ajjackson commented 3 years ago

From a recent conda build on OSX:

np.show_config()
blas_info:
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/lib']
    include_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/include']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/lib']
    include_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/include']
    language = c
lapack_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas']
    library_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    include_dirs = ['/Users/adam/mambaforge/envs/mantid-developer/include']

and a recent Nightly OSX build (i.e. downloaded from sourceforge, the current distribution method)

np.show_config()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

So it looks like the current build is openblas, and the Conda recipe uses some kind of reference lapack. Mantid does allow tweaks to Conda recipes for different platforms, so if we can show issues with the OSX build it might be possible to get that sorted. The preferred package source is conda-forge.

rebeccafair commented 2 years ago

Issue opened with Scipy: https://github.com/scipy/scipy/issues/15362

rebeccafair commented 2 years ago

I have now found this issue on Windows too (see https://github.com/rebeccafair/replicate_macos_segfault/actions/runs/3090606535 and comments in scipy issue).

I thought this was an edge case but now I've reproduced it on Windows too this could be more of an issue, especially as it results in a computation just aborting halfway through. So we should probably do something about it now. Options:

In the meantime I could try looking at compiling openblas myself and try to figure out which versions it happens with, but that could take a while with not much payoff, even in the unlikely event that a fix ended up getting into Scipy, it won't help users on older versions...

rebeccafair commented 2 years ago

Added workaround in #242, will reopen if there are any other solutions or if this issue resurfaces.