Closed tskisner closed 7 months ago
Although the toast workflows in sotodlib do not hit this issue, the preprocess_tod.py file in site pipeline does. That gives a clue, since it specifically imports scipy.signal. This is the next place to look for threading model collisions. Because scipy contains many compiled extensions, finding the source of the problem is challenging.
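One way to look for such a collision is to list which OpenMP-related shared libraries are mapped into the process after each import. The following is a minimal sketch, assuming a Linux system where /proc/self/maps is available. The library name pattern (libgomp, libomp, libiomp5, libarcher) is an assumption made for illustration and may not cover every runtime.

```python
import re
from pathlib import Path

# Name fragments of common OpenMP runtimes and tools (GNU, LLVM, Intel, and
# LLVM's Archer tool). This list is an assumption and may not be exhaustive.
OPENMP_PATTERN = re.compile(r"lib(gomp|omp|iomp5|archer)\b[^/]*\.so")


def openmp_libs_in_maps(maps_text):
    """Return sorted, de-duplicated OpenMP-like library paths found in the
    text of a /proc/<pid>/maps file."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if OPENMP_PATTERN.search(Path(path).name):
            found.add(path)
    return sorted(found)


def loaded_openmp_libs():
    """Inspect the current process (Linux only)."""
    return openmp_libs_in_maps(Path("/proc/self/maps").read_text())
```

Calling loaded_openmp_libs() once after importing scipy.signal and again after importing toast would show whether two different OpenMP runtimes, or two copies of one runtime from different paths, end up in the same process.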
Another data point: if I install toast using the conda compilers and conda dependencies outside of the conda-bld environment (just with a regular environment loaded with all those tools, using the normal cmake build), then there is no segfault. This points to something about the build environment being incompatible, and is the next thing to investigate.
I have tracked this down to a single shared library (libarcher.so) installed by the libactpol package. Using the changes in #31, if I install all packages in soconda and do:
python -c 'import scipy; import toast'
(or, for example, import the site pipeline), then I get a segfault. If I remove the libactpol package, there is no segfault. If I reinstall libactpol and then manually delete every other library and executable installed by that package, I still get the segfault. When I manually delete the libarcher.so library, everything works. I think this is actually an LLVM utility that is being picked up and bundled during the libactpol package build. Upon installation, it seems to overwrite an existing version of libarcher.so. All other packages now build without further problems in #31, so it is now a matter of figuring out why this library is getting bundled and preventing that.
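A file that is shipped by more than one package, as libarcher.so appears to be here, can be found by comparing the file lists that conda records for each installed package. The sketch below assumes the usual layout in which each package has a JSON record under conda-meta/ in the environment prefix, with "name" and "files" keys. The function names are hypothetical helpers, not part of any conda API.

```python
import json
from collections import defaultdict
from pathlib import Path


def files_by_package(prefix):
    """Map each installed relative file path to the list of packages whose
    conda-meta record claims it."""
    owners = defaultdict(list)
    for record in sorted(Path(prefix, "conda-meta").glob("*.json")):
        meta = json.loads(record.read_text())
        for rel_path in meta.get("files", []):
            owners[rel_path].append(meta.get("name", record.stem))
    return owners


def clobbered_files(prefix):
    """Return only the paths claimed by more than one package."""
    return {
        path: pkgs
        for path, pkgs in files_by_package(prefix).items()
        if len(pkgs) > 1
    }
```

Running clobbered_files() on the environment prefix and looking for lib/libarcher.so in the result would confirm whether libactpol and another package both install that file.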
Fixed by #31
In our environments, we build packages like so3g and toast, which have extensions linking to OpenMP libraries, both directly and indirectly through a dependency on OpenBLAS. We pin the runtime versions of BLAS / LAPACK to use the OpenMP "flavor" of OpenBLAS. In a newly created environment, running ldd on the compiled extensions shows that they link consistently against the same versions of OpenBLAS used by SciPy.
Despite this, at least on some systems, doing
import scipy; import toast
triggers a segfault in the toast compiled extension at the first call to omp_get_num_threads(). Reversing the import order does not segfault, but there is an obvious concern that later scipy calls could then fail silently.
Documenting some more aspects:
so3g.
Since the scipy package contains numerous extensions, the problem may be triggered by loading a specific scipy submodule with a compiled extension linking to OpenMP (directly or indirectly), for example the signal submodule.
The purpose of this issue is to track notes and ideas while exploring options to fix this.
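One way to test the submodule hypothesis above is to run each import pair in a fresh interpreter and record how the child process exits. This is a minimal sketch using only the standard library. The helper names are hypothetical, and any list of scipy submodules passed to it would be an arbitrary selection by the user.

```python
import subprocess
import sys


def import_pair_status(first, second, python=sys.executable):
    """Import `first` then `second` in a fresh interpreter and return the
    child's exit code. On POSIX, a negative value -N means the child was
    terminated by signal N (SIGSEGV is 11 on Linux)."""
    code = f"import {first}; import {second}"
    proc = subprocess.run([python, "-c", code], capture_output=True)
    return proc.returncode


def describe(returncode):
    """Turn an exit code into a short human-readable label."""
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return "ok" if returncode == 0 else f"exit code {returncode}"


def bisect_imports(first_modules, second):
    """Try each candidate module before `second` and collect the outcomes."""
    return {
        name: describe(import_pair_status(name, second))
        for name in first_modules
    }
```

For example, bisect_imports(["scipy.signal", "scipy.linalg", "scipy.sparse"], "toast") would report which submodule, if any, reproduces the crash when imported before toast. Running each pair in its own process keeps a segfault in one case from hiding the results of the others.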