simonsobs / soconda

Simons Observatory Conda Tools
BSD 2-Clause "Simplified" License

OpenMP Conflicts #26

Closed tskisner closed 7 months ago

tskisner commented 8 months ago

In our environments, we build packages like so3g and toast which have extensions linking to OpenMP libraries, both directly and indirectly through a dependency on OpenBLAS. We pin the runtime versions of BLAS / LAPACK to use the OpenMP "flavor" of OpenBLAS. In a newly created environment, running ldd on the compiled extensions shows that they consistently link against the same version of OpenBLAS used by SciPy.
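For reference, here is a minimal sketch of how the ldd check could be scripted. It is not the exact procedure used above. The helper names and the list of library name patterns are assumptions for illustration, and the pattern list is not exhaustive.

```python
import re
import subprocess

# Name fragments for OpenMP runtimes and OpenBLAS (assumed, non-exhaustive).
PATTERNS = ("libgomp", "libomp", "libiomp", "libopenblas")


def parse_ldd(output):
    """Map each library name in `ldd` output to the path it resolved to."""
    libs = {}
    for line in output.splitlines():
        # Typical line: "\tlibgomp.so.1 => /env/lib/libgomp.so.1 (0x00007f...)"
        match = re.match(r"\s*(\S+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$", line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def filter_threading(libs):
    """Keep only the entries whose name matches one of PATTERNS."""
    return {name: path for name, path in libs.items()
            if any(pattern in name for pattern in PATTERNS)}


def threading_libs(extension_path):
    """Run ldd on one compiled extension and report its OpenMP / BLAS links."""
    result = subprocess.run(["ldd", extension_path],
                            capture_output=True, text=True, check=True)
    return filter_threading(parse_ldd(result.stdout))
```

Calling `threading_libs()` on a toast extension and on a SciPy extension and comparing the two dictionaries would show whether both resolve to the same files inside the environment prefix.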

Despite this, at least on some systems, doing import scipy; import toast triggers a segfault in the toast compiled extension at the first call to omp_get_num_threads(). Reversing the import order does not segfault, but there is an obvious concern that this ordering could cause silent errors in later scipy calls.
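A small sketch for testing both import orders, each in a fresh interpreter so that a crash in one does not take down the test script. The function names are made up for illustration. It assumes a POSIX system, where `subprocess` reports a process killed by a signal as a negative return code (SIGSEGV is signal 11 on Linux).

```python
import subprocess
import sys


def run_imports(modules):
    """Import the given modules, in order, in a new interpreter.

    Returns the child's exit code: 0 on success, negative if it was
    killed by a signal (POSIX only).
    """
    code = "; ".join(f"import {name}" for name in modules)
    return subprocess.run([sys.executable, "-c", code]).returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exit status {returncode}"
```

Running `describe(run_imports(["scipy", "toast"]))` and `describe(run_imports(["toast", "scipy"]))` in the affected environment should show whether the crash depends on import order.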

Documenting some more aspects:

The purpose of this issue is to track notes and ideas while exploring options to fix this.

tskisner commented 7 months ago

Although the toast workflows in sotodlib do not hit this issue, the preprocess_tod.py file in site pipeline does. That gives a clue, since it specifically imports scipy.signal, so this is the next place to look for threading model collisions. There are many compiled extensions in scipy, which has made it challenging to find the source of the problem.
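One way to look for a threading model collision is to list which OpenMP runtimes are mapped into the process after a given import. This is a Linux-only sketch that reads `/proc/self/maps`. The marker list and function names are assumptions for illustration.

```python
import os

# Basename prefixes treated as OpenMP runtimes (assumed, non-exhaustive).
MARKERS = ("libgomp", "libomp", "libiomp")


def openmp_libraries(maps_text):
    """Return sorted paths of mapped files whose basename starts with a marker."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5].strip()
            if os.path.basename(path).startswith(MARKERS):
                found.add(path)
    return sorted(found)


def loaded_openmp_libraries():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return openmp_libraries(handle.read())
```

Calling `loaded_openmp_libraries()` right after `import scipy.signal`, and again after `import toast`, would show whether more than one runtime ends up loaded in the same process.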

tskisner commented 7 months ago

Another data point: if I install toast using the conda compilers and conda dependencies outside of the conda-bld environment (just with a regular environment loaded with all those tools, using the normal cmake build), then there is no segfault. This points to something about the build environment being incompatible, and is the next thing to investigate.

tskisner commented 7 months ago

I have tracked this down to a single shared library (libarcher.so) installed by the libactpol package. Using the changes in #31, if I install all packages in soconda and do:

python -c 'import scipy; import toast'

(or for example, import the site pipeline), then I get a segfault. If I remove the libactpol package, there is no segfault. If I reinstall libactpol and then manually delete every other library and executable installed by that package, I still get the segfault. When I manually delete the libarcher.so library, everything works. I think this is actually an LLVM utility that is being picked up and bundled during the libactpol package build. Upon installation, it seems to overwrite an existing version of libarcher.so. All other packages now build without further problems in #31, so it is now a matter of figuring out why this library is getting bundled and preventing that.
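To see which installed packages claim a given file, one option is to scan the per-package JSON records under the environment's `conda-meta` directory. This sketch assumes each record has a `name` key and a `files` list of paths relative to the prefix. The function name is hypothetical.

```python
import glob
import json
import os


def packages_owning(prefix, basename):
    """Map package name -> list of its files whose basename matches."""
    owners = {}
    pattern = os.path.join(prefix, "conda-meta", "*.json")
    for meta_path in sorted(glob.glob(pattern)):
        with open(meta_path) as handle:
            record = json.load(handle)
        hits = [path for path in record.get("files", [])
                if os.path.basename(path) == basename]
        if hits:
            # Fall back to the record file name if "name" is missing.
            owners[record.get("name", os.path.basename(meta_path))] = hits
    return owners
```

For example, `packages_owning(os.environ["CONDA_PREFIX"], "libarcher.so")` returning more than one package would be consistent with one package overwriting a file installed by another.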

tskisner commented 7 months ago

Fixed by #31