microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[python-package] SegFault on MacOS when pytorch is installed #6595

Open connortann opened 1 month ago

connortann commented 1 month ago

Description

A segmentation fault occurs on MacOS when lightgbm and pytorch are both installed, depending on the order of imports.

Possibly related: #4229

Reproducible example

To reproduce the issue on GH actions:

# run_tests.yml
jobs:
  run_tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.11
      - run: brew install libomp
      - run: pip install pytest torch scikit-learn lightgbm
      - run: pip list
      - run: pytest --noconftest test_bug.py

# test_bug.py
import time

import lightgbm  # Issue only occurs if this import is present
import torch
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)

Leads to Fatal Python error: Segmentation fault. Full output:

```
Run pytest --noconftest tests/test_bug121101.py
============================= test session starts ==============================
platform darwin -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /Users/runner/work/shap/shap
configfile: pyproject.toml
collected 1 item

Fatal Python error: Segmentation fault

Thread 0x0000000204c1cc00 (most recent call first):
tests/test_bug121101.py
  File "/Users/runner/work/shap/shap/tests/test_bug121101.py", line 12 in test_something
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/python.py", line 1627 in runtest
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line Fatal Python error: Segmentation fault 337 in _main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/Users/runner/hostedtoolcache/Python/3.11.9/arm64/bin/pytest", line 8 in <module>

Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator Extension modules: , numpy._core._multiarray_umathscipy.sparse._sparsetools, numpy._core._multiarray_tests, _csparsetools, numpy.linalg._umath_linalg, scipy.sparse._csparsetools, scipy._lib._ccallback_c, scipy.linalg._fblas, numpy.random._common, scipy.linalg._flapack, numpy.random.bit_generator, , scipy.linalg.cython_lapacknumpy.random._bounded_integers, , scipy.linalg._cythonized_array_utilsnumpy.random._mt19937, , scipy.linalg._solve_toeplitznumpy.random.mtrand, , numpy.random._philoxscipy.linalg._decomp_lu_cython, numpy.random._pcg64, scipy.linalg._matfuncs_sqrtm_triu, numpy.random._sfc64, scipy.linalg.cython_blas, numpy.random._generator, scipy.linalg._matfuncs_expm,
scipy.sparse._sparsetools, scipy.linalg._decomp_update, _csparsetools, , scipy.sparse._csparsetoolsscipy.sparse.linalg._dsolve._superlu, , scipy.linalg._fblasscipy.sparse.linalg._eigen.arpack._arpack, scipy.linalg._flapack, , scipy.linalg.cython_lapackscipy.sparse.linalg._propack._spropack, scipy.linalg._cythonized_array_utils, scipy.sparse.linalg._propack._dpropack, scipy.linalg._solve_toeplitz, scipy.sparse.linalg._propack._cpropack, scipy.linalg._decomp_lu_cython, scipy.sparse.linalg._propack._zpropack, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.sparse.csgraph._tools, scipy.linalg._matfuncs_expm, scipy.sparse.csgraph._shortest_path, scipy.linalg._decomp_update, scipy.sparse.csgraph._traversal, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, , scipy.sparse.csgraph._min_spanning_treescipy.sparse.linalg._propack._spropack, , scipy.sparse.csgraph._flowscipy.sparse.linalg._propack._dpropack, , scipy.sparse.csgraph._matchingscipy.sparse.linalg._propack._cpropack, , scipy.sparse.csgraph._reorderingscipy.sparse.linalg._propack._zpropack, , scipy.sparse.csgraph._toolssklearn.__check_build._check_build, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, , scipy.sparse.csgraph._reorderingscipy.special._ufuncs_cxx, , sklearn.__check_build._check_buildscipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._ufuncs, scipy.spatial._ckdtree, scipy.special._specfun, scipy._lib.messagestream, scipy.special._comb, scipy.spatial._qhull, scipy.special._ellip_harm_2, scipy.spatial._voronoi, scipy.spatial._ckdtree, , scipy.spatial._distance_wrapscipy._lib.messagestream, , scipy.spatial._hausdorffscipy.spatial._qhull, scipy.spatial._voronoi, , scipy.spatial._distance_wrapscipy.spatial.transform._rotation, scipy.spatial._hausdorff, 
scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.special.cython_special, scipy.stats._stats, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, , scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNCscipy.stats._mvn, scipy.optimize._cobyla, scipy.stats._rcont.rcont, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.stats._unuran.unuran_wrapper, scipy.optimize._lsq.givens_elimination, , scipy.optimize._zeros, scipy.ndimage._nd_imagescipy.optimize._highs.cython.src._highs_wrapper, , scipy.optimize._highs._highs_wrapper_ni_label, , scipy.optimize._highs.cython.src._highs_constantsscipy.ndimage._ni_label, scipy.optimize._highs._highs_constants, sklearn.utils._isfinite, scipy.linalg._interpolative, sklearn.utils.sparsefuncs_fast, scipy.optimize._bglu_dense, sklearn.utils.murmurhash, scipy.optimize._lsap, , sklearn.utils._openmp_helpersscipy.optimize._direct, scipy.integrate._odepack, 
sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics.cluster._expected_mutual_info_fast, scipy.integrate._quadpack, sklearn.metrics._dist_metrics, scipy.integrate._vode, sklearn.metrics._pairwise_distances_reduction._datasets_pair, scipy.integrate._dop, scipy.integrate._lsoda, sklearn.utils._cython_blas, scipy.interpolate._fitpack, sklearn.metrics._pairwise_distances_reduction._base, scipy.interpolate._dfitpack, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, scipy.interpolate._bspl, sklearn.utils._heap, scipy.interpolate._ppoly, sklearn.utils._sorting, scipy.interpolate.interpnd, sklearn.metrics._pairwise_distances_reduction._argkmin, scipy.interpolate._rbfinterp_pythran, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, scipy.interpolate._rgi_cython, scipy.special.cython_special, sklearn.utils._vector_sentinel, scipy.stats._stats, , sklearn.metrics._pairwise_distances_reduction._radius_neighborsscipy.stats._biasedurn, , sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmodescipy.stats._levy_stable.levyst, , scipy.stats._stats_pythransklearn.metrics._pairwise_fast, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, sklearn.utils._random, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, torch._C, scipy.stats._rcont.rcont, , scipy.stats._unuran.unuran_wrappertorch._C._fft, , scipy.ndimage._nd_imagetorch._C._linalg, , _ni_labeltorch._C._nested, , scipy.ndimage._ni_labeltorch._C._nn, , sklearn.utils._isfinitetorch._C._sparse, , sklearn.utils.sparsefuncs_fasttorch._C._special, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, 
sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, sklearn.utils._random, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, , scipy.io.matlab._mio_utilstorch._C._nn, torch._C._sparse, scipy.io.matlab._streams, torch._C._special, scipy.io.matlab._mio5_utils, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, , sklearn.datasets._svmlight_format_fastscipy.io.matlab._mio5_utils, sklearn.datasets._svmlight_format_fast, sklearn.feature_extraction._hashing_fast (total: 130, )sklearn.feature_extraction._hashing_fast (total: 130)
/Users/runner/work/_temp/7013399c-b6ff-43a4-b289-cc08191dbadb.sh: line 1: 2783 Segmentation fault: 11 pytest --noconftest tests/test_bug121101.py
```

Environment info

LightGBM version or commit hash: 4.5.0

Result of pip list:

```
Package           Version
----------------- --------
certifi           2024.7.4
filelock          3.15.4
fsspec            2024.6.1
iniconfig         2.0.0
Jinja2            3.1.4
joblib            1.4.2
lightgbm          4.5.0
MarkupSafe        2.1.5
mpmath            1.3.0
networkx          3.3
numpy             2.0.1
packaging         24.1
pip               24.2
pluggy            1.5.0
pytest            8.3.2
scikit-learn      1.5.1
scipy             1.14.0
setuptools        65.5.0
sympy             1.13.1
threadpoolctl     3.5.0
torch             2.4.0
```

Additional Comments

We came across this issue over at the shap repo, trying to run tests with the latest versions of both pytorch and lightgbm. We initially raised this issue on the pytorch issue tracker: https://github.com/pytorch/pytorch/issues/121101 .

However, the underlying issue doesn't seem to be specific just to pytorch or lightgbm, but rather it relates to the mutual compatibility of pytorch and lightgbm. The issue seems to relate to multiple ~OpenML~ OpenMP runtimes being loaded.

So, I thought it would be worth raising the issue here too in the hope that it helps us collectively find a fix.

jameslamb commented 1 month ago

Thanks for the excellent report @connortann !

Since #6391, import lightgbm on macOS will try to use the already-loaded OpenMP if there is one. So it shouldn't be the case that import lightgbm can cause "multiple OpenMP runtimes being loaded".

(assuming that was a typo in your original report and you really meant "OpenMP", not "OpenML")

Since you have scikit-learn in the environment, import lightgbm will import sklearn. I suspect that scikit-learn may be contributing to this problem. In the past, we've seen that library's handling of its OpenMP dependency contribute to this "multiple OpenMP runtimes being loaded" situation.

To narrow it down further, could you try 2 other tests?

I'm sorry to possibly involve yet a THIRD project in your investigation. I'm familiar with these topics and happy to help us all reach a resolution.

You may also find these relevant:

connortann commented 1 month ago

Thanks for the response! Yes I think you're right about sklearn being relevant: the bug seems not to occur if sklearn is not imported.

Here's what I tried. The tests pass in all of these situations:

  1. import sklearn, then torch (no lightgbm involved). Tests pass.
```python
import time

import sklearn
import torch
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)
```
  2. import torch then sklearn (no lightgbm involved). Tests pass.
```python
import time

import torch
import sklearn
from sklearn.datasets import fetch_california_housing

def test_something():
    X, y = fetch_california_housing(return_X_y=True)
    torch.tensor(X)
    time.sleep(3)
```
  3. Without sklearn installed; import lightgbm then torch. Tests pass.
```python
import time

import lightgbm
import torch
import numpy as np
# from sklearn.datasets import fetch_california_housing

def test_something():
    # X, y = fetch_california_housing(return_X_y=True)
    X = np.ones(shape=(200, 20))
    torch.tensor(X)
    time.sleep(3)
```
  4. Without sklearn installed; import torch then lightgbm. Tests pass.
```python
# ruff: noqa
# fmt: off
import time

import torch
import lightgbm
import numpy as np
# from sklearn.datasets import fetch_california_housing

def test_something():
    # X, y = fetch_california_housing(return_X_y=True)
    X = np.ones(shape=(200, 20))
    torch.tensor(X)
    time.sleep(3)
```

So, I think the example above is the minimal reproducer: lightgbm, torch and sklearn!
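For anyone repeating this kind of import-order matrix, here is a minimal sketch of a harness that runs each order in a fresh interpreter, so a crash in one case cannot take down the others. The helper names `run_import_order` and `died_from_segfault` are made up for illustration and are not part of lightgbm, torch or pytest. It relies on the POSIX convention that `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def run_import_order(modules, body="", timeout=120):
    """Import `modules` in order in a fresh interpreter, then run `body`.

    Returns the child's exit code. On POSIX a negative value -N means the
    child was killed by signal N.
    """
    # One import statement per line, followed by the optional code to run.
    script = "\n".join([f"import {name}" for name in modules] + [body])
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode


def died_from_segfault(returncode):
    """True if a POSIX child process was terminated by SIGSEGV."""
    return returncode == -signal.SIGSEGV
```

A call such as `run_import_order(["lightgbm", "torch"], body="torch.ones(200_000)")` would then exercise one ordering; the tensor size is only an example value.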

vnherdeiro commented 2 weeks ago

Adding my two cents to this issue. I managed to reproduce the bug following the setting given by @connortann

Running `python -m pytest test_bug.py` raises the segfault with torch==2.2.2 scikit-learn==1.5.1 numpy==1.26.4 lightgbm==4.5.0.

Prepending the command with `OMP_NUM_THREADS=1` (forcing single-threaded operation) makes the segfault go away.
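If changing the command line is inconvenient, the same workaround can probably be applied from Python, as long as it happens before anything that uses OpenMP is imported. This is only a sketch: `limit_openmp_threads` is a hypothetical helper, and it masks the symptom reported above rather than removing the duplicate runtimes.

```python
import os


def limit_openmp_threads(num_threads=1, env=None):
    """Set OMP_NUM_THREADS unless a value has already been chosen."""
    env = os.environ if env is None else env
    # setdefault keeps any value the user exported explicitly.
    env.setdefault("OMP_NUM_THREADS", str(num_threads))
    return env["OMP_NUM_THREADS"]


# Call this before the first `import lightgbm` / `import torch`:
# OpenMP runtimes read the variable when they initialize.
limit_openmp_threads(1)
```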

lorentzenchr commented 2 days ago

@lesteve ping as scikit-learn is involved in the minimal reproducer (openmp related).

lesteve commented 1 day ago

Honestly, @jeremiedbb may be a better person for this on the scikit-learn side. This is quite a tricky topic at the interface of different projects that make different choices about how to handle OpenMP in wheels, and OpenMP is already tricky in itself.

The root cause is generally that multiple OpenMP runtimes are loaded, and threadpoolctl can highlight this, see this doc and below.

One known work-around is to use conda-forge which will use a single OpenMP and avoid most of these issues. I wanted to mention it, even if I understand using conda rather than pip is a non-starter in some use cases.

In this particular case, I played a bit with the code and can reproduce without scikit-learn, i.e. only with LightGBM and PyTorch. To be honest, I have heard of cases that go wrong with PyTorch and scikit-learn for similar reasons, but it's generally a bit hard to get a reproducer ...

I put together a quick repo: https://github.com/lesteve/lightgbm-pytorch-macos-segfault.

In particular, see build log which shows a segfault, python file, workflow YAML file. Importing pytorch before lightgbm works fine, see build log.

Python file:

import pprint
import sys
import platform

import lightgbm
import torch
import threadpoolctl

print('version: ', sys.version, flush=True)
print('platform: ', platform.platform(), flush=True)
pprint.pprint(threadpoolctl.threadpool_info())

print('before torch tensor', flush=True)
t = torch.ones(200_000)
print('after torch tensor', flush=True) 

Output:

version:  3.12.5 (v3.12.5:ff3bc82f7c9, Aug  7 2024, 05:32:06) [Clang 13.0.0 (clang-1300.0.29.30)]
platform:  macOS-14.6.1-arm64-arm-64bit
[{'architecture': 'armv8',
  'filepath': '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
  'internal_api': 'openblas',
  'num_threads': 3,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'},
 {'filepath': '/opt/homebrew/Cellar/libomp/18.1.8/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 3,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None},
 {'filepath': '/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/torch/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 3,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None}]
before torch tensor
/Users/runner/work/_temp/558d95ac-031b-4858-bfb0-b7bb4841e27b.sh: line 1:  1924 Segmentation fault: 11  python test.py

From the threadpoolctl info, you can tell that there are multiple OpenMP runtimes in use: the brew one (used by LightGBM) and the one bundled in the PyTorch wheel.
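That check can be automated. The sketch below filters a `threadpoolctl.threadpool_info()` style report down to the distinct OpenMP library paths; `distinct_openmp_runtimes` is a hypothetical helper, and the sample entries are trimmed copies of the output above.

```python
def distinct_openmp_runtimes(threadpool_info):
    """Return the sorted file paths of all OpenMP runtimes in the report."""
    return sorted(
        {
            entry["filepath"]
            for entry in threadpool_info
            if entry.get("user_api") == "openmp"
        }
    )


# Entries reduced to the two keys the helper reads.
sample_info = [
    {
        "user_api": "blas",
        "filepath": "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/.dylibs/libopenblas64_.0.dylib",
    },
    {
        "user_api": "openmp",
        "filepath": "/opt/homebrew/Cellar/libomp/18.1.8/lib/libomp.dylib",
    },
    {
        "user_api": "openmp",
        "filepath": "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/torch/lib/libomp.dylib",
    },
]

runtimes = distinct_openmp_runtimes(sample_info)
if len(runtimes) > 1:
    print("multiple OpenMP runtimes loaded:", *runtimes, sep="\n  ")
```

In a real session the argument would be `threadpoolctl.threadpool_info()`, called after importing the packages under suspicion.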

pip list

Package           Version
----------------- ---------
certifi           2024.8.30
filelock          3.16.0
fsspec            2024.9.0
Jinja2            3.1.4
lightgbm          4.5.0
MarkupSafe        2.1.5
mpmath            1.3.0
networkx          3.3
numpy             1.26.4
pip               24.2
scipy             1.14.1
setuptools        74.1.2
sympy             1.13.2
threadpoolctl     3.5.0
torch             2.4.1
typing_extensions 4.12.2

(Edit: sorry pinged the wrong Jérémie originally ...)

jameslamb commented 5 hours ago

Thanks very much for that! Your example has helped to clarify the picture for me a lot.

Short Summary

torch vendors a libomp.dylib (without library or symbol name mangling) and always prefers that vendored copy to a system installation.

lightgbm searches for a system installation.

As a result, if you've installed both these libraries via wheels on macOS, loading both will result in 2 copies of libomp.dylib being loaded. This may or may not show up as runtime issues... unpredictable, because symbol resolution is lazy by default and therefore depends on the code paths used.

Even if all copies of libomp.dylib loaded into the process are ABI-compatible with each other, there can still be runtime segfaults as a result of mixing symbols from libraries loaded at different memory addresses, I think.

Longer Summary

More details

I investigated this by running the following on my M2 Mac, with Python 3.11. Note that the versions are identical to those from the previous comment.

```shell
mkdir ./delete-me
cd ./delete-me
pip download \
    --no-deps \
    'lightgbm==4.5.0' \
    'torch==2.4.1'
unzip ./lightgbm*.whl
unzip ./torch*.whl
otool -l ./lightgbm/lib/lib_lightgbm.dylib
otool -l ./torch/lib/libtorch_cpu.dylib
```

`lightgbm` wheels have exactly one library, `lib_lightgbm.dylib`, with an OpenMP dependency like this:

```text
@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
```

And the following LC_LOAD_DYLIB / LC_RPATH entries:

```text
cmd LC_LOAD_DYLIB
name @rpath/libomp.dylib (offset 24)
current version 5.0.0
compatibility version 5.0.0
...
cmd LC_RPATH
path /opt/homebrew/opt/libomp/lib (offset 12)
...
cmd LC_RPATH
path /opt/local/lib/libomp (offset 12)
```

`torch` wheels vendor `libomp.dylib` **but without mangling the library name or its symbols**.

```text
libc10.dylib
libomp.dylib
libshm.dylib
libtorch.dylib
libtorch_cpu.dylib
libtorch_global_deps.dylib
libtorch_python.dylib
```

`libtorch_cpu.dylib` expresses its OpenMP dependency like this:

```text
@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
```

And has the following LC_LOAD_DYLIB / LC_RPATH entries:

```text
cmd LC_LOAD_DYLIB
name @rpath/libomp.dylib (offset 24)
current version 5.0.0
compatibility version 5.0.0
...
cmd LC_RPATH
path @loader_path (offset 12)
```

So `lightgbm` will search for `libomp.dylib` in various places (including where Homebrew likes to put it) and loads the first one found. `torch` will ONLY and ALWAYS load exactly the one that its wheels vendor.

💥 2 copies of OpenMP loaded at the same time, and all the issues that come with that.
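Reading the `otool -l` output by eye gets tedious with several libraries, so here is a rough sketch of a parser that pulls out only the LC_LOAD_DYLIB names and LC_RPATH paths. The function names are hypothetical, the regular expressions assume the `cmd` / `name` / `path ... (offset N)` layout shown in the excerpts above, and other load commands such as LC_LOAD_WEAK_DYLIB are deliberately ignored.

```python
import re
import subprocess

_CMD = re.compile(r"^\s*cmd\s+(\S+)")
_VALUE = re.compile(r"^\s*(name|path)\s+(.+?)\s+\(offset \d+\)")


def parse_load_commands(otool_output):
    """Extract LC_LOAD_DYLIB names and LC_RPATH paths from `otool -l` text."""
    dylibs, rpaths = [], []
    current = None
    for line in otool_output.splitlines():
        cmd = _CMD.match(line)
        if cmd:
            # Remember which load command the following fields belong to.
            current = cmd.group(1)
            continue
        value = _VALUE.match(line)
        if not value:
            continue
        key, text = value.groups()
        if current == "LC_LOAD_DYLIB" and key == "name":
            dylibs.append(text)
        elif current == "LC_RPATH" and key == "path":
            rpaths.append(text)
    return dylibs, rpaths


def inspect_dylib(path):
    """Run `otool -l` (macOS only) on `path` and parse its output."""
    out = subprocess.run(
        ["otool", "-l", path], check=True, capture_output=True, text=True
    ).stdout
    return parse_load_commands(out)
```

On macOS, `inspect_dylib("./lightgbm/lib/lib_lightgbm.dylib")` would then return the two lists for the unzipped wheel from the commands above.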

Why didn't @connortann observe this same behavior?

Not sure why @connortann was not able to reproduce this in https://github.com/microsoft/LightGBM/issues/6595#issuecomment-2273745921. That comment shows:

Without sklearn installed; import torch then lightgbm. Tests pass

Probably because that example uses different codepaths in torch. Many OpenMP symbols would be resolved only at the first call site (as described in this Stack Overflow answer and the macOS docs it links to), so different code paths can lead to different behavior in terms of which copies of libomp.dylib certain symbols are found in.

How do we fix this?

I think some mix of the following would make this better for users.

Option 1: torch could more aggressively isolate its OpenMP dependency

If torch wants to vendor its own OpenMP in this way, it could further isolate that dependency to only torch's own uses, by doing one of the following:

Option 2a: lightgbm could vendor OpenMP like torch does, but with that added strictness described above

I really do not want to do this, for the reasons mentioned in #6391 and the things linked to it.

Option 2b: torch could stop vendoring OpenMP and use the same LC_RPATH search order lightgbm does

I don't know if this would be palatable for torch. It comes with its own challenges.

Option 3: lightgbm could add something like @loader_path/../../torch/lib earlier in its list of RPATHS

This only works as long as torch is vendoring a version of libomp.dylib that lightgbm is ABI-compatible with.

And it only helps for the narrow case of lightgbm and torch with no other OpenMP-using dependencies. Every other library depending on OpenMP (e.g. xgboost, scikit-learn) would need to do something similar for them to all reliably use that same copy of libomp.dylib at runtime.
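To make the search-order argument concrete, here is a simplified model of how an `@rpath/libomp.dylib` dependency is resolved against a list of LC_RPATH entries. It is a sketch, not dyld's actual algorithm: it ignores DYLD_* environment variables, rpaths contributed by other images in the load chain, and whether a matching library is already loaded in the process. The `site` directory and the set of existing files are hypothetical, and the lightgbm rpath list contains only the two entries visible in the otool excerpt.

```python
import posixpath


def resolve_rpath_dylib(install_name, rpaths, loader_dir, exists):
    """Return the first existing candidate for an @rpath/... install name."""
    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        return install_name if exists(install_name) else None
    leaf = install_name[len(prefix):]
    for rpath in rpaths:
        # @loader_path expands to the directory of the library doing the loading.
        base = rpath.replace("@loader_path", loader_dir, 1)
        candidate = posixpath.normpath(posixpath.join(base, leaf))
        if exists(candidate):
            return candidate
    return None


# Hypothetical environment: Homebrew libomp plus torch's vendored copy.
site = "/venv/lib/python3.11/site-packages"
present = {
    "/opt/homebrew/opt/libomp/lib/libomp.dylib",
    site + "/torch/lib/libomp.dylib",
}
exists = present.__contains__
lightgbm_rpaths = ["/opt/homebrew/opt/libomp/lib", "/opt/local/lib/libomp"]

# Today: lightgbm and torch end up with different files.
lightgbm_today = resolve_rpath_dylib(
    "@rpath/libomp.dylib", lightgbm_rpaths, site + "/lightgbm/lib", exists
)
torch_today = resolve_rpath_dylib(
    "@rpath/libomp.dylib", ["@loader_path"], site + "/torch/lib", exists
)

# Option 3: lightgbm looks in torch's lib directory first.
lightgbm_option3 = resolve_rpath_dylib(
    "@rpath/libomp.dylib",
    ["@loader_path/../../torch/lib"] + lightgbm_rpaths,
    site + "/lightgbm/lib",
    exists,
)
print(lightgbm_today != torch_today, lightgbm_option3 == torch_today)
```

Under these assumptions the model reproduces the mismatch described in the summary and shows that the extra rpath entry would point both libraries at torch's copy, provided that copy exists.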

Option 4: OpenMP could be packaged as a wheel that all of these projects depend on (and dynamically link to)

As described in https://pypackaging-native.github.io/key-issues/native-dependencies/blas_openmp/#potential-solutions-or-mitigations. This is the wheel-based equivalent of how conda handles this case, as @lesteve alluded to... you download a single copy of the library into the environment, and everything else dynamically links to it.

I personally would be willing to help with this community effort, though I don't feel qualified to lead it.

Some related discussions (about shared-library-only wheels, not OpenMP) that have been happening in RAPIDS libraries: