Open nicklamiller opened 7 months ago
Thanks for writing this up!
We could give better guidance here. Until a doc like that's added, please post comments on this issue with specific questions and one of us will help.
Thanks for outlining the dev environment setup steps! That helped a lot and seems like a great foundation for the developer setup documentation.
Following these initial steps, running `pytest tests/python_package_test` resulted in segmentation fault errors. This appeared to be an OpenMP issue: the original `gcc` and `g++` compilers on my OS didn't have OpenMP support, so I ran `brew install libomp` and aliased my original `gcc` to `gcc-13` and `g++` to `g++-13`. These compilers did have OpenMP support, as I could compile dummy C and C++ files that used OpenMP. However, running `pytest tests/python_package_test` resulted in the same segfault errors as before.
After this, I followed the steps outlined in https://github.com/microsoft/LightGBM/issues/4229#issuecomment-1871489569 and am now getting a `TypeError: Wrong type(ChunkedArray)` error for several tests, all in `test_arrow.py`. I'd appreciate any feedback if you know a good way to handle this error or have seen it before, and whether it's indicative of a faulty setup 🙏.
```
tests/python_package_test/test_arrow.py Fatal Python error: Segmentation fault

Thread 0x00000001fd04e080 (most recent call first):
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 2377 in __init_from_csr
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py"[1] 2968 segmentation fault pytest tests/python_package_test
```
```
============================= test session starts ==============================
platform darwin -- Python 3.11.8, pytest-8.0.1, pluggy-1.4.0
rootdir: /Users/nick/development/LightGBM
plugins: cov-4.1.0
collected 710 items / 1 skipped

Process 3635 stopped
* thread #20, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop
    0x175b1f7dc <+40>: ldr w9, 0x175b51308 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add w20, w9, #0x1
  thread #21, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop
    0x175b1f7dc <+40>: ldr w9, 0x175b51308 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add w20, w9, #0x1
Target 0: (python) stopped.
```
```
========================================================================= short test summary info ==========================================================================
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[
```
- OS: macOS 13.5 (Ventura)
- CPU: M2 chip
- compiler: AppleClang 15.0.0
- Python: 3.11.7
- OpenMP (libomp): 18.1.1
Thanks for the very detailed write-up and for working through this! Sorry it isn't easier.
I'm on mobile right now so apologies for being brief, but wanted to help unblock you. Try replacing this
cmake ..
with this
cmake -DUSE_OPENMP=OFF
Thanks for the prompt response, and no worries. Uncovering these speed bumps is providing a lot of fodder for the developer environment documentation.
After making that change, I'm still getting the `TypeError: Wrong type(ChunkedArray)` errors, and they're still in `test_arrow.py`. I noticed that without `-DUSE_OPENMP=OFF` (i.e. following the instructions in https://github.com/microsoft/LightGBM/issues/4229#issuecomment-1871489569 exactly), `otool -L lib_lightgbm.so` returns
lib_lightgbm.so:
@rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
/opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
Whereas when building with `-DUSE_OPENMP=OFF`, `otool -L lib_lightgbm.so` returns
lib_lightgbm.so:
@rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
So it looks like `lib_lightgbm.so` no longer links to OpenMP, and adding `-DUSE_OPENMP=OFF` was successful. This makes me wonder if this problem has to do with something outside of OpenMP. I see `ChunkedArray` is defined in a cpp file, and taking `test_predict_ranking` as an example, it looks like the Python function `_list_to_1d_numpy` fails because it doesn't expect a `ChunkedArray`. Maybe there's an incompatibility between the cpp code and the Python code, though I'm not sure how this could happen, as I'm building the package from the most up-to-date code.
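The comparison of the two `otool -L` outputs can also be scripted. The following is only a sketch: the helper names `linked_libraries` and `links_openmp` are made up for illustration, and the sample strings are copied from the outputs quoted in this comment. It assumes each entry looks like `<path> (compatibility version ...)`.

```python
from typing import List

# Sample outputs copied from the `otool -L lib_lightgbm.so` runs quoted above.
WITH_OPENMP = """lib_lightgbm.so:
@rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
/opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
"""

WITHOUT_OPENMP = """lib_lightgbm.so:
@rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
"""


def linked_libraries(otool_output: str) -> List[str]:
    """Return the library paths listed in `otool -L` output (hypothetical helper)."""
    libs = []
    for line in otool_output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line naming the inspected file.
        if not line or line.endswith(":"):
            continue
        # Keep only the path, dropping the "(compatibility version ...)" suffix.
        libs.append(line.split(" (", 1)[0])
    return libs


def links_openmp(otool_output: str) -> bool:
    """Return True if any linked library path mentions libomp."""
    return any("libomp" in lib for lib in linked_libraries(otool_output))


if __name__ == "__main__":
    print("default build links libomp:", links_openmp(WITH_OPENMP))
    print("USE_OPENMP=OFF build links libomp:", links_openmp(WITHOUT_OPENMP))
```

On macOS, the same functions could be fed the live output of `otool -L` captured with `subprocess.run(..., capture_output=True, text=True)` instead of the pasted strings.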
```python
/Users/nick/development/LightGBM/tests/python_package_test/test_arrow.py::test_predict_ranking failed:
    def test_predict_ranking():
        data = generate_random_arrow_table(10, 10000, 42)
        dataset = lgb.Dataset(
            data,
            label=generate_random_arrow_array(10000, 43, generate_nulls=False, values=np.arange(4)),
            group=np.array([1000, 2000, 3000, 4000]),
            params=dummy_dataset_params(),
        )
>       booster = lgb.train(
            {"objective": "lambdarank", "num_leaves": 7},
            dataset,
            num_boost_round=5,
        )

tests/python_package_test/test_arrow.py:372:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py:260: in train
    booster = Booster(params=params, train_set=train_set)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3624: in __init__
    train_set.construct()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2563: in construct
    self._lazy_init(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2177: in _lazy_init
    self.set_label(label)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3050: in set_label
    label_array = _list_to_1d_numpy(label, dtype=np.float32, name="label")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
data =
```
> if this problem has to do with something outside of OpenMP.
Turning off OpenMP linking was to help with the segfaults you reported. As you found with #4229 and similar, LightGBM's Python package has some outstanding issues with OpenMP support on macOS.
> `TypeError: Wrong type(ChunkedArray)` errors and they're still in `test_arrow.py`
Please install `pyarrow` in your development environment and try again.
conda install -c conda-forge --yes pyarrow
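To confirm that the install took effect in the environment that runs the tests, a quick check like the one below may help. It is a sketch only: `missing_packages` is a hypothetical helper, not part of LightGBM, and it only reports whether a package can be found, not whether its version is compatible.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["pyarrow"])
    if missing:
        print("not installed:", ", ".join(missing))
    else:
        print("pyarrow is importable in this environment")
```

Run it with the same Python interpreter that `pytest` uses, since a different conda environment would give a different answer.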
Alternatively, ignore the Arrow tests if you're just working on the scikit-learn interface (as I suspect you are, given our discussion in #6310).
pytest tests/python_package_test/test_sklearn.py
Summary
Add documentation describing how to set up an environment for developing in the python-package.
Motivation
Currently there's no specification on how to set up an environment for developing in the python-package (see https://github.com/microsoft/LightGBM/pull/6310#issuecomment-1953487883). Adding this would make the contribution process smoother.
Description
- `CONTRIBUTING.md` with step-by-step instructions on how to set up a developer environment (as outlined in https://github.com/microsoft/LightGBM/pull/6310#issuecomment-1953487883)
- a `dev` section in the existing `pyproject.toml` under `project.optional-dependencies` with dev dependencies, so one can install via `pip install '.[dev]'`
References