microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[python-package] Documentation on setting up development environment #6350

Open nicklamiller opened 7 months ago

nicklamiller commented 7 months ago

Summary

Add documentation describing how to set up an environment for developing the python-package.

Motivation

Currently there's no documentation on how to set up an environment for developing the python-package (see https://github.com/microsoft/LightGBM/pull/6310#issuecomment-1953487883). Adding this would make the contribution process smoother.

Description

References

jameslamb commented 7 months ago

Thanks for writing this up!

We could give better guidance here. Until a doc like that's added, please post comments on this issue with specific questions and one of us will help.

nicklamiller commented 7 months ago

Thanks for outlining the dev environment setup steps! That helped a lot and seems like a great foundation for the developer setup documentation.

Following these initial steps, running pytest tests/python_package_test resulted in segmentation fault errors. This appeared to be an OpenMP issue: the original gcc and g++ compilers on my OS didn't have OpenMP support, so I ran brew install libomp and aliased gcc to gcc-13 and g++ to g++-13. These compilers did have OpenMP support, as I could compile dummy C and C++ files that used OpenMP. However, running pytest tests/python_package_test resulted in the same segfault errors as before.

After this, I followed the steps outlined in https://github.com/microsoft/LightGBM/issues/4229#issuecomment-1871489569 and am now getting a TypeError: Wrong type(ChunkedArray) error for several tests, all in test_arrow.py. I'd appreciate any feedback if you know a good way to handle this error or have seen it before, and on whether it's indicative of a faulty setup 🙏.
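One quick way to rule out a faulty setup is to check where each package would be imported from in the active environment. The snippet below is a minimal sketch, not part of LightGBM: the helper name `module_location` is made up for illustration, and the list of packages checked is an assumption about what the tests need.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from.

    Returns None when the module is not installed. Namespace packages
    can also report None, because they have no single origin file.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Assumed list of packages relevant to the Python tests.
    for name in ("lightgbm", "pyarrow", "numpy", "pandas"):
        location = module_location(name)
        print(f"{name}: {location if location else 'NOT INSTALLED'}")
```

If lightgbm resolves to a site-packages directory from a different environment than the one you built in, or an optional dependency shows as not installed, the test failures may come from the environment and not from the code under test.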

Original segmentation fault message

```
tests/python_package_test/test_arrow.py Fatal Python error: Segmentation fault

Thread 0x00000001fd04e080 (most recent call first):
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py", line 2377 in __init_from_csr
  File "/Users/nick/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py"
[1]    2968 segmentation fault  pytest tests/python_package_test
```

Segmentation fault message in LLDB

```
============================= test session starts ==============================
platform darwin -- Python 3.11.8, pytest-8.0.1, pluggy-1.4.0
rootdir: /Users/nick/development/LightGBM
plugins: cov-4.1.0
collected 710 items / 1 skipped

Process 3635 stopped
* thread #20, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr    w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop
    0x175b1f7dc <+40>: ldr    w9, 0x175b51308 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add    w20, w9, #0x1
  thread #21, stop reason = EXC_BAD_ACCESS (code=1, address=0x540)
    frame #0: 0x0000000175b1f7d4 libomp.dylib`__kmp_suspend_initialize_thread + 32
libomp.dylib`:
->  0x175b1f7d4 <+32>: ldr    w8, [x0, #0x540]
    0x175b1f7d8 <+36>: nop
    0x175b1f7dc <+40>: ldr    w9, 0x175b51308 ; _MergedGlobals + 8
    0x175b1f7e0 <+44>: add    w20, w9, #0x1
Target 0: (python) stopped.
```

TypeError: Wrong type(ChunkedArray) errors

``` ========================================================================= short test summary info ========================================================================== FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[-dataset_params0] - AssertionError: assert False FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[-dataset_params1] - AssertionError: assert False FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[-dataset_params2] - AssertionError: assert False FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[-dataset_params3] - AssertionError: assert False FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[-dataset_params4] - AssertionError: assert False FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fuzzy[-dataset_params5] - AssertionError: assert False FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_fields_fuzzy - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-array-label_data0] - TypeError: Wrong type(Int8Array) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type0-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-array-label_data0] - TypeError: Wrong type(Int16Array) for label. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type1-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-array-label_data0] - TypeError: Wrong type(Int32Array) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type2-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-array-label_data0] - TypeError: Wrong type(Int64Array) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type3-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-array-label_data0] - TypeError: Wrong type(UInt8Array) for label. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type4-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-array-label_data0] - TypeError: Wrong type(UInt16Array) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type5-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-array-label_data0] - TypeError: Wrong type(UInt32Array) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type6-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-array-label_data0] - TypeError: Wrong type(UInt64Array) for label. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type7-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-array-label_data0] - TypeError: Wrong type(FloatArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type8-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-array-label_data0] - TypeError: Wrong type(DoubleArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data1] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data2] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_labels[arrow_type9-chunked_array-label_data3] - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights_none - TypeError: Wrong type(Int64Array) for weight. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-array-weight_data0] - TypeError: Wrong type(FloatArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data1] - TypeError: Wrong type(ChunkedArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data2] - TypeError: Wrong type(ChunkedArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type0-chunked_array-weight_data3] - TypeError: Wrong type(ChunkedArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-array-weight_data0] - TypeError: Wrong type(DoubleArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data1] - TypeError: Wrong type(ChunkedArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data2] - TypeError: Wrong type(ChunkedArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_weights[arrow_type1-chunked_array-weight_data3] - TypeError: Wrong type(ChunkedArray) for weight. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-array-group_data0] - TypeError: Wrong type(Int8Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type0-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-array-group_data0] - TypeError: Wrong type(Int16Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type1-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-array-group_data0] - TypeError: Wrong type(Int32Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type2-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-array-group_data0] - TypeError: Wrong type(Int64Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type3-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-array-group_data0] - TypeError: Wrong type(UInt8Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type4-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-array-group_data0] - TypeError: Wrong type(UInt16Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type5-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-array-group_data0] - TypeError: Wrong type(UInt32Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type6-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-array-group_data0] - TypeError: Wrong type(UInt64Array) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data1] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data2] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_groups[arrow_type7-chunked_array-group_data3] - TypeError: Wrong type(ChunkedArray) for group. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type0-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type1-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type2-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type3-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type4-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type5-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type6-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type7-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. 
FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type8-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-array-init_score_data0] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data1] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data2] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_array[arrow_type9-chunked_array-init_score_data3] - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_dataset_construct_init_scores_table - TypeError: init_score must be list, numpy 1-D array or pandas Series. FAILED tests/python_package_test/test_arrow.py::test_predict_regression - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_predict_binary_classification - TypeError: Wrong type(ChunkedArray) for label. FAILED tests/python_package_test/test_arrow.py::test_predict_multiclass_classification - TypeError: Wrong type(ChunkedArray) for label. 
FAILED tests/python_package_test/test_arrow.py::test_predict_ranking - TypeError: Wrong type(ChunkedArray) for label. ```

OS info

- OS: macOS 13.5 (Ventura)
- CPU: M2 chip
- compiler: AppleClang 15.0.0
- Python: 3.11.7
- OpenMP (libomp): 18.1.1

jameslamb commented 7 months ago

Thanks for the very detailed write-up and for working through this! Sorry it isn't easier.

I'm on mobile right now so apologies for being brief, but wanted to help unblock you. Try replacing this

cmake ..

with this

cmake -DUSE_OPENMP=OFF ..

nicklamiller commented 7 months ago

Thanks for the prompt response, and no worries, uncovering these speed bumps is giving a lot of fodder for the developer env documentation.

After making that change, I'm still getting the TypeError: Wrong type(ChunkedArray) errors, and they're still in test_arrow.py. I noticed that without -DUSE_OPENMP=OFF (i.e. following the instructions in https://github.com/microsoft/LightGBM/issues/4229#issuecomment-1871489569 exactly), running

otool -L lib_lightgbm.so

returns

lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

Whereas after building with -DUSE_OPENMP=OFF, running

otool -L lib_lightgbm.so

returns

lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)

So it looks like lib_lightgbm.so no longer links to OpenMP, and adding -DUSE_OPENMP=OFF was successful. This makes me wonder if the problem is caused by something other than OpenMP. I see ChunkedArray is defined in a C++ file, and taking test_predict_ranking as an example, it looks like the Python function _list_to_1d_numpy fails because it doesn't expect a ChunkedArray. Maybe there's an incompatibility between the C++ code and the Python code, though I'm not sure how this could happen, as I'm building the package from the most up-to-date code.
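The linkage comparison above can be automated. The following is a minimal sketch, not anything shipped with LightGBM: the helper names `linked_libraries` and `links_openmp` are hypothetical, and the matching on `libomp` / `libgomp` is an assumption about how OpenMP runtimes are usually named. It parses the text printed by `otool -L`, using the two outputs from this thread as sample input.

```python
from typing import List


def linked_libraries(otool_output: str) -> List[str]:
    """Extract library paths from the text printed by `otool -L <file>`."""
    libraries = []
    # The first line names the inspected file; each following line looks like
    # "<path> (compatibility version ..., current version ...)".
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            libraries.append(line.split(" (", 1)[0])
    return libraries


def links_openmp(otool_output: str) -> bool:
    """Return True if any linked library name looks like an OpenMP runtime."""
    return any(
        "libomp" in lib or "libgomp" in lib
        for lib in linked_libraries(otool_output)
    )


# Sample outputs copied from the two builds described above.
WITH_OPENMP = """lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /opt/homebrew/opt/libomp/lib/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
"""

WITHOUT_OPENMP = """lib_lightgbm.so:
        @rpath/lib_lightgbm.so (compatibility version 0.0.0, current version 0.0.0)
        /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.151.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.0.0)
"""

print(links_openmp(WITH_OPENMP))     # True
print(links_openmp(WITHOUT_OPENMP))  # False
```

On a real build, the input could come from `subprocess.run(["otool", "-L", "lib_lightgbm.so"], capture_output=True, text=True).stdout`; on Linux, `ldd` prints a different format and would need its own parser.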

test_predict_ranking specific error

```python
/Users/nick/development/LightGBM/tests/python_package_test/test_arrow.py::test_predict_ranking failed:

    def test_predict_ranking():
        data = generate_random_arrow_table(10, 10000, 42)
        dataset = lgb.Dataset(
            data,
            label=generate_random_arrow_array(10000, 43, generate_nulls=False, values=np.arange(4)),
            group=np.array([1000, 2000, 3000, 4000]),
            params=dummy_dataset_params(),
        )
>       booster = lgb.train(
            {"objective": "lambdarank", "num_leaves": 7},
            dataset,
            num_boost_round=5,
        )

tests/python_package_test/test_arrow.py:372:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py:260: in train
    booster = Booster(params=params, train_set=train_set)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3624: in __init__
    train_set.construct()
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2563: in construct
    self._lazy_init(
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:2177: in _lazy_init
    self.set_label(label)
../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:3050: in set_label
    label_array = _list_to_1d_numpy(label, dtype=np.float32, name="label")
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = [ [ 2, 2, 1, 0, 2, ... 3, 2, 0, 0, 0 ] ]
dtype = , name = 'label'

    def _list_to_1d_numpy(
        data: Any,
        dtype: "np.typing.DTypeLike",
        name: str,
    ) -> np.ndarray:
        """Convert data to numpy 1-D array."""
        if _is_numpy_1d_array(data):
            return _cast_numpy_array_to_dtype(data, dtype)
        elif _is_numpy_column_array(data):
            _log_warning("Converting column-vector to 1d array")
            array = data.ravel()
            return _cast_numpy_array_to_dtype(array, dtype)
        elif _is_1d_list(data):
            return np.array(data, dtype=dtype, copy=False)
        elif isinstance(data, pd_Series):
            _check_for_bad_pandas_dtypes(data.to_frame().dtypes)
            return np.array(data, dtype=dtype, copy=False)  # SparseArray should be supported as well
        else:
>           raise TypeError(
                f"Wrong type({type(data).__name__}) for {name}.\n" "It should be list, numpy 1-D array or pandas Series"
            )
E           TypeError: Wrong type(ChunkedArray) for label.
E           It should be list, numpy 1-D array or pandas Series

../../miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/basic.py:362: TypeError
```

jameslamb commented 7 months ago

> if this problem has to do with something outside of OpenMP.

Turning off OpenMP linking was to help with the segfaults you reported. As you found with #4229 and similar, LightGBM's Python package has some outstanding issues with OpenMP support on macOS.

> TypeError: Wrong type(ChunkedArray) errors and they're still in test_arrow.py

Please install pyarrow in your development environment and try again.

conda install -c conda-forge --yes pyarrow

Alternatively, ignore the Arrow tests if you're just working on the scikit-learn interface (as I suspect you are, given our discussion in #6310).

pytest tests/python_package_test/test_sklearn.py