microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[ci] [python-package] Python tests leave files behind #6361

Open jameslamb opened 3 months ago

jameslamb commented 3 months ago

Description

The Python unit tests in this project leave some files behind when they are done running.

They should be modified to use Python-managed temporary files that are automatically removed.

Reproducible example

Build the Python package and run the Python tests.

cmake -B build -S .
cmake --build build --target _lightgbm
sh build-python.sh install --precompile
pytest tests/python_package_test

(for more details on this, see #6350).

Look at the files created.

git status --ignored

As of latest master (https://github.com/microsoft/LightGBM/commit/b27d81ea411d04d8d071d4d4e75c19ffa15c5795), you'll see all of these created by tests:

categorical.model
lgb.model
lgb.pkl
lgb_train_data.bin
model.txt
Tree4.gv.pdf
Tree4.gv
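
If git is not available, the same check can be approximated by snapshotting the directory before and after a run and comparing the two. This is only a sketch: `snapshot` and `leftover_files` are hypothetical helper names, not part of LightGBM or pytest, and the temporary directory stands in for the repository root.

```python
import tempfile
from pathlib import Path


def snapshot(root: Path) -> set[str]:
    """Return the relative paths of all files currently under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def leftover_files(before: set[str], after: set[str]) -> list[str]:
    """Return files that exist after the run but did not exist before it."""
    return sorted(after - before)


# Demonstration in a throwaway directory (stands in for the repository root).
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "existing.txt").write_text("already here")
    before = snapshot(root)
    # Simulate a test that writes into the working directory.
    (root / "lgb.model").write_text("fake model contents")
    after = snapshot(root)
    created = leftover_files(before, after)
    print(created)  # ['lgb.model']
```

In practice you would take the first snapshot, run pytest, then take the second snapshot from the same root.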

Approach

Find the tests that created those files, and ensure that they stop creating them.

For example, it looks like lgb.model probably comes from here:

https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L1373

That could be avoided by using pytest's tmp_path fixture, like this:

https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L727

https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L744

For more on how that works, see "How to use temporary directories and files in tests" (pytest docs).
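
As a rough illustration of the pattern (not the actual LightGBM test code), a test can accept tmp_path as an argument and build every output path from it. `save_fake_model` below is a hypothetical stand-in for whatever call writes the file, and the file contents are made up.

```python
from pathlib import Path


def save_fake_model(path: Path) -> None:
    """Hypothetical stand-in for a call that writes a model file to disk."""
    path.write_text("tree\nfake model contents\n")


def test_model_is_written_to_tmp_path(tmp_path: Path) -> None:
    # pytest passes tmp_path as a pathlib.Path pointing at a directory
    # that is unique to this test function.
    model_file = tmp_path / "lgb.model"
    save_fake_model(model_file)
    # The file lives under tmp_path, not in the current working directory.
    assert model_file.parent == tmp_path
    assert model_file.read_text().startswith("tree")
```

Because the path is built from tmp_path rather than a bare filename like "lgb.model", nothing is written into the repository checkout, and pytest manages the temporary directory itself.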

Additional Comments

You do not need to put up a pull request fixing all of these! Contributions that fix any of these would be welcomed.

This list will be updated as these are fixed:

If you are interested in working on this, comment here to indicate that and to ask for help if you need it.

Hitro147 commented 3 months ago

Hi @jameslamb ,

I'm new to open source and would like to take up this issue.

Thanks!

jameslamb commented 3 months ago

Sure, thanks! @ me here if you have any questions.

Hitro147 commented 3 months ago

Hey @jameslamb,

I ran into a few issues while building the Python package, but I have now managed to build it successfully. However, I am getting errors when running the tests, and I cannot find a requirements.txt file. Can you suggest a way to install all the necessary modules?

Best, Shrikanth

Errors after running pytest tests/python_package_test

==================================================== test session starts =====================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/hitro/Desktop/Microsoft/LightGBM
collected 3 items / 9 errors                                                                                                 

=========================================================== ERRORS ===========================================================
__________________________________ ERROR collecting tests/python_package_test/test_arrow.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_arrow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_arrow.py:6: in <module>
    import pyarrow as pa
E   ModuleNotFoundError: No module named 'pyarrow'
__________________________________ ERROR collecting tests/python_package_test/test_basic.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_basic.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_basic.py:12: in <module>
    from sklearn.datasets import dump_svmlight_file, load_svmlight_file
E   ModuleNotFoundError: No module named 'sklearn'
________________________________ ERROR collecting tests/python_package_test/test_callback.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_callback.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_callback.py:6: in <module>
    from .utils import SERIALIZERS, pickle_and_unpickle_object
tests/python_package_test/utils.py:6: in <module>
    import cloudpickle
E   ModuleNotFoundError: No module named 'cloudpickle'
_______________________________ ERROR collecting tests/python_package_test/test_consistency.py _______________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_consistency.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_consistency.py:5: in <module>
    from sklearn.datasets import load_svmlight_file
E   ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dask.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dask.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dask.py:14: in <module>
    from sklearn.metrics import accuracy_score, r2_score
E   ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dual.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dual.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dual.py:8: in <module>
    from sklearn.metrics import log_loss
E   ModuleNotFoundError: No module named 'sklearn'
_________________________________ ERROR collecting tests/python_package_test/test_engine.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_engine.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_engine.py:15: in <module>
    import psutil
E   ModuleNotFoundError: No module named 'psutil'
________________________________ ERROR collecting tests/python_package_test/test_plotting.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_plotting.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_plotting.py:3: in <module>
    import pandas as pd
E   ModuleNotFoundError: No module named 'pandas'
_________________________________ ERROR collecting tests/python_package_test/test_sklearn.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_sklearn.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_sklearn.py:9: in <module>
    import joblib
E   ModuleNotFoundError: No module named 'joblib'
================================================== short test summary info ===================================================
ERROR tests/python_package_test/test_arrow.py
ERROR tests/python_package_test/test_basic.py
ERROR tests/python_package_test/test_callback.py
ERROR tests/python_package_test/test_consistency.py
ERROR tests/python_package_test/test_dask.py
ERROR tests/python_package_test/test_dual.py
ERROR tests/python_package_test/test_engine.py
ERROR tests/python_package_test/test_plotting.py
ERROR tests/python_package_test/test_sklearn.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 9 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 9 errors in 3.31s ======================================================

jameslamb commented 3 months ago

Thanks for trying it out!

Please post error messages and logs as plaintext, not images, so they can be found from search engines. See these resources:

"any way to install all the necessary modules"

Follow these steps (but add pyarrow): https://github.com/microsoft/LightGBM/pull/6310#issuecomment-1953487883
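
To see quickly which test dependencies are still missing, something like the following may help. It is a sketch only: the module list is taken from the ModuleNotFoundError lines in the log above and may be incomplete, since collection reports only the first failing import per test file. Note that sklearn is the import name of the scikit-learn package.

```python
import importlib.util

# Import names taken from the ModuleNotFoundError lines in the log above.
REQUIRED = ["cloudpickle", "joblib", "pandas", "psutil", "pyarrow", "sklearn"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing:", missing_modules(REQUIRED))
```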

Hitro147 commented 3 months ago

Sorry about that! I've updated my comment.

Thanks for the information. I'll start working on it 😄

jameslamb commented 3 months ago

I found another one generated by the Dask tests, added it above.

https://github.com/microsoft/LightGBM/blob/631e0a2a7bdd694a91f30378fb271d05ce438122/tests/python_package_test/test_dask.py#L1534

jameslamb commented 2 months ago

@Hitro147 Are you still interested in pursuing this?

Hitro147 commented 2 months ago

Hello @jameslamb,

I'm facing some issues with my current environment that will take time to resolve, so I need to put this on hold for a while. If it's still open later, I'd like to return to it when I have more time. Feel free to assign it to anyone else who is interested.

Thanks for giving me this opportunity! 😄

jameslamb commented 2 months ago

Ok sure, no problem. Comment here or on #6350 any time if you need help.

Anyone else reading this... you are welcome to contribute! A PR even just eliminating one of these left-behind files would be greatly appreciated 😊

Arup-Chauhan commented 3 weeks ago

@jameslamb, I would like to contribute to this issue, or to any related good first issue in the repository (several are mentioned).

jameslamb commented 3 weeks ago

Sure! This is a great issue to start with @Arup-Chauhan .

I recommend focusing on a single file like categorical.model in your first contribution, to get used to the process. You can find where it's used like this:

git grep -E 'categorical\.model'

Thanks for spending some time on LightGBM, we really appreciate it!

Arup-Chauhan commented 3 weeks ago

Hi @jameslamb, thanks for this. I will get started and will reach out to you if I need assistance.