microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[ci] [python-package] scikit-learn compatibility tests fail with scikit-learn 1.6.dev0 #6653

Closed jameslamb closed 1 month ago

jameslamb commented 2 months ago

Description

Starting a few days ago, the scikit-learn compatibility checks here have been failing with the following errors:

E AssertionError: Estimator LGBMClassifier should not set any attribute apart from parameters during init. Found attributes ['min_data_in_bin'].

E AssertionError: Estimator LGBMClassifier doesn't check for NaN and inf in fit.

E AssertionError: ('_more_tags() was removed in 1.6. Please use __sklearn_tags__ instead.',)

And the same for LGBMRegressor.

This is only happening with the 1.6.dev0 nightlies of scikit-learn.
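For readers unfamiliar with the first error, here is a minimal sketch of the kind of comparison such a check makes: the instance attributes present right after construction versus the parameters named in `__init__`. Both `attributes_outside_init_params` and `KwargsEstimator` are hypothetical names made up for illustration. This is not scikit-learn's actual check, and it only assumes (without confirming it for LightGBM) that an estimator storing extra keyword arguments as attributes can trigger this kind of failure.

```python
import inspect


def attributes_outside_init_params(estimator):
    """Return public instance attributes that are not named __init__ parameters."""
    signature = inspect.signature(type(estimator).__init__)
    init_params = set(signature.parameters) - {"self"}
    return sorted(
        name
        for name in vars(estimator)
        # Private attributes are skipped in this simplified version.
        if not name.startswith("_") and name not in init_params
    )


class KwargsEstimator:
    """Hypothetical estimator that stores extra keyword arguments as attributes."""

    def __init__(self, n_estimators=100, **kwargs):
        self.n_estimators = n_estimators
        for key, value in kwargs.items():
            # Each extra keyword becomes an attribute that __init__ does not name.
            setattr(self, key, value)


offending = attributes_outside_init_params(KwargsEstimator(min_data_in_bin=1))
print(offending)
```

With the hypothetical estimator above, `offending` contains only `min_data_in_bin`, which mirrors the shape of the reported message.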

Reproducible example

This is happening across all pull requests here, even those not related to the Python package in any way. For example, build log from #6648: https://github.com/microsoft/LightGBM/actions/runs/10776737786/job/29884208680?pr=6648

Environment info

Installed packages:

contourpy-1.3.1.dev1
cycler-0.12.1
fonttools-4.53.1
joblib-1.4.2
kiwisolver-1.4.7
matplotlib-3.10.0.dev587+g1c892c2033
numpy-2.2.0.dev0
pandas-3.0.0.dev0+1452.g80b6850271
pillow-11.0.0.dev0
pyparsing-3.1.4
python-dateutil-2.9.0.post0
scikit-learn-1.6.dev0
scipy-1.15.0.dev0
six-1.16.0
threadpoolctl-3.5.0
tzdata-2024.1

Build link: https://github.com/microsoft/LightGBM/actions/runs/10776737786/job/29884208680?pr=6648#step:4:140

Additional Comments

Where this test is configured: https://github.com/microsoft/LightGBM/blob/41ba9e8f00c89d72e5cb71c964722ce1ed4d8445/.github/workflows/python_package.yml#L93

@vnherdeiro started investigating 1 of the 3 issues (the one about _more_tags()) in #6651. Some notes from there:

Other related discussions:

vnherdeiro commented 2 months ago

@jameslamb I think that https://github.com/microsoft/LightGBM/pull/6651 will also fix the first issue you quoted, the min_data_in_bin one. It is raised because LightGBM does not yet use the new __sklearn_tags__ API, so the tag that tells scikit-learn not to check parameters defined outside of the BaseClassifier subclass constructor is missing. Note that one of the tags is an xfail bypass: "check_no_attributes_set_in_init": "scikit-learn incorrectly asserts that private attributes "

edit: and likewise for "LGBMClassifier doesn't check for NaN and inf in fit.", which is covered by the allow_nan flag.
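As a rough illustration of the idea (not the code in #6651), an estimator can keep `_more_tags()` for older scikit-learn versions and add `__sklearn_tags__()` for newer ones, so that a tag such as allow_nan is declared through both APIs. To keep the sketch runnable without scikit-learn installed, `Tags`, `InputTags`, `NewStyleBase` and `OldStyleBase` below are hypothetical stand-ins, not scikit-learn's real classes. The only assumption about the real library is that the new API returns an object exposing `input_tags.allow_nan`.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for the tag objects (NOT scikit-learn's classes).
@dataclass
class InputTags:
    allow_nan: bool = False


@dataclass
class Tags:
    input_tags: InputTags = field(default_factory=InputTags)


class NewStyleBase:
    """Stand-in for a base class that provides __sklearn_tags__ (new API)."""

    def __sklearn_tags__(self):
        return Tags()


class OldStyleBase:
    """Stand-in for a base class from before the new API existed."""


class DualTagsMixin:
    """Declare the same tag through both the old and the new API."""

    def _more_tags(self):
        # Old API: a plain dict of overrides.
        return {"allow_nan": True}

    def __sklearn_tags__(self):
        # New API: start from the parent's tags, then apply the same overrides.
        parent_tags = getattr(super(), "__sklearn_tags__", None)
        if parent_tags is None:
            raise AttributeError("the base class does not provide __sklearn_tags__")
        tags = parent_tags()
        tags.input_tags.allow_nan = self._more_tags()["allow_nan"]
        return tags


class NewEstimator(DualTagsMixin, NewStyleBase):
    pass


class OldEstimator(DualTagsMixin, OldStyleBase):
    pass


new_tags = NewEstimator().__sklearn_tags__()
old_tags = OldEstimator()._more_tags()
```

Reusing `_more_tags()` inside `__sklearn_tags__()` keeps a single source of truth for the overrides, at the cost of a manual mapping from dict keys to tag fields.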

jameslamb commented 2 months ago

> it's being raised because the tag to not check parameters defined outside of BaseClassifier subclass constructor is missing because of not using the new __sklearn_tags__ API

Ah yep, you are right! I agree with your analysis, thank you.