microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

LGBMRanker does not train for custom ranking objectives #5815

TrevorWinstral closed this issue 1 year ago

TrevorWinstral commented 1 year ago

Description

When training ranking models with a custom objective, LGBMRanker does not actually train. I came across this while implementing a version of LambdaMART; in the minimum working example below I use a simple regression-style objective instead, since it reproduces the bug and is easier to follow. The key symptom is that the predictions remain 0 throughout training, as can be observed from the print call inside the objective, and the predict() method also returns a vector of 0s after fitting. By contrast, when fitting with the built-in 'lambdarank' objective, the predictions do change after training (a comparison sketch follows the example output below).

Reproducible example

Here I define a custom objective following the format described in the documentation. I then generate random ranks for 5 documents across each of 20 queries, random features for each query-document pair, and finally the group parameter giving the size of each group (i.e., the number of documents per query).

import lightgbm
import numpy as np

def custom_objective(true, pred, groups):
    # Print the unique predicted values: they stay at 0 for every boosting round.
    print(np.unique(pred))
    # Squared-error-style gradient and hessian (the hessian of the squared error is 1).
    grad = pred - true
    hess = np.ones_like(grad)
    return grad, hess

# Random relevance ranks 0-4 for 5 documents in each of 20 queries.
true = np.random.uniform(0, 1, size=(20, 5)).argsort().argsort().flatten()
# Random features for each query-document pair.
X = np.random.uniform(0, 1, size=(true.shape[0], 8))
# 20 query groups of 5 documents each.
groups = np.full(20, 5, dtype=int)

params = {"objective": custom_objective, "n_estimators": 10}
ranker = lightgbm.LGBMRanker(**params)
ranker.fit(X, true, group=groups)
print(ranker.predict(X))
# array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
#        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
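
For comparison, here is the run with the built-in 'lambdarank' objective mentioned above, on the same X, true, and groups. This is a sketch of that comparison rather than part of the original run, but unlike the custom objective, its predictions do change after training.

# Same data, but with the built-in 'lambdarank' objective instead of the custom callable.
baseline = lightgbm.LGBMRanker(objective="lambdarank", n_estimators=10)
baseline.fit(X, true, group=groups)
print(baseline.predict(X))  # observed to contain non-zero values, unlike the custom objective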

Environment info

LightGBM 3.3.5

Command(s) you used to install LightGBM

pip install lightgbm

Running on Pop!_OS, which is based on Ubuntu 22.04.

jameslamb commented 1 year ago

Thanks for taking the time to open this and for using LightGBM.

But please don't open issues in open source projects and then close them without describing the resolution! Others with the same question are going to find this from search engines, and you could have helped them by writing down what you learned.

I suspect you ran into this common issue with LightGBM... to get LightGBM to train on very small datasets, you have to modify settings for its overfitting protections like min_data_in_bin. See the links in https://github.com/microsoft/LightGBM/issues/5493#issuecomment-1255046937.
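
For anyone reproducing this: a minimal sketch of that workaround, assuming the small-dataset protections are indeed the cause, is to relax min_data_in_bin and min_child_samples (the scikit-learn alias for min_data_in_leaf) in the example above:

params = {
    "objective": custom_objective,
    "n_estimators": 10,
    # Relax the overfitting protections so trees can actually split on a 100-row dataset.
    "min_data_in_bin": 1,
    "min_child_samples": 1,  # scikit-learn alias for min_data_in_leaf
}
ranker = lightgbm.LGBMRanker(**params)
ranker.fit(X, true, group=groups)
print(ranker.predict(X))  # expected to no longer be all zeros if the protections were the cause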

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.