Open metpavel opened 10 months ago
FYI, @shiyu1994, @jameslamb.
Exciting! I've added this to #2302 along with other features.
I have 2 questions: would users opt into this via objective or some other configuration parameter, instead of changing the default approach that's used by e.g. objective = "lambdarank"? Is that even worth doing?

Hi @jameslamb, the default behavior would stay objective = "lambdarank". For the pairwise scoring, a user would be able to specify something like objective = "lambdarank_pw". Or, alternatively, instead of adding a new objective we could introduce a configuration parameter pairwise_score, which is set to false by default.

Linking #6182.
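The two API shapes discussed above can be sketched as LightGBM parameter dictionaries. Note that neither the objective name "lambdarank_pw" nor the flag "pairwise_score" exists in released LightGBM; both are proposal names from this thread only.

```python
# Option A (proposed in this thread): a new, dedicated objective name.
# "lambdarank_pw" is hypothetical and NOT part of released LightGBM.
params_new_objective = {
    "objective": "lambdarank_pw",  # hypothetical pairwise variant
    "metric": "ndcg",
}

# Option B (also proposed): keep the existing objective and add a boolean
# flag. "pairwise_score" is likewise hypothetical, with a False default so
# current behavior is unchanged.
params_new_flag = {
    "objective": "lambdarank",   # existing objective, default approach
    "pairwise_score": True,      # hypothetical opt-in flag
    "metric": "ndcg",
}
```

Option B has the advantage that existing lambdarank configurations keep working unchanged, while Option A makes the behavioral difference explicit in the objective name.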
Summary
Introduce an alternative formulation for the ranking task (powered by the LambdaMART algorithm) that leverages a pairwise scoring function instead of the standard pointwise one.
Motivation
We expect this to improve the performance of ranking models trained with LightGBM in most cases.
Description
LambdaMART (the learning-to-rank algorithm implemented in LightGBM) is often referred to as pairwise because its loss function is defined over pairs of documents from the ranked list. However, this pairwise loss still relies on a pointwise scoring function s(x) defined over the feature vector of an individual document. Here we propose to extend LambdaMART to use a pairwise scoring function s(x_i, x_j) defined on any pair of documents whose feature vectors are x_i and x_j.

The advantage of a pairwise scoring function comes when we additionally let it use the differential (delta) features x_i - x_j. That is, we learn a pairwise scoring function s(x_i, x_j, x_i - x_j). Delta features can often enable accurate ranking with a much smaller model. Imagine a document recency feature that is highly correlated with relevance. To rank documents effectively by this feature with a pointwise scoring function s(x), we would need to build a regression tree/ensemble with many splits, one for each distinguishable level of recency. On the other hand, when we use delta features (delta recency in this case), a single split can be enough to predict the relative relevance of a pair of documents and rank them accurately. As a result, we get smaller and more accurate models with better generalization properties.
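The recency example above can be illustrated with a small sketch. This is not LightGBM internals; it only shows how pairwise inputs [x_i, x_j, x_i - x_j] could be assembled from pointwise document features, and why a single threshold on the delta feature already orders pairs correctly when the feature is monotonically related to relevance. The helper name and the single "recency" feature are illustrative assumptions.

```python
import numpy as np

def make_pair_features(X):
    """Build features [x_i, x_j, x_i - x_j] for all ordered pairs i != j.

    X has shape (n_docs, n_features); the result stacks the two documents'
    feature vectors and their difference, mirroring s(x_i, x_j, x_i - x_j).
    """
    rows, pairs = [], []
    n = X.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j:
                rows.append(np.concatenate([X[i], X[j], X[i] - X[j]]))
                pairs.append((i, j))
    return np.array(rows), pairs

# Five documents in one query, described by a single recency feature
# (higher = fresher, and here recency is perfectly correlated with relevance).
X = np.array([[0.1], [0.9], [0.4], [0.7], [0.2]])

pair_X, pairs = make_pair_features(X)

# With the delta feature, the single split "delta_recency > 0" decides
# which document of each pair should rank higher -- no need for many
# recency-level splits as in the pointwise case.
delta = pair_X[:, 2]                                   # the x_i - x_j column
pred_i_wins = delta > 0
true_i_wins = np.array([X[i, 0] > X[j, 0] for i, j in pairs])
print((pred_i_wins == true_i_wins).mean())             # 1.0 on this toy data
```

In the pointwise setting, reproducing this ordering would require the ensemble to carve the recency axis into many intervals, whereas the delta feature turns the comparison into a single sign test.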
References