Open jhinch-at-atlassian-com opened 1 year ago
We need to better understand how the `sltr` query is implemented. We have only just begun to explore the LTR plugin.
@jhinch-at-atlassian-com -- do you have any ideas of how `sltr` is implemented under the hood to help us get started?
@noCharger -- Can you look into this? Would be a good place to get started on understanding the plugin. Thanks!
The best place to start looking is `RankerQuery.RankerWeight#scorer` and `RankerQuery.DisjunctionDISI#advance`. You would need to compare this to how the equivalent functionality in the `bool` query works. Likely what would need to be done to make it work is to inspect the `subIteratorsPriorityQueue` when `advance` is called and count how many sub-iterators are positioned at the next doc ID, which would allow it to skip scoring documents that don't match enough features.
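A rough sketch of that idea, written as a plain Python simulation rather than against the plugin's Java classes: each feature is modelled as a sorted list of matching doc IDs, a heap stands in for the sub-iterator priority queue, and a document is only emitted (and would only be scored) when enough sub-iterators sit on it. The function name `disjunction_with_min_match` and the list-based postings are assumptions made for illustration and do not exist in the plugin.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple


def disjunction_with_min_match(
    postings: Sequence[Sequence[int]], min_should_match: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (doc_id, matching_feature_indexes) for every doc ID that appears
    in at least `min_should_match` of the sorted posting lists."""
    # Heap entries are (current doc ID, feature index, position in that list).
    # This is a stand-in for a priority queue of sub-iterators.
    heap = [(docs[0], idx, 0) for idx, docs in enumerate(postings) if docs]
    heapq.heapify(heap)

    while heap:
        doc_id = heap[0][0]  # smallest doc ID any sub-iterator is positioned on
        matched: List[int] = []

        # Pop every sub-iterator positioned on doc_id and move it forward.
        while heap and heap[0][0] == doc_id:
            _, idx, pos = heapq.heappop(heap)
            matched.append(idx)
            if pos + 1 < len(postings[idx]):
                heapq.heappush(heap, (postings[idx][pos + 1], idx, pos + 1))

        # Only docs with enough matching features are surfaced for scoring;
        # the rest are skipped without being scored.
        if len(matched) >= min_should_match:
            yield doc_id, matched


if __name__ == "__main__":
    features = [[1, 3, 5], [3, 4, 5], [5, 9]]  # hypothetical postings
    for doc, feats in disjunction_with_min_match(features, 2):
        print(doc, feats)
```

A real implementation would have to work inside the existing iterator's `advance` and `nextDoc` contract instead of materialising lists, so this only illustrates the counting step.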
@jhinch-at-atlassian-com I like this plan and the approach we're taking to support `minimum_should_match`. Would you like to contribute?
@jhinch-at-atlassian-com
1) Regarding your alternative: I guess the option of combining a filter and a should is only suitable if `minimum_should_match` is supposed to be 1.
2) What is the expected value of this issue? Why is this better than running a normal query (bool with should + minimum_should_match) in combination with a rescoring over the full hits? Leaner, faster, ...?
Is your feature request related to a problem?
Non-linear scoring functions, particularly gradient boosted decision trees, can be used to combine scores for features which have different magnitudes and score distributions. However, `sltr` queries currently function like a `bool` query with a `minimum_should_match` of `0` and a custom scoring function, meaning they cannot be used conveniently within the initial query and are currently encouraged to be used only in rescore blocks.
For example, given the following featureset definition:
and a model `example_model` which was created using the above featureset, the following `sltr` query:
Can be thought of conceptually as:
What solution would you like?
It would be great if the features used by the model could have a requirement of a minimum number which should match, so that the `sltr`:
which would translate to roughly the following:
What alternatives have you considered?
It's possible to work around this by having a surrounding `bool` query and duplicating the features as filters in that `bool` query:
However, this has the problem that it executes the query blocks twice, and it requires duplicating the definitions and ensuring the featureset and query remain in sync.
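To make the duplication concrete, here is a hedged sketch of one shape such a wrapping query might take, built as a Python dictionary. The helper name `wrap_sltr_with_filter`, the example `match` clauses, the field names, and the `keywords` parameter are all hypothetical. Only the overall `sltr` structure (`params` and `model`) follows the plugin's documented query form, and the exact query from the original report is not reproduced here.

```python
import json
from typing import Any, Dict, List


def wrap_sltr_with_filter(
    feature_queries: List[Dict[str, Any]],
    model: str,
    params: Dict[str, Any],
    minimum_should_match: int,
) -> Dict[str, Any]:
    """Build a bool query that filters on the duplicated feature queries
    and delegates scoring to an sltr clause."""
    return {
        "bool": {
            # Filter clauses do not contribute to the score; they only
            # restrict which documents reach the sltr clause.
            "filter": [
                {
                    "bool": {
                        "should": feature_queries,
                        "minimum_should_match": minimum_should_match,
                    }
                }
            ],
            # The model still evaluates every feature itself, so each
            # feature query is effectively run twice.
            "must": [{"sltr": {"params": params, "model": model}}],
        }
    }


if __name__ == "__main__":
    # Hypothetical copies of the featureset's queries with the
    # template parameter already substituted.
    duplicated = [
        {"match": {"title": "rambo"}},
        {"match": {"overview": "rambo"}},
    ]
    query = wrap_sltr_with_filter(
        duplicated, "example_model", {"keywords": "rambo"}, 1
    )
    print(json.dumps({"query": query}, indent=2))
```

The `duplicated` list is exactly the part that has to be kept in sync with the stored featureset by hand, which is the maintenance cost described above.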
Do you have any additional context?
This is the equivalent of the feature request at https://github.com/o19s/elasticsearch-learning-to-rank/issues/476, but for the OpenSearch fork.