Reuse Term queries among features

o19s / elasticsearch-learning-to-rank

Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

http://opensourceconnections.com/blog/2017/02/14/elasticsearch-learning-to-rank/

Apache License 2.0


Reuse Term queries among features #395

Open SantaDiver opened 2 years ago

SantaDiver commented 2 years ago

A very common use case for the LTR plugin is adding several features that share common parts. For example, somebody may want one feature that matches a specific document field (which can be done with a match query) and another feature that matches multiple document fields (with a multi_match query, for example). In this case we have to traverse the posting list for the same field and term multiple times (and also initialize the corresponding data structures each time).

It is also known that at the Elasticsearch level, match and other similar queries are just combinations of Term queries. The idea is: maybe we can somehow reuse Term queries in the advance phase? I had a look at the sources, and it seems this solution requires rewriting the Elasticsearch queries (or at least inheriting from them and rewriting the QueryBuilder along with advance).

Do you have any ideas on how we can achieve this without rewriting all the Elasticsearch queries? Or maybe there is another way to improve feature calculation performance?
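To make the idea concrete, here is a minimal sketch in plain Python of sharing per-term work between two features. It is not the plugin's or Lucene's API: `SharedTermScorer`, `term_frequency`, `match_feature`, and the toy `DOCS` index are all hypothetical names, and raw term frequency stands in for a real similarity such as BM25.

```python
from typing import Callable, Dict, List, Tuple

# Toy "index": doc id -> field -> tokens (hypothetical data, for illustration only).
DOCS: Dict[int, Dict[str, List[str]]] = {
    1: {"title": ["red", "shoes"], "body": ["red", "red", "leather", "shoes"]},
}


def term_frequency(field: str, term: str, doc_id: int) -> float:
    """Stand-in for the expensive part: walking a posting list for (field, term)."""
    return float(DOCS[doc_id][field].count(term))


class SharedTermScorer:
    """Scores each (field, term, doc) triple at most once and reuses the result."""

    def __init__(self, score_term: Callable[[str, str, int], float]) -> None:
        self._score_term = score_term
        self._cache: Dict[Tuple[str, str, int], float] = {}
        self.lookups = 0  # how many times the underlying scorer actually ran

    def score(self, field: str, term: str, doc_id: int) -> float:
        key = (field, term, doc_id)
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._score_term(field, term, doc_id)
        return self._cache[key]


def match_feature(scorer: SharedTermScorer, fields: List[str],
                  terms: List[str], doc_id: int) -> float:
    """A feature expressed as a sum of per-(field, term) scores."""
    return sum(scorer.score(f, t, doc_id) for f in fields for t in terms)


scorer = SharedTermScorer(term_frequency)
terms = ["red", "shoes"]
single_field = match_feature(scorer, ["title"], terms, 1)         # match-like feature
multi_field = match_feature(scorer, ["title", "body"], terms, 1)  # multi_match-like feature
```

In this toy setup the second feature only triggers lookups for the `body` field, because the `title` scores were already computed for the first feature. A real implementation would have to share Lucene scorers rather than a Python dictionary, which is where the rewriting cost mentioned above comes from.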

worleydl commented 2 years ago

I feel like there was some ideation around this a while back that never turned into anything. I think it would require some sort of top level query wrapper that would keep the scores around and link them up to features but nothing was ever fleshed out around that. Such a wrapper could utilize the features themselves in the top level matching query, then keep scores around for the rescore phase.

We certainly welcome ideas around the topic but I don't know if anyone has been thinking about it recently. Maybe @nomoa? I should be digging into this project a little more next week to get us up to date on ES 7.15 and I'll see if I can find the previous discussion around performance optimizations.

worleydl commented 2 years ago

Found the previous issue here: https://github.com/o19s/elasticsearch-learning-to-rank/issues/11, there was a WIP branch that's mentioned there.

SantaDiver commented 2 years ago

@worleydl thank you for the answer! Reusing the query score in the rescore phase sounds great, but what I am really talking about is reusing scores between features within the featureset itself. For example, somebody wants to use a featureset like this:

[
    {
        "name": "feature1",
        "params": ["query"],
        "template_language": "mustache",
        "template": {
            "multi_match": {
                "query": "{{query}}",
                "fields": ["field1", "field2"],
                "type": "most_fields"
            }
        }
    },
    {
        "name": "feature2",
        "params": ["query"],
        "template_language": "mustache",
        "template": {
            "multi_match": {
                "query": "{{query}}",
                "fields": ["field1", "field2"],
                "type": "best_fields"
            }
        }
    }
]

These two features differ only in how the per-field term scores are aggregated. What I think is that maybe we can reuse the intermediate results instead of calculating the same thing twice.
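As a rough illustration of what could be shared, the sketch below computes both feature values from one set of per-field scores. It assumes, as a simplification, that most_fields behaves like a sum over the per-field match scores and that best_fields behaves like a dis_max over them (the best score plus tie_breaker times the others). The function names and the example scores are hypothetical, and the real queries may differ in details.

```python
from typing import Dict


def most_fields(field_scores: Dict[str, float]) -> float:
    """Assumed most_fields behaviour: add up the score of every matching field."""
    return sum(field_scores.values())


def best_fields(field_scores: Dict[str, float], tie_breaker: float = 0.0) -> float:
    """Assumed best_fields behaviour: best field score plus a tie_breaker share of the rest."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    return best + tie_breaker * (sum(field_scores.values()) - best)


# Per-field scores computed once for a document (made-up numbers).
shared_scores = {"field1": 1.5, "field2": 0.5}

feature1 = most_fields(shared_scores)  # would back "feature1" in the featureset
feature2 = best_fields(shared_scores)  # would back "feature2" in the featureset
```

Under these assumptions the expensive part (producing `shared_scores`) happens once, and each feature is only a cheap aggregation on top of it.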