opensearch-project / opensearch-learning-to-rank-base

Fork of https://github.com/o19s/elasticsearch-learning-to-rank to work with OpenSearch
Apache License 2.0

LTR plugin digest #26

Open noCharger opened 7 months ago

noCharger commented 7 months ago

Workflow

[Image: workflow diagram (screenshot, 2023-11-15)]

Core mapping: Grade (from judgment) - Features (feature name 1, feature name 2, ...) - document identifier
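As a rough illustration of this mapping, the sketch below serializes one judgment into the RankLib/SVMrank-style text line that LTR training tools commonly consume. The function name, the query id, and the sample values are illustrative assumptions, not taken from this plugin.

```python
def to_training_row(grade, query_id, feature_values, doc_id):
    """Format one judgment as `grade qid:<id> 1:<v1> 2:<v2> ... # <doc id>`."""
    # Features are numbered from 1, in the order they appear in the feature set.
    features = " ".join(
        f"{ordinal}:{value}" for ordinal, value in enumerate(feature_values, start=1)
    )
    return f"{grade} qid:{query_id} {features} # {doc_id}"


# Hypothetical judgment: grade 4 for document 7555 under query id 1.
row = to_training_row(4, 1, [0.25, 0.5], "7555")
print(row)
```

Each line ties the three parts of the core mapping together: the grade comes first, the feature values follow in feature-set order, and the document identifier is kept as a trailing comment.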

Sequence Diagram

[Image: sequence diagram (screenshot, 2023-11-15)]

Step 1: Create ltr index

The LTR index (.ltrstore in the response below) contains metadata about features and models.

curl -X PUT "localhost:9200/_ltr"
{"acknowledged":true,"shards_acknowledged":true,"index":".ltrstore"}

Step 2: Create feature set

Features are templated OpenSearch Queries. Users can select and experiment with features.

A feature set is a list of features (with unique names) that has been grouped together for logging & model evaluation.

POST _ltr/_featureset/more_movie_features
{
   "featureset": {
        "features": [
            {
                "name": "title_query",
                "params": [
                    "keywords"
                ],
                "template_language": "mustache",
                "template": {
                    "match": {
                        "title": "{{keywords}}"
                    }
                }
            },
            {
                "name": "title_query_boost",
                "params": [
                    "some_multiplier"
                ],
                "template_language": "derived_expression",
                "template": "title_query * some_multiplier"
            },
            {
                "name": "custom_title_query_boost",
                "params": [
                    "some_multiplier"
                ],
                "template_language": "script_feature",
                "template": {
                    "lang": "painless",
                    "source": "params.feature_vector.get('title_query') * (long)params.some_multiplier",
                    "params": {
                        "some_multiplier": "some_multiplier"
                    }
                }
            }
        ]
   }
}
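To make the templating idea concrete, here is a minimal sketch of how the title_query template could be filled in with request parameters, and what the derived expression computes from it. The render_template helper is a simplified stand-in for Mustache (plain variables only) and is an assumption for illustration; the plugin uses a real Mustache engine.

```python
import re


def render_template(template, params):
    """Recursively substitute {{name}} placeholders in a query template."""
    if isinstance(template, dict):
        return {key: render_template(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render_template(value, params) for value in template]
    if isinstance(template, str):
        # Replace each {{name}} with the matching parameter value.
        return re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda match: str(params[match.group(1)]),
            template,
        )
    return template


def title_query_boost(title_query_score, some_multiplier):
    """Mirror of the derived expression `title_query * some_multiplier`."""
    return title_query_score * some_multiplier


title_query = {"match": {"title": "{{keywords}}"}}
rendered = render_template(title_query, {"keywords": "rambo"})
print(rendered)
print(title_query_boost(2.0, 3))
```

The first feature becomes an ordinary match query once its parameters are bound, while the derived feature never touches the index: it only combines the scores of other features in the same set.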

Step 3: Logging feature values with docs

POST tmdb/_search
{
    "query": {
        "bool": {
            "filter": [
                {
                    "terms": {
                        "_id": ["7555", "1370", "1369"]
                    }
                },
                {
                    "sltr": {
                        "_name": "logged_featureset",
                        "featureset": "more_movie_features",
                        "params": {
                            "keywords": "rambo"
                        }
                    }
                }
            ]
        }
    },
    "ext": {
        "ltr_log": {
            "log_specs": {
                "name": "log_entry1",
                "named_query": "logged_featureset"
            }
        }
    }
}
  1. The SLTR query is rewritten into a ranker query, which holds a list of disjunct queries, one rewritten from each feature (Query Phase).
  2. A named query (_name) labels every document the SLTR query matched (MatchedQueriesPhase).
  3. The ranker query uses a HitLogConsumer to log each feature (feature name, with its score as the value) by appending a DocumentField to each SearchHit (LoggingFetchSubPhase).
[Images: two screenshots (2023-11-29)]
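        // Fetch sub-phase: invoked once per search hit.
        public void process(HitContext hitContext) throws IOException {
            // Only log when the scorer can be positioned exactly on this hit's document.
            if (scorer != null && scorer.iterator().advance(hitContext.docId()) == hitContext.docId()) {
                loggers.forEach((l) -> l.nextDoc(hitContext.hit()));
                // Scoring will trigger log collection
                scorer.score();
            }
        }

        // Log consumer: prepares the log entry for the next hit.
        void nextDoc(SearchHit hit) {
            // Reuse the hit's log field if it exists; otherwise create and attach it.
            DocumentField logs = hit.getFields().get(FIELD_NAME);
            if (logs == null) {
                logs = newLogField();
                hit.setDocumentField(FIELD_NAME, logs);
            }
            Map<String, List<Map<String, Object>>> entries = logs.getValue();
            // Start a fresh per-feature list and register it under this log spec's name.
            rebuild();
            currentHit = hit;
            entries.put(name, currentLog);
        }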
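The interplay between the two methods above can be modeled with a small toy sketch. Everything in it is an assumption made for illustration (the class name, the accept callback, and plain dicts standing in for SearchHit and DocumentField); it is not the plugin's actual code, only the same pattern: prepare a log entry for the hit, then let scoring fill in the values.

```python
FIELD_NAME = "_ltrlog"


class ToyHitLogConsumer:
    """Toy stand-in for a log consumer: collects feature scores for one hit."""

    def __init__(self, name, feature_names):
        self.name = name
        self.feature_names = feature_names
        self.current_log = None

    def next_doc(self, hit):
        # Reuse the hit's log field if present, otherwise create and attach it.
        logs = hit.setdefault("fields", {}).setdefault(FIELD_NAME, [{}])
        # Fresh entry list: one slot per feature, value filled in during scoring.
        self.current_log = [{"name": name} for name in self.feature_names]
        logs[0][self.name] = self.current_log

    def accept(self, ordinal, score):
        # Called while scoring: record the score of the feature at `ordinal`.
        self.current_log[ordinal]["value"] = score


def process_hit(hit, consumer, feature_scorers):
    """Prepare the log entry, then score each feature to trigger collection."""
    consumer.next_doc(hit)
    for ordinal, score_fn in enumerate(feature_scorers):
        score = score_fn(hit)
        if score is not None:  # a feature that did not match logs no value
            consumer.accept(ordinal, score)
    return hit


consumer = ToyHitLogConsumer("log_entry1", ["title_query", "title_query_boost"])
hit = process_hit({"_id": "7555"}, consumer, [lambda h: 0.5, lambda h: None])
print(hit["fields"][FIELD_NAME])
```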

Logs in search response

"fields": {
          "_ltrlog": [
            {
              "log_entry1": [
                {
                  "name": "1",
                  "value": 0.25069216
                },
                {
                  "name": "2",
                  "value": 0.226041
                }
              ]
            }
          ]
        },
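A client would typically turn this fragment into a feature vector per document. The sketch below assumes the response shape shown above; the helper name and the surrounding hit wrapper (including the _id) are illustrative.

```python
def extract_features(hit, log_name):
    """Return {feature name: value} from one search hit's `_ltrlog` field."""
    entries = hit["fields"]["_ltrlog"][0][log_name]
    # A feature without a logged value maps to None.
    return {entry["name"]: entry.get("value") for entry in entries}


# Hit shaped like the response fragment above; the _id is assumed for illustration.
hit = {
    "_id": "7555",
    "fields": {
        "_ltrlog": [
            {
                "log_entry1": [
                    {"name": "1", "value": 0.25069216},
                    {"name": "2", "value": 0.226041},
                ]
            }
        ]
    },
}

features = extract_features(hit, "log_entry1")
print(hit["_id"], features)
```

Joining these vectors with the grades from the judgment list, keyed by document identifier, yields the training data described in the core mapping.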

Search with models

Rescore Phase with SLTR Query: In the rescore phase, the SLTR query applies a trained model to rerank the top documents returned by the query phase, scoring them with the features defined in the learning-to-rank model.
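A request of that kind can be sketched as follows. The body uses the standard rescore section of the search API with an sltr query inside rescore_query; the model name my_model, the baseline match query, and the window size are hypothetical, and the model is assumed to have been uploaded to the feature store already (a step this digest does not cover).

```python
import json


def build_rescore_request(keywords, model_name, window_size=100):
    """Build a search body that reranks the top `window_size` hits with an LTR model."""
    return {
        # Baseline query: retrieves candidates in the query phase.
        "query": {"match": {"title": keywords}},
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model_name,  # hypothetical model name
                    }
                }
            },
        },
    }


body = build_rescore_request("rambo", "my_model")
print(json.dumps(body, indent=2))
```

Keeping the window small limits the cost of model evaluation, since only the top candidates from the query phase are rescored.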

msfroh commented 7 months ago

Should this be part of a plugin README? Maybe contribute to the doc website once we launch?

macohen commented 7 months ago

Generally, I see the "how to use it" documentation living on the doc website. The "how to build it" material and the details of how it works should go into the repo as a README (great idea). The sequence diagram should be part of the README along with details about the code. Requests and responses should go into the docs. Also, even if we only have a self-install plugin, as long as it works we can add this to the documentation site.

@noCharger do you want to make an attempt at this separation when you get a chance? BTW, nice job on the diagram. Maybe good for a review in an upcoming public search relevance meeting.

cc: @epugh for any feedback...