Hi Henry,
Thanks for putting this together. I have a few questions...
Can you clarify how you propose to support pre- and post-request format processing in this pipeline? Is this built into the pipeline, or did you envision this to be part of the connector? It would be great to have a search processor that provides an easy way to configure JSON-to-JSON transforms to simplify the effort of integrating with various downstream APIs and models.
What controls does the user have around configuring how results are sent to the re-ranker? Let's say the re-ranker isn't a managed API and it's hosted on a model server--are you proposing any controls like the ability to send results as async mini-batches and perform post-processing like merge and sort?
What controls does the user have for configuring what data gets sent to the re-ranker model? There are slight variations in re-ranking use cases in terms of what inputs are passed to the re-ranker model. In some cases, it's just the search results. Other use cases require the query context.
Thanks @dylan-tong-aws. I have a few responses!
This is a narrow use-case. Just take all your docs and ask a (text-to-float) language model how similar they are. Then sort based off of that. Nonetheless, this alone can give like a 15-20% boost to recall in the top couple, so I think it's worth knocking out.
p.s. Ok, I read up on the Cohere rerank API and it should be able to connect to this work pretty readily.
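For reference, Cohere's rerank request (shape as I understand it from their public docs; the model name is just an example) takes a query plus a list of documents and returns a relevance score per document, which is essentially the contract this processor needs:
POST https://api.cohere.ai/v1/rerank
{
  "model": "rerank-english-v2.0",
  "query": "Was abraham lincoln a good president?",
  "documents": [
    "Abraham Lincoln was the 16th president of the United States.",
    "The capital of France is Paris."
  ],
  "top_n": 2
}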
Hi, @HenryL27 thanks for creating the RFC. I have some suggestions and comments:
1. Why do we need the top_k parameter both in the processor and in ext?
2. The name context_field in the response processor does not fit what the actual value will be. Please rename.
3. A _source fetch of the vector field adds latency. It is advised to fetch only the fields that are required; how will re-ranking work in that case?
4. Query Rerank Pipeline: I see two options; please add what the recommended solution is there.
@navneet1v thanks! context_field tells the processor what text to send from each document as context against which to compare the query. A cross encoder takes (query, context) pairs. So we need to specify what the query and contexts are. What did you have in mind?
thanks @HenryL27. Few comments/questions
Looks like this RFC focuses only on local-model support for cross-encoder reranking. To be a complete solution, can we also incorporate support for remote models? To me, supporting both local and remote models should be a goal.
Rescoring only a subset of results definitely leads to inconsistencies, as you called out. This should be addressed as part of the solution. Maybe you can give the lowest possible score to non-competing docs? Let's take a goal of making the reranking processor leave results consistent.
We seem to name the processor neural_rerank. While this name looks like it supports a generic reranker, the RFC focuses only on cross-encoder-based reranking. Are we sure the current interface can support different techniques in the future? If not, should we call it something like neural_crossencoder_rerank to avoid backward-compatibility issues as we try to make it more generic? I am not a fan of creating a processor for every use case; I would rather evaluate the current approach to make it more generic.
@vamshin thanks
Yeah, I think spreading the lowest score to the other docs (or maybe minus a delta?) is probably the behavior we want. Another option I considered was introducing the rerank-score as another search hit field altogether, maybe _rescore so we don't override the original _score value, wdyt?
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"cross-encoder": {
"top_k": int (how many to rerank) [optional],
"model_id": id of cross-encoder [required],
"context_field": str (source field to compare to query) [required]
}
}
}
]
}
Implementation-wise I think this becomes a single "rerank" processor and depending on the type ("cross-encoder" here) it casts itself to whatever it needs to be or something
@HenryL27
I'm not sure we need it in both places. In most cases you'll probably just set it in the processor and forget about it. But I thought that maybe if you know you need to rerank a lot of things for a particular query (or you only need to rerank a few things for a particular query) it would be nice if you had an override switch.
If a user needs to do this, he can add the processor in the search request itself, rather than this, so I would not provide multiple overrides.
The scores will be overwritten with the new scores from the cross encoder. Yes, this throws out any previous normalization, and yes, if you only rerank some of the documents you can get weird inconsistencies. I can normalize the cross-encoder scores maybe? I'm not sure it's worth it though. If you have a good idea of what the behavior should be I'm all ears
As these inconsistencies are arising because of the top_k parameter, I would not even put that parameter in the processor. So if the user is getting X documents from OpenSearch, we should re-rank all of them.
I'm not sure I understand what you mean by this. context_field tells the processor what text to send from each document as context against which to compare the query. A cross encoder takes (query, context) pairs. So we need to specify what the query and contexts are. What did you have in mind?
The main point in this was that the name context_field is very generic. Let's rename this.
You wouldn't rerank based on a vector; it only makes sense to rerank based on semantically meaningful text. Presumably, that semantically meaningful text that you're reranking is also the semantically meaningful text that you care about as a search user; though I recognize that that's an assumption. If you don't fetch the context field, then the reranking processor should either do nothing or error out, since there's nothing to rerank.
I am not saying to re-rank based on a vector field. When doing a query, the customer may set _source to false (which is a very standard use case for vector search) and use fields: ['title', 'description'] etc. In that case _source will be empty, but there will be an array of fields in the response. So we should not just rely on the _source.
This is the purpose of top_k (and maybe it should be required bc of this?) - all we can really do is force the user to think about it. If the user tries to rerank 10,000 documents, that's kinda on them, ya know? We can optimize the hell out of this code, but the performance bottleneck is the cross encoder model and there's really not much we can do about that besides make it clear that this can be an issue.
My recommendation for this would be that these re-ranker models should run outside of the OpenSearch cluster as remote models, where users can use GPU-based instances for doing re-ranking. The reason is that if the latency for re-ranking is in the hundreds of milliseconds for, say, 100 records, then the feature becomes unusable.
Currently building cross encoders as local, but it should be possible to Connect to the Cohere endpoint as well (does anyone else offer a rerank API?)
We should explore this more. Maybe our local models could be deployed in some other service like SageMaker etc., and not specifically Cohere.
@navneet1v
If a user needs to do this, he can add the processor in the search request itself, rather than this, so I would not provide multiple overrides.
True, this is possible. But in a case where I already have a rather complicated search pipeline I might not want to rewrite it all, and I'm not sure that saying "if you want to use a different value for top_k then rewrite your processor" actually makes the API cleaner. Maybe it can just be a required param of ext and leave it out of the processor definition entirely?
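Something like this, purely for illustration (where exactly the override lives in ext, and what it's called, is not settled):
POST /my-index/_search?search_pipeline=rerank_pipeline
{
  "query": { "match": { "text_representation": "Was abraham lincoln a good president?" } },
  "ext": {
    "rerank": {
      "top_k": 25
    }
  }
}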
As these inconsistencies are arising because of the top_k parameter, I would not even put that parameter in the processor. So if the user is getting X documents from OpenSearch, we should re-rank all of them.
Maybe. I guess the assumption with reranking is generally that anything not in the top k is irrelevant and therefore doesn't need to be reranked; as such why return it in the first place? That said, you might want to see things that are not in the top k - in high-latency cases where reranking is constrained, or in testing/tuning cases where you want to see what the reranker doesn't get to see to troubleshoot. We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.
The main point in this was that the name context_field is very generic. Let's rename this.
Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.
I am not saying to re-rank based on a vector field. When doing a query, the customer may set _source to false (which is a very standard use case for vector search) and use fields: ['title', 'description'] etc. In that case _source will be empty, but there will be an array of fields in the response. So we should not just rely on the _source.
Huh, I didn't know you could do this! I guess then we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both then.
Regarding remote models, yep, I'm looking into that. Fundamentally the Connector interface should handle all the juicy api discrepancies between remote models. I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors and then we should be all good, right?
We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.
I don't completely agree with this thought. We should be consistent either by saying we are going to re-rank k documents and return only K, or we are going to re-rank all the documents and return all of them. Being in the middle state is bad. One way to solve this is with an OverSampling processor: the customer asks for X, we retrieve, let's say, 2X, re-rank all 2X documents, and return X documents back.
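A rough sketch of that flow (the oversampling and truncation processor names here are placeholders, not names we have committed to):
PUT /_search/pipeline/oversample_rerank_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 2.0
      }
    }
  ],
  "response_processors": [
    {
      "rerank": { ... }
    },
    {
      "truncate_hits": {
        "target_size": X (what the customer originally asked for)
      }
    }
  ]
}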
Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.
Re-ranking field is one name that comes to my mind.
Huh, I didn't know you could do this! I guess then we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both then.
Yes, they are key-value pairs. But what I meant to say was, we need to handle this use case too, because doing queries like this provides a latency boost.
I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors and then we should be all good, right?
Yeah, maybe. Will wait for that to be out.
- Yep, I'll work on making it compatible with Cohere's cross-encoder API. I don't know of any other remote APIs that do reranking but I think they'll all look relatively similar. May require some Connector finagling but otherwise this should work.
@HenryL27, another scenario we're looking to support is for custom re-rank models that are hosted on an external model server like an Amazon SageMaker endpoint. The search pipeline will require more flexibility than what it takes to integrate with a managed API. The hosted model may simply be a classification/regression model that's trained for a re-ranking task (eg. XGBoost).
The gist is that we'll need some flexibility around the data transfer and protocol:
Data transform (request/response): We need the ability to perform a request and response data transform within the search pipeline or a model serving script on the external endpoint. The latter option is covered by the user. What you suggest around a generic JSON-to-JSON request processor would make it easier to package all the required functionality within an OpenSearch query workflow.
Data exchange: Unlike the managed API, which may have more sophisticated functionality built into the API, a hosted model may require multiple inference calls per query. The hosted model is likely suited to score a mini-batch of search results at a time, so multiple async mini-batches might have to be performed to score results once "k" reaches a certain size. We need to do further research to determine the importance of more sophisticated scenarios. LTR, for instance, does shard-level re-ranking. We need to evaluate how critical this is and whether re-ranking as a post-process suffices. We also want to investigate how to best integrate LTR ranking models into this pipeline.
We should be consistent either by saying we are going to re-rank k documents and return only K, or we are going to re-rank all the documents and return all of them.
This should suffice. It's generally how second stage re-ranking works, and what we were planning to support. I can re-validate this with our customers--I didn't receive requirements to normalize the re-scored results with results that weren't re-scored.
@HenryL27
Yeah, I think spreading the lowest score to the other docs (or maybe minus a delta?) is probably the behavior we want. Another option I considered was introducing the rerank-score as another search hit field altogether, maybe _rescore so we don't override the original _score value, wdyt?
I like this idea. It is also easier to debug when we have both scores. Only concern/question I have is, it might impact customers using existing OpenSearch clients which do not know about rescore fields ? We may need to validate this
PUT /_search/rerank_pipeline { "response_processors": [ { "rerank": { "cross-encoder": { "top_k": int (how many to rerank) [optional], "model_id": id of cross-encoder [required], "context_field": str (source field to compare to query) [required] } } } ] }
This LGTM! This can be the direction if rerankers cannot be generic
I like this idea. It is also easier to debug when we have both scores. Only concern/question I have is, it might impact customers using existing OpenSearch clients which do not know about rescore fields ? We may need to validate this
@vamshin, what are your thoughts on @navneet1v's point about combining re-rank and original scores? I am not aware of use cases that require some way to normalize and combine scores. As far as I know, customers just expect to re-rank "k" results or default to all the results retrieved by the initial retrieval. There's no need for anything fancy.
@vamshin, what are your thoughts on @navneet1v's point about combining re-rank and original scores?
@dylan-tong-aws I am not sure if you understood what I was trying to say, but it's definitely not this.
What I am trying to say is: let's say a customer goes ahead and retrieves 100 results, and we re-rank only the first 50; then the scores of the first 50 documents and the later ones will not be consistent. We should be consistent in our result scores.
@navneet1v, right, so if a user says return K re-scored results, it just returns K results even if the first-stage retrieval had N > K results. It's my understanding that the proposal is to return N results and find a way to normalize the K re-scored results so they are consistent. I am in agreement that we can just return the K results. As far as I know, this sufficiently delivers on customer requirements.
Ok, consensus on the top k issue; can I get thumbs up?
We will simply rerank every search result that goes through the processor.
Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay)
Yes, I am aligned with removing topK. If for any other reason a customer wants to fetch more results and re-rank only a few, they can use the Oversampling processor, as mentioned in my previous comment. (https://github.com/opensearch-project/neural-search/issues/485#issuecomment-1796423711)
Ok, consensus on the top k issue; can I get thumbs up?
We will simply rerank every search result that goes through the processor.
Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay)
This implementation still honors the search result limit safe guards and query timeout settings, correct?
I think so? That's more a question for Froh I think
Aligned on removing topK to keep consistent results. Also this is not a one way door decision. If use cases arise to expose such params we can always revisit.
@vamshin @navneet1v Alright, here's a rough sketch of my low-level plan for generality here:
We'll introduce an interface called RerankProcessor that extends SearchResponseProcessor or something. We'll implement a RerankProcessorFactory that will construct various implementations of this interface as we come up with them. The interface (maybe actually ABC, idk) will have 3 important methods:
abstract score(SearchResults, ScoringContext) - scores all search results given a context
rerank(SearchResults, ScoringContext) - reranks the search results given a context
abstract generateScoringContext(SearchResults, SearchQuery) - generates the context that the previous two use
The default rerank implementation will simply call score to get new scores, replace them in the search results, and then re-sort the search results. The default processSearchResults implementation will first generateScoringContext and then rerank. ScoringContext is just gonna be like a <String, Object> map.
I think this should allow any implementable reranking processor to be implemented cleanly, and will align nicely with the new PUT /_search/pipeline API.
In the case of the cross encoder reranker, the score method will call an ml-commons cross-encoder, and the generateScoringContext method will find the query_text field in the search query.
@navneet1v I ran a query where I asked for stuff as fields and it came back and told me not to do that as it would cost performance. I'd have to turn it into some kind of re-invertible inverted index or something? It seems to want to use fields for keyword fields, whereas typically for reranking you'll want to rerank based off of a more fuzzy similarity field, right? idk, I feel like I'm not understanding something. Is the fielddata=true index shenaniganery more efficient than source anyway? I'm also not convinced that this would ever be the bottleneck... idk, it's easy to implement so I will/did, it's just not quite adding up for me
Hi @HenryL27, here are some of my thoughts on the interface and the low-level design:
Let's name the processor rerank, with the following interface:
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"model_id": id of the model used for re-ranking can be local or remote [required],
"context": {
"ranking_source_fields": ["title", "title_and_description" ..] (required) // list of field values per document that needs to be passed to re-ranking model as a single string.
.... // other fields that can come in future which can be part of context.
}
}
}
]
}
POST index/_search?search_pipeline=rerank_pipeline
{
"query": {...}
"ext": {
"rerank": {
"query_context": {
"query_string": "", // optional
"path": "" optional str (path in the search body to the query text) [required],
}
}
}
}
The reason why I am thinking of having a query_context object with 2 fields inside it is to make sure the user has the capability to provide the query for re-ranking in 2 forms: a. path: the user wants to use the re-ranking query string from the actual query. b. query_string: the user can fill this string for a complex re-ranking query like "what was the population of the USA in 2003?". This type of query might not be representable in the actual query clause, hence providing "query_string" can help bridge the gap. The user can only set either query_string or path.
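For example, the path form with a neural query would look something like this (a sketch of the proposal above; the field values are illustrative):
POST index/_search?search_pipeline=rerank_pipeline
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "Was abraham lincoln a good president?",
        "model_id": "<embedding model id>",
        "k": 100
      }
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "path": "query.neural.embedding.query_text"
      }
    }
  }
}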
With this, I can see changes to the low-level plan which you have added. Please go ahead and update the low-level plan. cc: @vamshin, @dylan-tong-aws
@navneet1v I ran a query where I asked for stuff as fields and it came back and told me not to do that as it would cost performance.
Can you paste what was the response and what your query?
Can you paste what was the response and what your query?
query:
POST testindex/_search?search_pipeline=hybrid_pipeline
{
"size": 20,
"_source": ["text_representation", "properties", "type"],
"docvalue_fields": ["type"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"text_representation": "Was abraham lincoln a good president?"
}
},
{
"neural": {
"embedding": {
"query_text": "Was abraham lincoln a good president?",
"model_id": "5EbxXosBQHd70iP-nKjn",
"k": 100
}
}
}
]
}
}
}
response:
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [type] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [...
but if I set docvalue_fields to ["type.keyword"] then it's fine with it
@HenryL27 Can you try this query?
POST testindex/_search?search_pipeline=hybrid_pipeline
{
"size": 20,
"_source": "false",
"fields": ["text_representation", "properties", "type"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"text_representation": "Was abraham lincoln a good president?"
}
},
{
"neural": {
"embedding": {
"query_text": "Was abraham lincoln a good president?",
"model_id": "5EbxXosBQHd70iP-nKjn",
"k": 100
}
}
}
]
}
}
}
@navneet1v aha okay thanks. That works.
@navneet1v
PUT /_search/rerank_pipeline { "response_processors": [ { "rerank": { "model_id": id of the model used for re-ranking can be local or remote [required], "context": { "ranking_source_fields": ["title", "title_and_description" ..] (required) // list of field values per document that needs to be passed to re-ranking model as a single string. .... // other fields that can come in future which can be part of context. } } } ] }
I worry that all reranking options might not use a model. Maybe I have a reranker that attempts to fit as many individual documents into a given context window for a future rag step. Arguably that isn't reranking, but you can see there is potential for rerankers that do different things than simply compare a query to a set of fields
For the _search part of the API, do we need query_context layer? The current implementation simply looks for the path xor the text as fields of the rerank object
I worry that all reranking options might not use a model.
Another worry. Models are gonna want to have different contexts. XGBoost, for example, will want a feature vector. Maybe that's constructed ahead of time in a JSON2JSON processor, but I think it would make sense for an XGBoost rerank processor to be configured at pipeline creation to construct such a vector.
Or a model that uses user information to help rerank. We have to tell it where to find that user info, no?
Also, what do you want me to do with the list of multiple context.ranking_source_fields? Cross encoders can score the similarity of a pair of strings - what would be the expected behavior in this particular use-case if a user said to rerank on multiple fields?
@HenryL27
I worry that all reranking options might not use a model. Maybe I have a reranker that attempts to fit as many individual documents into a given context window for a future rag step. Arguably that isn't reranking, but you can see there is potential for rerankers that do different things than simply compare a query to a set of fields
Here is my understanding: if we are seeing this kind of use-case then we can have another processor called Non-ML-Re-ranker or some better name. But I don't see creating re-ranker types based on model types like cross-encoder re-ranker or cohere re-ranker; that is too much granularity.
For the _search part of the API, do we need query_context layer? The current implementation simply looks for the path xor the text as fields of the rerank object
I didn't get this question.
Another worry. Models are gonna want to have different contexts. XGBoost, for example, will want a feature vector. Maybe that's constructed ahead of time in a JSON2JSON processor, but I think it would make sense for an XGBoost rerank processor to be configured at pipeline creation to construct such a vector.
This brings up an interesting question, which is who should own the responsibility of creating the feature vector: the Neural Search plugin or the ML Commons plugin?
Or a model that uses user information to help rerank. We have to tell it where to find that user info, no?
This is the reason why I was suggesting the context option in the API: so that we can add different context source fetcher options, like fetching data from a data source or from the _source etc. Currently I suggested only ranking_source_fields because we are not yet implementing other context sources like, let's say, DDB, S3, MongoDB etc.
Cross encoders can score the similarity of a pair of strings - what would be the expected behavior in this particular use-case if a user said to rerank on multiple fields?
I see 2 options here. In general, what I saw from Cohere is that if we want to provide more than 1 field value as context, we can concatenate the strings and pass them as 1 string. So we can go with this. Going forward, if we need different behavior or a different way of concatenating, we can provide those options in the processor, and the default behavior can be simple concatenation.
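For instance (purely illustrative), with ranking_source_fields of ["title", "description"], a document like
{
  "title": "Abraham Lincoln",
  "description": "16th president of the United States"
}
would be passed to the cross encoder as the single string "Abraham Lincoln 16th president of the United States" (the exact separator and ordering would be part of that default behavior).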
@navneet1v
I agree that specifying model type is too much granularity. What about naming the subtype after the ml-algorithm/function name it uses? So this would be a text similarity re-ranker or something?
Here I'm just asking why the "query_context" layer can't be left out of this API
POST index/_search?search_pipeline=rerank_pipeline
{
"query": {...}
"ext": {
"rerank": {
"query_context": {
"query_string": "", // optional
"path": "" optional str (path in the search body to the query text) [required],
}
}
}
}
instead taking something like
POST index/_search?search_pipeline=rerank_pipeline
{
"query": {...}
"ext": {
"rerank": {
"query_string": "", // optional
"path": "" optional str (path in the search body to the query text) [required],
}
}
}
My take is that the ml-commons predict API should be a very thin wrapper around models themselves. So ml-commons would take a feature vector (or some kind of MLInput that represents that) and translate it into the form that the model wants. The decisions about what goes in the vector/input belong in neural search.
makes sense, ok
ok, will concat for now. I think there's some work in the pipes for some kind of prompt framework (which is essentially just f-strings) that maybe we can make use of in the future? Although maybe the best model is for that to just be another processor so we don't need to construct such a string at all
@HenryL27
Here I'm just asking why the "query_context" layer can't be left out of this API
POST index/_search?search_pipeline=rerank_pipeline { "query": {...} "ext": { "rerank": { "query_context": { "query_string": "", // optional "path": "" optional str (path in the search body to the query text) [required], } } } }
This one is better abstraction. I am aligned on this.
Although maybe the best model is for that to just be another processor so we don't need to construct such a string at all
On this I am not that convinced about creating a processor. But we can defer this decision until the use case arrives. As of now, let's go with concatenation. This has quite a similarity with the Summarization processor; there also we might want to summarize multiple fields, so we can think of a common generic way.
My take is that the ml-commons predict API should be a very thin wrapper around models themselves. So ml-commons would take a feature vector (or some kind of MLInput that represents that) and translate it into the form that the model wants. The decisions about what goes in the vector/input belong in neural search.
This is a valid point if we just look at it from the predict API standpoint. I am not sure if, going forward, the predict API will be integrated with the Agents framework of ML Commons; if yes, then it becomes counterintuitive to call the predict API a thin wrapper, because then we could build a re-ranking agent to which we pass the search response, model, and other information, and it makes sure it gives the final re-ranked results.
Again, it's not a use case for now, so let's park it for the future.
I agree that specifying model type is too much granularity. What about naming the subtype after the ml-algorithm/function name it uses? So this would be a text similarity re-ranker or something?
For this, can you comment with the interface which you have in mind?
@navneet1v example text similarity-based rerank API
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"text_similarity": {
"model_id": id of TEXT_SIMILARITY model [required],
},
"context": {
"document_fields": [ "title", "text_representation", ...],
...
}
}
}
]
}
In the future, when there are other function names in ml commons for other kinds of rerank models (or we wanna bypass ml-commons entirely) this is represented in the API
@HenryL27
@navneet1v example text similarity-based rerank API
PUT /_search/rerank_pipeline { "response_processors": [ { "rerank": { "text_similarity": { "model_id": id of TEXT_SIMILARITY model [required], }, "context": { "document_fields": [ "title", "text_representation", ...], ... } } } ] }
In the future, when there are other function names in ml commons for other kinds of rerank models (or we wanna bypass ml-commons entirely) this is represented in the API
I found 2 re-rankers that don't use ML models. Please check these: https://github.com/opensearch-project/search-processor/tree/main/amazon-personalize-ranking, https://github.com/opensearch-project/search-processor/tree/main/amazon-kendra-intelligent-ranking
I think we should see how we can merge all these interfaces, or whether the interface that we are building is extensible enough to support those re-rankers in the future.
@navneet1v Just glancing at these, I don't think it will be too difficult. Your interface would look something like (e.g. for personalize)
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"amazon_personalize": {
"campaign": blah,
"iam_role_arn": blah,
"recipe": blah,
"region": blah,
"weight": blah,
},
"context": {
"personalize_context": {
"item_id_field": blah,
"user_id_field": blah
},
"document_fields": ["not", "sure", "these", "are", "used", "in", "this?"]
}
}
}
]
}
Implementing this within the framework I've provided should be fairly straightforward (just implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeRerankProcessor). Also, we may want to include some score-normalization stuff - so we might want to implement a sibling of RescoringRerankProcessor called ScoreCombinationRerankProcessor, but that can consume a lot of work that's already been done for hybrid search.
Oh also to update on the architecture in case you haven't seen the latest changes to the PR:
I introduced the concept of a ContextSourceFetcher, which is something that, well, fetches context. Currently there are two implementations that are being used by text_similarity - DocumentContextSourceFetcher and QueryContextSourceFetcher. They do pretty much what you would expect.
The factory now creates the context source fetchers based on the configuration, and they are used by the top-level RerankProcessor, which is now an abstract class rather than an interface. I probably need to reorganize the files a bit.
@HenryL27 yeah this looks pretty neat.
One more thing: in the original ML-model-based re-ranker I see we want to use text_similarity to define what kind of re-ranker it is. Can we think of a better name?
Another thing: can we now update the proposal (by creating a new section with updated interfaces) and also add a comment describing what changes we have made to the proposal?
define "better". I used text_similarity
to mirror the function name in ml-commons. Do you have a suggestion?
updated RFC
Because text_similarity mirrors ML Commons, and as a user, when they look at text_similarity, what does it tell them? amazon_personalize tells them that it is using the Amazon Personalize re-ranker, but that is not the same with text_similarity.
I have some suggestions, but those are also not that great: maybe ml_ranker or model_reranker.
@navneet1v
Well, I would argue that when a user looks at text_similarity it tells them that this reranker (remember rerank is still the top layer of the api - I don't think we need a reminder that this is a reranking method) is measuring the similarity of text to rerank. I think that's what we would want, although perhaps the model id comes a bit out of left field.
How about nlp_comparison? That says "this reranks by comparing natural language snippets," and also implies "this uses machine learning to do it"
Well, I would argue that when a user looks at text_similarity it tells them that this reranker (remember rerank is still the top layer of the api - I don't think we need a reminder that this is a reranking method)
Yes that is fair. Hence I was saying my suggestions are not that great. :D
In nlp_comparison, can we drop comparison and just use nlp? But nlp_comparison is better than text_similarity. Let's update the proposal with this new name. I think @dylan-tong-aws can help us come up with a better name.
@HenryL27 can you update the proposal with the final interfaces as a summary and the recommended approach?
Once that is done we can ask @sean-zheng-amazon, @vamshin, @dylan-tong-aws to review. Also please mention how other re-rankers in OpenSearch can be extended from this base re-ranker.
updated. @navneet1v to your satisfaction?
@navneet1v so, where are we at with this?
@HenryL27 So, did some discussion and here are some names that got suggested.
text_similarity, ml-commons, ml_commons_text_similarity, ml_opensearch
Among all the above options I am leaning towards ml_opensearch. Here is the thought process: names like amazon_personalize, amazon_kendra, and ml_opensearch identify the vendor that is providing the re-ranking capability. For the local or remote model use case, from a neural-search standpoint the vendor is ml-commons, not the local model or Cohere as a remote model. The model entity of ML Commons is providing the abstraction. Hence ml_opensearch suits very well here.
cc: @vamshin , @dylan-tong-aws
@navneet1v ml_opensearch it is! I've also gone through your CR comments; thank you
Is ml_opensearch the only provider of rerankers inside OpenSearch? I know I seem like the "LTR Champion" or whatever, but how do you see Learning-to-Rank fitting in here? It works at the shard level, so maybe it doesn't, but it might be good for users to think of the API for re-ranking as re-ranking, however it works under the covers. Is this feasible?
@navneet1v, @HenryL27
@macohen shard level does seem to imply that it wouldn't fit in well as a rerank response processor. But maybe we at some point make a reranking search phase results processor or whatever it needs to be - and then I would simply give it a name like ltr_opensearch. (ltr is its own plugin right, not run through ml-commons? following navneet's vendor-based naming scheme I think this makes sense)
+1 on @HenryL27 comment. @macohen please provide any other feedback you have.
I had a chat with a customer who has substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker.
Would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option--which isn't an ideal solution--is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side. Run a query to retrieve results, which they use to construct feature vector(s) on the client-side using tools like existing feature stores. Then run a second query to perform the re-ranking using the feature vector(s) and possibly search and user context to construct the re-ranking request.
A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?
@dylan-tong-aws the ContextSourceFetcher interface we're introducing here should make connecting to external feature stores / constructing feature vectors relatively easy. I'm not sure about the details but it should look something like implementing a (e.g.) FeastVectorContextSourceFetcher or something that just makes the appropriate network calls. We also don't currently have a plan for a FeatureVectorRerankProcessor implementation but it should be a fairly simple extension of RerankProcessor.
I had a chat with a customer who has substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker.
Would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option--which isn't an ideal solution--is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side. Run a query to retrieve results, which they use to construct feature vector(s) on the client-side using tools like existing feature stores. Then run a second query to perform the re-ranking using the feature vector(s) and possibly search and user context to construct the re-ranking request.
A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?
@dylan-tong-aws thanks for adding the info. The way I will look at this is, basically 1 and 3 are the same thing. We need to fetch the context for the reranker from a source.
As described by @HenryL27, the interface is already in place, and we can add these fetchers as the need arises.
@navneet1v are there any next steps for me? Or am I just waiting on security review?
Problem statement
Addresses #248
Reranking the top search results with a cross-encoder has been shown to improve search relevance rather dramatically. We’d like to do that. Furthermore, we’d like to do that inside of OpenSearch, for a couple reasons: 1/ it belongs there - it’s a technique to make your search engine search better, and 2/ it needs to precede RAG to integrate with it - the retrieval that augments the generation needs to be as good as possible - and succeed the initial retrieval, obviously - so it should be in OpenSearch.
Goals
Non-goals
Proposed solution
Reranking will be implemented as a search response processor, similar to RAG. Cross-Encoders will be introduced into ml-commons to support this.
Architecture / Rerank Search Path
Rest APIs
Create Rerank Pipeline
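A sketch of the request, assembled from the interfaces agreed in the discussion above (the exact syntax may differ in the shipped documentation; the field meanings are described below):
PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "<id of the text_similarity model>"
        },
        "context": {
          "document_fields": ["title", "text_representation"]
        }
      }
    }
  ]
}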
"ml_opensearch" refers to the kind of rerank processor. "model_id" should be the id of the
text_similarity
model in ml-commons "context" tells the pipeline how to construct the context it needs in order to rerank "document_fields" are a list of fields of the document (in_source
orfields
) to rerank based on. Multiple fields will be concatenated as strings.Query Rerank Pipeline
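A sketch of such a query, using the "query_text_path" form with a neural query (field values are illustrative, and the exact nesting under ext may differ; the parameters are described below):
POST /my-index/_search?search_pipeline=rerank_pipeline
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "Was abraham lincoln a good president?",
        "model_id": "<text embedding model id>",
        "k": 100
      }
    }
  },
  "ext": {
    "rerank": {
      "query_text_path": "query.neural.embedding.query_text"
    }
  }
}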
Provide the params for the reranker to the search pipeline as a search ext. Use either "query_text", which acts as the direct text to compare all the docs against, or "query_text_path", which is an xpath that points to another location in the query object. For example, with a neural query we might have "query_text_path": "query.neural.embedding.query_text".
The rerank processor will evaluate all the search results and then sort them based on the new scores.
Upload Cross Encoder Model
This is not a new API and all the other model-based APIs should still work for the cross encoder model/function name with minimal work to integrate.
Predict with Cross Encoder Model
See the Cross-Encoder PR
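Roughly, prediction against a registered cross-encoder looks like the following (the field names here are assumed from the TEXT_SIMILARITY function in that PR, which is the authoritative reference); the output is one similarity score per entry in text_docs:
POST /_plugins/_ml/models/<cross-encoder model id>/_predict
{
  "query_text": "Was abraham lincoln a good president?",
  "text_docs": [
    "Abraham Lincoln was the 16th president of the United States.",
    "The capital of France is Paris."
  ]
}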
Risks
Implementation Details
The overall reranking flow will be: 1/ gather the context needed to rerank, 2/ score each search result against that context, and 3/ re-sort the results by the new scores.
We will implement two main base classes for this work: RerankProcessor and ContextSourceFetcher.
ContextSourceFetcher: This will retrieve the context needed to rerank documents. Essentially, step 1. A particular rerank processor may make use of several of these, and they can get their context from any source.
RerankProcessor: Orchestrates the flow by combining all the context from the ContextSourceFetchers, then generates scores for the documents via an abstract score method, then does the sorting.
Extensibility
It is my hope that these interfaces are simple enough to extend and configure that we can create a rich ecosystem of rerank processors. To implement the cross-encoder reranker, all I need to do is create a NlpComparisonReranker subclass that says "score things with ml-commons", a DocumentContextSourceFetcher subclass that retrieves fields from documents, and a QueryContextSourceFetcher that retrieves context from the query ext.
If I wanted to implement the Amazon Personalize reranker of the search-processors repo, I would implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeReranker, which only have to do the minimal amount of work to make the logic functional.
I also think it should be possible to incorporate some of the work from the Score Normalization and Combination feature, but that's outside the scope of this RFC.
Alternative solutions
Rerank Query Type
Another option is to implement some kind of rerank query. This would wrap another query and rerank it. For example
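One possible shape, purely as an illustration (none of these field names are settled):
POST /my-index/_search
{
  "query": {
    "rerank": {
      "query": {
        "match": { "text_representation": "Was abraham lincoln a good president?" }
      },
      "model_id": "<cross-encoder model id>",
      "context_field": "text_representation"
    }
  }
}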
Pros:
Cons: