Hi Henry,
Thanks for putting this together. I have a few questions...
Can you clarify how you propose to support pre- and post-request format processing in this pipeline? Is this built into the pipeline, or did you envision this to be part of the connector? It would be great to have a search processor that provides an easy way to configure JSON-to-JSON transforms to simplify the effort of integrating with various downstream APIs and models.
What controls does the user have around configuring how results are sent to the re-ranker? Let's say the re-ranker isn't a managed API and it's hosted on a model server--are you proposing any controls like the ability to send results as async mini-batches and perform post-processing like merge and sort?
What controls does the user have for configuring what data gets sent to the re-ranker model? There are slight variations in re-ranking use cases in terms of what inputs are passed to the re-ranker model. In some cases, it's just the search results. Other use cases require the query context.
Thanks @dylan-tong-aws. I have a few responses!
This is a narrow use-case. Just take all your docs and ask a (text-to-float) language model how similar they are. Then sort based off of that. Nonetheless, this alone can give like a 15-20% boost to recall in the top couple, so I think it's worth knocking out.
p.s. Ok, I read up on the Cohere rerank API and it should be able to connect to this work pretty readily.
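For reference, Cohere's rerank request (shape as I understand it from their public docs; the model name is just an example) takes a query plus a list of documents and returns a relevance score per document, which is essentially the contract this processor needs:
POST https://api.cohere.ai/v1/rerank
{
  "model": "rerank-english-v2.0",
  "query": "Was abraham lincoln a good president?",
  "documents": [
    "Abraham Lincoln was the 16th president of the United States.",
    "The capital of France is Paris."
  ],
  "top_n": 2
}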
Hi, @HenryL27 thanks for creating the RFC. I have some suggestions and comments:
1. Why do we need the top_k parameter both in the processor and in ext?
2. The name context_field in the response processor does not fit what the actual value will be. Please rename.
3. A _source fetch of the vector field adds latency. It is advised to fetch only the fields that are required; how will re-ranking work in that case?
4. Query Rerank Pipeline: I see two options; please add what the recommended solution is there.
@navneet1v thanks! context_field tells the processor what text to send from each document as context against which to compare the query. A cross encoder takes (query, context) pairs. So we need to specify what the query and contexts are. What did you have in mind?
thanks @HenryL27. Few comments/questions
Looks like this RFC focuses only on local-model support for cross-encoder reranking. To be a complete solution, can we also incorporate support for remote models? To me, supporting both local and remote models should be a goal.
Rescoring only a subset of results definitely leads to inconsistencies, as you called out. This should be addressed as part of the solution. Maybe you can give the lowest possible score to non-competing docs? Let's take a goal of making the reranking processor leave results consistent.
We seem to name the processor neural_rerank. While this name looks like it supports a generic reranker, the RFC focuses only on cross-encoder-based reranking. Are we sure the current interface can support different techniques in the future? If not, should we call it something like neural_crossencoder_rerank to avoid backward-compatibility issues as we try to make it more generic? I am not a fan of creating a processor for every use case; I would rather evaluate the current approach to make it more generic.
@vamshin thanks
Yeah, I think spreading the lowest score to the other docs (or maybe minus a delta?) is probably the behavior we want. Another option I considered was introducing the rerank-score as another search hit field altogether, maybe _rescore so we don't override the original _score value, wdyt?
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"cross-encoder": {
"top_k": int (how many to rerank) [optional],
"model_id": id of cross-encoder [required],
"context_field": str (source field to compare to query) [required]
}
}
}
]
}
Implementation-wise I think this becomes a single "rerank" processor and depending on the type ("cross-encoder" here) it casts itself to whatever it needs to be or something
@HenryL27
I'm not sure we need it in both places. In most cases you'll probably just set it in the processor and forget about it. But I thought that maybe if you know you need to rerank a lot of things for a particular query (or you only need to rerank a few things for a particular query) it would be nice if you had an override switch.
If a user needs to do this, he can add the processor in the search request itself, rather than this, so I would not provide multiple overrides.
The scores will be overwritten with the new scores from the cross encoder. Yes, this throws out any previous normalization, and yes, if you only rerank some of the documents you can get weird inconsistencies. I can normalize the cross-encoder scores maybe? I'm not sure it's worth it though. If you have a good idea of what the behavior should be I'm all ears
As these inconsistencies are arising because of the top_k parameter, I would not even put that parameter in the processor. So if the user is getting X documents from OpenSearch, we should re-rank all of them.
I'm not sure I understand what you mean by this. context_field tells the processor what text to send from each document as context against which to compare the query. A cross encoder takes (query, context) pairs. So we need to specify what the query and contexts are. What did you have in mind?
The main point in this was that the name context_field is very generic. Let's rename this.
You wouldn't rerank based on a vector; it only makes sense to rerank based on semantically meaningful text. Presumably, that semantically meaningful text that you're reranking is also the semantically meaningful text that you care about as a search user; though I recognize that that's an assumption. If you don't fetch the context field, then the reranking processor should either do nothing or error out, since there's nothing to rerank.
I am not saying to re-rank based on a vector field. When doing a query, the customer may set _source to false (which is a very standard use case for vector search) and use fields: ['title', 'description'] etc. In that case _source will be empty, but there will be an array of fields in the response. So we should not just rely on the _source.
This is the purpose of top_k (and maybe it should be required bc of this?) - all we can really do is force the user to think about it. If the user tries to rerank 10,000 documents, that's kinda on them, ya know? We can optimize the hell out of this code, but the performance bottleneck is the cross encoder model and there's really not much we can do about that besides make it clear that this can be an issue.
My recommendation for this would be that these re-ranker models should run outside of the OpenSearch cluster as remote models, where users can use GPU-based instances for doing re-ranking. The reason is that if the latency for re-ranking is in the hundreds of milliseconds for, say, 100 records, then the feature becomes unusable.
Currently building cross encoders as local, but it should be possible to Connect to the Cohere endpoint as well (does anyone else offer a rerank API?)
We should explore this more. Maybe our local models could be deployed in some other service like SageMaker etc., and not specifically Cohere.
@navneet1v
If a user needs to do this, he can add the processor in the search request itself, rather than this, so I would not provide multiple overrides.
True, this is possible. But in a case where I already have a rather complicated search pipeline I might not want to rewrite it all, and I'm not sure that saying "if you want to use a different value for top_k then rewrite your processor" actually makes the API cleaner. Maybe it can just be a required param of ext and leave it out of the processor definition entirely?
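Something like this, purely for illustration (where exactly the override lives in ext, and what it's called, is not settled):
POST /my-index/_search?search_pipeline=rerank_pipeline
{
  "query": { "match": { "text_representation": "Was abraham lincoln a good president?" } },
  "ext": {
    "rerank": {
      "top_k": 25
    }
  }
}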
As these inconsistencies are arising because of the top_k parameter, I would not even put that parameter in the processor. So if the user is getting X documents from OpenSearch, we should re-rank all of them.
Maybe. I guess the assumption with reranking is generally that anything not in the top k is irrelevant and therefore doesn't need to be reranked; as such why return it in the first place? That said, you might want to see things that are not in the top k - in high-latency cases where reranking is constrained, or in testing/tuning cases where you want to see what the reranker doesn't get to see to troubleshoot. We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.
The main point in this was that the name context_field is very generic. Let's rename this.
Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.
I am not saying to re-rank based on a vector field. When doing a query, the customer may set _source to false (which is a very standard use case for vector search) and use fields: ['title', 'description'] etc. In that case _source will be empty, but there will be an array of fields in the response. So we should not just rely on the _source.
Huh, I didn't know you could do this! I guess then we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both then.
Regarding remote models, yep, I'm looking into that. Fundamentally the Connector interface should handle all the juicy api discrepancies between remote models. I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors and then we should be all good, right?
We could also take @vamshin's suggestion and fix the inconsistencies by rescoring the docs outside of the top k or something.
I don't completely agree with this thought. We should be consistent either by saying we are going to re-rank k documents and return only K, or we are going to re-rank all the documents and return all of them. Being in the middle state is bad. One way to solve this is with an OverSampling processor: the customer asks for X, we retrieve, let's say, 2X, re-rank all 2X documents, and return X documents back.
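A rough sketch of that flow (the oversampling and truncation processor names here are placeholders, not names we have committed to):
PUT /_search/pipeline/oversample_rerank_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 2.0
      }
    }
  ],
  "response_processors": [
    {
      "rerank": { ... }
    },
    {
      "truncate_hits": {
        "target_size": X (what the customer originally asked for)
      }
    }
  ]
}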
Gotcha. Do you have a suggestion? I'm following what @austintlee did in the RAG processor.
Re-ranking field is one name that comes to my mind.
Huh, I didn't know you could do this! I guess then we'll look for the field in the "fields" array then? Those fields are still (key, value) pairs, right? Should be easy to look at both then.
Yes, they are key-value pairs. But what I meant to say was, we need to handle this use case too, because doing queries like this provides a latency boost.
I'll make sure that the text-similarity model type I'm adding to ml-commons can talk to connectors and then we should be all good, right?
Yeah, maybe. Will wait for that to be out.
- Yep, I'll work on making it compatible with Cohere's cross-encoder API. I don't know of any other remote APIs that do reranking but I think they'll all look relatively similar. May require some Connector finagling but otherwise this should work.
@HenryL27, another scenario we're looking to support is for custom re-rank models that are hosted on an external model server like an Amazon SageMaker endpoint. The search pipeline will require more flexibility than what it takes to integrate with a managed API. The hosted model may simply be a classification/regression model that's trained for a re-ranking task (eg. XGBoost).
The gist is that we'll need some flexibility around the data transfer and protocol:
Data transform (request/response): We need the ability to perform a request and response data transform within the search pipeline or a model serving script on the external endpoint. The latter option is covered by the user. What you suggest around a generic JSON-to-JSON request processor would make it easier to package all the required functionality within an OpenSearch query workflow.
Data exchange: Unlike the managed API, which may have more sophisticated functionality built into the API, a hosted model may require multiple inference calls per query. The hosted model is likely suited to score a mini-batch of search results at a time, so multiple async mini-batches might have to be performed to score results once "k" reaches a certain size. We need to do further research to determine the importance of more sophisticated scenarios. LTR, for instance, does shard-level re-ranking. We need to evaluate how critical this is and whether re-ranking as a post-process suffices. We also want to investigate how to best integrate LTR ranking models into this pipeline.
We should be consistent either by saying we are going to re-rank k documents and return only K, or we are going to re-rank all the documents and return all of them.
This should suffice. It's generally how second stage re-ranking works, and what we were planning to support. I can re-validate this with our customers--I didn't receive requirements to normalize the re-scored results with results that weren't re-scored.
@HenryL27
Yeah, I think spreading the lowest score to the other docs (or maybe minus a delta?) is probably the behavior we want. Another option I considered was introducing the rerank-score as another search hit field altogether, maybe _rescore so we don't override the original _score value, wdyt?
I like this idea. It is also easier to debug when we have both scores. Only concern/question I have is, it might impact customers using existing OpenSearch clients which do not know about rescore fields ? We may need to validate this
PUT /_search/rerank_pipeline { "response_processors": [ { "rerank": { "cross-encoder": { "top_k": int (how many to rerank) [optional], "model_id": id of cross-encoder [required], "context_field": str (source field to compare to query) [required] } } } ] }
This LGTM! This can be the direction if rerankers cannot be generic
I like this idea. It is also easier to debug when we have both scores. Only concern/question I have is, it might impact customers using existing OpenSearch clients which do not know about rescore fields ? We may need to validate this
@vamshin, what are your thoughts on @navneet1v's point about combining re-rank and original scores? I am not aware of use cases that require some way to normalize and combine scores. As far as I know, customers just expect to re-rank "k" results or default to all the results retrieved by the initial retrieval. There's no need for anything fancy.
@vamshin, what are your thoughts on @navneet1v's point about combining re-rank and original scores?
@dylan-tong-aws I am not sure if you understood what I was trying to say, but it's definitely not this.
What I am trying to say is: let's say a customer goes ahead and retrieves 100 results, and we re-rank only the first 50; then the scores of the first 50 documents and the later ones will not be consistent. We should be consistent in our result scores.
@navneet1v, right, so if a user says return K re-scored results, it just returns K results even if the first-stage retrieval had N > K results. It's my understanding that the proposal is to return N results and find a way to normalize the K re-scored results so they are consistent. I am in agreement that we can just return the K results. As far as I know, this sufficiently delivers on customer requirements.
Ok, consensus on the top k issue; can I get thumbs up?
We will simply rerank every search result that goes through the processor.
Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay)
Yes, I am aligned with removing topK. If for any other reason a customer wants to fetch more results and re-rank only a few, they can use the Oversampling processor, as mentioned in my previous comment. (https://github.com/opensearch-project/neural-search/issues/485#issuecomment-1796423711)
Ok, consensus on the top k issue; can I get thumbs up?
We will simply rerank every search result that goes through the processor.
Top K is removed entirely. If you request 5000000 documents through a rerank processor and it kills your reranker, that's on you. (Doing this to an embedding ingest processor can also OOM your system, so I think that's okay)
This implementation still honors the search result limit safe guards and query timeout settings, correct?
I think so? That's more a question for Froh I think
Aligned on removing topK to keep consistent results. Also this is not a one way door decision. If use cases arise to expose such params we can always revisit.
@vamshin @navneet1v Alright, here's a rough sketch of my low-level plan for generality here:
We'll introduce an interface called RerankProcessor that extends SearchResponseProcessor or something. We'll implement a RerankProcessorFactory that will construct various implementations of this interface as we come up with them. The interface (maybe actually ABC, idk) will have 3 important methods:
abstract score(SearchResults, ScoringContext) - scores all search results given a context
rerank(SearchResults, ScoringContext) - reranks the search results given a context
abstract generateScoringContext(SearchResults, SearchQuery) - generates the context that the previous two use
The default rerank implementation will simply call score to get new scores, replace them in the search results, and then re-sort the search results. The default processSearchResults implementation will first generateScoringContext and then rerank. ScoringContext is just gonna be like a <String, Object> map.
I think this should allow any implementable reranking processor to be implemented cleanly, and will align nicely with the new PUT /_search/pipeline API.
In the case of the cross encoder reranker, the score method will call an ml-commons cross-encoder, and the generateScoringContext method will find the query_text field in the search query.
@navneet1v I ran a query where I asked for stuff as fields and it came back and told me not to do that as it would cost performance. I'd have to turn it into some kind of re-invertible inverted index or something? It seems to want to use fields for keyword fields, whereas typically for reranking you'll want to rerank based off of a more fuzzy similarity field, right? idk, I feel like I'm not understanding something. Is the fielddata=true index shenaniganery more efficient than source anyway? I'm also not convinced that this would ever be the bottleneck... idk, it's easy to implement so I will/did, it's just not quite adding up for me
Hi @HenryL27, here are some of my thoughts on the interface and the low-level design:
Let's name the processor rerank, with the following interface:
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"model_id": id of the model used for re-ranking can be local or remote [required],
"context": {
"ranking_source_fields": ["title", "title_and_description" ..] (required) // list of field values per document that needs to be passed to re-ranking model as a single string.
.... // other fields that can come in future which can be part of context.
}
}
}
]
}
POST index/_search?search_pipeline=rerank_pipeline
{
"query": {...}
"ext": {
"rerank": {
"query_context": {
"query_string": "", // optional
"path": "" optional str (path in the search body to the query text) [required],
}
}
}
}
The reason why I am thinking of having a query_context object with 2 fields inside it is to make sure the user has the capability to provide the query for re-ranking in 2 forms: a. path: the user wants to use the re-ranking query string from the actual query. b. query_string: the user can fill this string for a complex re-ranking query like "what was the population of the USA in 2003?". This type of query might not be representable in the actual query clause, hence providing "query_string" can help bridge the gap. The user can only set either query_string or path.
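For example, the path form with a neural query would look something like this (a sketch of the proposal above; the field values are illustrative):
POST index/_search?search_pipeline=rerank_pipeline
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "Was abraham lincoln a good president?",
        "model_id": "<embedding model id>",
        "k": 100
      }
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "path": "query.neural.embedding.query_text"
      }
    }
  }
}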
With this, I can see changes to the low-level plan which you have added. Please go ahead and update the low-level plan. cc: @vamshin, @dylan-tong-aws
@navneet1v I ran a query where I asked for stuff as fields and it came back and told me not to do that as it would cost performance.
Can you paste what was the response and what your query?
Can you paste what was the response and what your query?
query:
POST testindex/_search?search_pipeline=hybrid_pipeline
{
"size": 20,
"_source": ["text_representation", "properties", "type"],
"docvalue_fields": ["type"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"text_representation": "Was abraham lincoln a good president?"
}
},
{
"neural": {
"embedding": {
"query_text": "Was abraham lincoln a good president?",
"model_id": "5EbxXosBQHd70iP-nKjn",
"k": 100
}
}
}
]
}
}
}
response:
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [type] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [...
but if I set docvalue_fields to ["type.keyword"] then it's fine with it
@HenryL27 Can you try this query?
POST testindex/_search?search_pipeline=hybrid_pipeline
{
"size": 20,
"_source": "false",
"fields": ["text_representation", "properties", "type"],
"query": {
"hybrid": {
"queries": [
{
"match": {
"text_representation": "Was abraham lincoln a good president?"
}
},
{
"neural": {
"embedding": {
"query_text": "Was abraham lincoln a good president?",
"model_id": "5EbxXosBQHd70iP-nKjn",
"k": 100
}
}
}
]
}
}
}
@navneet1v aha okay thanks. That works.
@navneet1v
PUT /_search/rerank_pipeline { "response_processors": [ { "rerank": { "model_id": id of the model used for re-ranking can be local or remote [required], "context": { "ranking_source_fields": ["title", "title_and_description" ..] (required) // list of field values per document that needs to be passed to re-ranking model as a single string. .... // other fields that can come in future which can be part of context. } } } ] }
I worry that all reranking options might not use a model. Maybe I have a reranker that attempts to fit as many individual documents into a given context window for a future rag step. Arguably that isn't reranking, but you can see there is potential for rerankers that do different things than simply compare a query to a set of fields
For the _search part of the API, do we need query_context layer? The current implementation simply looks for the path xor the text as fields of the rerank object
I worry that all reranking options might not use a model.
Another worry. Models are gonna want to have different contexts. XGBoost, for example, will want a feature vector. Maybe that's constructed ahead of time in a JSON2JSON processor, but I think it would make sense for an XGBoost rerank processor to be configured at pipeline creation to construct such a vector.
Or a model that uses user information to help rerank. We have to tell it where to find that user info, no?
Also, what do you want me to do with the list of multiple context.ranking_source_fields? Cross encoders can score the similarity of a pair of strings - what would be the expected behavior in this particular use-case if a user said to rerank on multiple fields?
@HenryL27
I worry that all reranking options might not use a model. Maybe I have a reranker that attempts to fit as many individual documents into a given context window for a future rag step. Arguably that isn't reranking, but you can see there is potential for rerankers that do different things than simply compare a query to a set of fields
Here is my understanding: if we are seeing this kind of use-case then we can have another processor called Non-ML-Re-ranker or some better name. But I don't see creating re-ranker types based on model types like cross-encoder re-ranker or cohere re-ranker; that is too much granularity.
For the _search part of the API, do we need query_context layer? The current implementation simply looks for the path xor the text as fields of the rerank object
I didn't get this question.
Another worry. Models are gonna want to have different contexts. XGBoost, for example, will want a feature vector. Maybe that's constructed ahead of time in a JSON2JSON processor, but I think it would make sense for an XGBoost rerank processor to be configured at pipeline creation to construct such a vector.
This brings up an interesting question, which is who should own the responsibility of creating the feature vector: the Neural Search plugin or the ML Commons plugin?
Or a model that uses user information to help rerank. We have to tell it where to find that user info, no?
This is the reason why I was suggesting the context option in the API: so that we can add different context source fetcher options, like fetching data from a data source or from the _source etc. Currently I suggested only ranking_source_fields because we are not yet implementing other context sources like, let's say, DDB, S3, MongoDB etc.
Cross encoders can score the similarity of a pair of strings - what would be the expected behavior in this particular use-case if a user said to rerank on multiple fields?
I see 2 options here. In general, what I saw from Cohere is that if we want to provide more than 1 field value as context, we can concatenate the strings and pass them as 1 string. So we can go with this. Going forward, if we need different behavior or a different way of concatenating, we can provide those options in the processor, and the default behavior can be simple concatenation.
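For instance (purely illustrative), with ranking_source_fields of ["title", "description"], a document like
{
  "title": "Abraham Lincoln",
  "description": "16th president of the United States"
}
would be passed to the cross encoder as the single string "Abraham Lincoln 16th president of the United States" (the exact separator and ordering would be part of that default behavior).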
@navneet1v
I agree that specifying model type is too much granularity. What about naming the subtype after the ml-algorithm/function name it uses? So this would be a text similarity re-ranker or something?
Here I'm just asking why the "query_context" layer can't be left out of this API
POST index/_search?search_pipeline=rerank_pipeline
{
"query": {...}
"ext": {
"rerank": {
"query_context": {
"query_string": "", // optional
"path": "" optional str (path in the search body to the query text) [required],
}
}
}
}
instead taking something like
POST index/_search?search_pipeline=rerank_pipeline
{
"query": {...}
"ext": {
"rerank": {
"query_string": "", // optional
"path": "" optional str (path in the search body to the query text) [required],
}
}
}
My take is that the ml-commons predict API should be a very thin wrapper around models themselves. So ml-commons would take a feature vector (or some kind of MLInput that represents that) and translate it into the form that the model wants. The decisions about what goes in the vector/input belong in neural search.
makes sense, ok
ok, will concat for now. I think there's some work in the pipes for some kind of prompt framework (which is essentially just f-strings) that maybe we can make use of in the future? Although maybe the best model is for that to just be another processor so we don't need to construct such a string at all
@HenryL27
Here I'm just asking why the "query_context" layer can't be left out of this API
POST index/_search?search_pipeline=rerank_pipeline { "query": {...} "ext": { "rerank": { "query_context": { "query_string": "", // optional "path": "" optional str (path in the search body to the query text) [required], } } } }
This one is better abstraction. I am aligned on this.
Although maybe the best model is for that to just be another processor so we don't need to construct such a string at all
On this I am not that convinced about creating a processor. But we can defer this decision until the use case arrives. As of now, let's go with concatenation. This has quite a similarity with the Summarization processor; there also we might want to summarize multiple fields, so we can think of a common generic way.
My take is that the ml-commons predict API should be a very thin wrapper around models themselves. So ml-commons would take a feature vector (or some kind of MLInput that represents that) and translate it into the form that the model wants. The decisions about what goes in the vector/input belong in neural search.
This is a valid point if we just look at it from the predict API standpoint. I am not sure if, going forward, the predict API will be integrated with the Agents framework of ML Commons; if yes, then it becomes counterintuitive to call the predict API a thin wrapper, because then we could build a re-ranking agent to which we pass the search response, model, and other information, and it makes sure it gives the final re-ranked results.
Again, it's not a use case for now, so let's park it for the future.
I agree that specifying model type is too much granularity. What about naming the subtype after the ml-algorithm/function name it uses? So this would be a text similarity re-ranker or something?
For this, can you comment with the interface which you have in mind?
@navneet1v example text similarity-based rerank API
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"text_similarity": {
"model_id": id of TEXT_SIMILARITY model [required],
},
"context": {
"document_fields": [ "title", "text_representation", ...],
...
}
}
}
]
}
In the future, when there are other function names in ml commons for other kinds of rerank models (or we wanna bypass ml-commons entirely) this is represented in the API
@HenryL27
@navneet1v example text similarity-based rerank API
PUT /_search/rerank_pipeline { "response_processors": [ { "rerank": { "text_similarity": { "model_id": id of TEXT_SIMILARITY model [required], }, "context": { "document_fields": [ "title", "text_representation", ...], ... } } } ] }
In the future, when there are other function names in ml commons for other kinds of rerank models (or we wanna bypass ml-commons entirely) this is represented in the API
I found 2 re-rankers that don't use ML models. Please check these: https://github.com/opensearch-project/search-processor/tree/main/amazon-personalize-ranking, https://github.com/opensearch-project/search-processor/tree/main/amazon-kendra-intelligent-ranking
I think we should see how we can merge all these interfaces, or whether the interface that we are building is extensible enough to support those re-rankers in the future.
@navneet1v Just glancing at these, I don't think it will be too difficult. Your interface would look something like (e.g. for personalize)
PUT /_search/rerank_pipeline
{
"response_processors": [
{
"rerank": {
"amazon_personalize": {
"campaign": blah,
"iam_role_arn": blah,
"recipe": blah,
"region": blah,
"weight": blah,
},
"context": {
"personalize_context": {
"item_id_field": blah,
"user_id_field": blah
},
"document_fields": ["not", "sure", "these", "are", "used", "in", "this?"]
}
}
}
]
}
Implementing this within the framework I've provided should be fairly straightforward (just implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeRerankProcessor). Also, we may want to include some score-normalization stuff - so we might want to implement a sibling of RescoringRerankProcessor called ScoreCombinationRerankProcessor, but that can consume a lot of work that's already been done for hybrid search.
Oh also to update on the architecture in case you haven't seen the latest changes to the PR:
I introduced the concept of a ContextSourceFetcher, which is something that, well, fetches context. Currently there are two implementations that are being used by text_similarity - DocumentContextSourceFetcher and QueryContextSourceFetcher. They do pretty much what you would expect.
The factory now creates the context source fetchers based on the configuration, and they are used by the top-level RerankProcessor, which is now an abstract class rather than an interface. I probably need to reorganize the files a bit.
@HenryL27 yeah this looks pretty neat.
One more thing: in the original ML-model-based re-ranker I see we want to use text_similarity to define what kind of re-ranker it is. Can we think of a better name?
Another thing: can we now update the proposal (by creating a new section with updated interfaces) and also add a comment describing what changes we have made to the proposal?
define "better". I used text_similarity
to mirror the function name in ml-commons. Do you have a suggestion?
updated RFC
Because text_similarity mirrors ML Commons, and as a user, when they look at text_similarity, what does it tell them? amazon_personalize tells them that it is using the Amazon Personalize re-ranker, but that is not the same with text_similarity.
I have some suggestions, but those are also not that great: maybe ml_ranker or model_reranker.
@navneet1v
Well, I would argue that when a user looks at text_similarity it tells them that this reranker (remember rerank is still the top layer of the api - I don't think we need a reminder that this is a reranking method) is measuring the similarity of text to rerank. I think that's what we would want, although perhaps the model id comes a bit out of left field.
How about nlp_comparison? That says "this reranks by comparing natural language snippets," and also implies "this uses machine learning to do it"
Well, I would argue that when a user looks at text_similarity it tells them that this reranker (remember rerank is still the top layer of the api - I don't think we need a reminder that this is a reranking method)
Yes that is fair. Hence I was saying my suggestions are not that great. :D
In nlp_comparison, can we drop comparison and just use nlp? But nlp_comparison is better than text_similarity. Let's update the proposal with this new name. I think @dylan-tong-aws can help us come up with a better name.
@HenryL27 can you update the proposal with the final interfaces as a summary and the recommended approach?
Once that is done we can ask @sean-zheng-amazon, @vamshin, @dylan-tong-aws to review. Also please mention how other re-rankers in OpenSearch can be extended from this base re-ranker.
updated. @navneet1v to your satisfaction?
@navneet1v so, where are we at with this?
@HenryL27 So, did some discussion and here are some names that got suggested.
text_similarity, ml-commons, ml_commons_text_similarity, ml_opensearch
Among all the above options I am leaning towards ml_opensearch. Here is the thought process: names like amazon_personalize, amazon_kendra, and ml_opensearch identify the vendor that is providing the re-ranking capability. For the local or remote model use case, from a neural-search standpoint the vendor is ml-commons, not the local model or Cohere as a remote model. The model entity of ML Commons is providing the abstraction. Hence ml_opensearch suits very well here.
cc: @vamshin , @dylan-tong-aws
@navneet1v ml_opensearch it is! I've also gone through your CR comments; thank you
Is ml_opensearch the only provider of rerankers inside OpenSearch? I know I seem like the "LTR Champion" or whatever, but how do you see Learning-to-Rank fitting in here? It works at the shard level, so maybe it doesn't, but it might be good for users to think of the API for re-ranking as re-ranking, however it works under the covers. Is this feasible?
@navneet1v, @HenryL27
@macohen shard level does seem to imply that it wouldn't fit in well as a rerank response processor. But maybe we at some point make a reranking search phase results processor or whatever it needs to be - and then I would simply give it a name like ltr_opensearch. (ltr is its own plugin right, not run through ml-commons? following navneet's vendor-based naming scheme I think this makes sense)
+1 on @HenryL27 comment. @macohen please provide any other feedback you have.
I had a chat with a customer who has substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker.
Would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option--which isn't an ideal solution--is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side. Run a query to retrieve results, which they use to construct feature vector(s) on the client-side using tools like existing feature stores. Then run a second query to perform the re-ranking using the feature vector(s) and possibly search and user context to construct the re-ranking request.
A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?
@dylan-tong-aws the ContextSourceFetcher interface we're introducing here should make connecting to external feature stores / constructing feature vectors relatively easy. I'm not sure about the details but it should look something like implementing a (e.g.) FeastVectorContextSourceFetcher or something that just makes the appropriate network calls. We also don't currently have a plan for a FeatureVectorRerankProcessor implementation but it should be a fairly simple extension of RerankProcessor.
I had a chat with a customer who has substantial OpenSearch usage and experience. We discussed their re-ranking pipelines. One major takeaway is that they need a way to communicate with feature stores. The features they send to the re-ranker aren't available in their OpenSearch cluster. They use the search results to look up features in various feature stores to construct the inputs (feature vectors) to the re-ranker.
Would be great to have some connectors to feature stores so that they can be used to help construct the request payload for re-ranking within a pipeline. A simpler interim option--which isn't an ideal solution--is to allow users to provide feature vector(s) in the query context. So, re-ranking will likely involve a two-pass query on the client side. Run a query to retrieve results, which they use to construct feature vector(s) on the client-side using tools like existing feature stores. Then run a second query to perform the re-ranking using the feature vector(s) and possibly search and user context to construct the re-ranking request.
A third option, which is a heavier lift, is to enable OpenSearch to operate as a feature store. Perhaps someone is interested in implementing OpenSearch as a storage option for Feast (https://docs.feast.dev/reference/online-stores)? Perhaps some users would be interested in having OpenSearch double up as a feature store?
@dylan-tong-aws thanks for adding the info. The way I will look at this is, basically 1 and 3 are the same thing. We need to fetch the context for the reranker from a source.
As described by @HenryL27, the interface is already in place, and we can add these fetchers as the need arises.
@navneet1v are there any next steps for me? Or am I just waiting on security review?
Problem statement
Addresses #248
Reranking the top search results with a cross-encoder has been shown to improve search relevance rather dramatically. We’d like to do that. Furthermore, we’d like to do that inside of OpenSearch, for a couple reasons: 1/ it belongs there - it’s a technique to make your search engine search better, and 2/ it needs to precede RAG to integrate with it - the retrieval that augments the generation needs to be as good as possible - and succeed the initial retrieval, obviously - so it should be in OpenSearch.
Goals
Non-goals
Proposed solution
Reranking will be implemented as a search response processor, similar to RAG. Cross-Encoders will be introduced into ml-commons to support this.
Architecture / Rerank Search Path
Rest APIs
Create Rerank Pipeline
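A sketch of the request, assembled from the interfaces agreed in the discussion above (the exact syntax may differ in the shipped documentation; the field meanings are described below):
PUT /_search/pipeline/rerank_pipeline
{
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "<id of the text_similarity model>"
        },
        "context": {
          "document_fields": ["title", "text_representation"]
        }
      }
    }
  ]
}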
"ml_opensearch" refers to the kind of rerank processor. "model_id" should be the id of the
text_similarity
model in ml-commons "context" tells the pipeline how to construct the context it needs in order to rerank "document_fields" are a list of fields of the document (in_source
orfields
) to rerank based on. Multiple fields will be concatenated as strings.Query Rerank Pipeline
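A sketch of such a query, using the "query_text_path" form with a neural query (field values are illustrative, and the exact nesting under ext may differ; the parameters are described below):
POST /my-index/_search?search_pipeline=rerank_pipeline
{
  "query": {
    "neural": {
      "embedding": {
        "query_text": "Was abraham lincoln a good president?",
        "model_id": "<text embedding model id>",
        "k": 100
      }
    }
  },
  "ext": {
    "rerank": {
      "query_text_path": "query.neural.embedding.query_text"
    }
  }
}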
Provide the params for the reranker to the search pipeline as a search ext. Use either "query_text", which acts as the direct text to compare all the docs against, or "query_text_path", which is an xpath that points to another location in the query object. For example, with a neural query we might have "query_text_path": "query.neural.embedding.query_text".
The rerank processor will evaluate all the search results and then sort them based on the new scores.
Upload Cross Encoder Model
This is not a new API and all the other model-based APIs should still work for the cross encoder model/function name with minimal work to integrate.
Predict with Cross Encoder Model
See the Cross-Encoder PR
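Roughly, prediction against a registered cross-encoder looks like the following (the field names here are assumed from the TEXT_SIMILARITY function in that PR, which is the authoritative reference); the output is one similarity score per entry in text_docs:
POST /_plugins/_ml/models/<cross-encoder model id>/_predict
{
  "query_text": "Was abraham lincoln a good president?",
  "text_docs": [
    "Abraham Lincoln was the 16th president of the United States.",
    "The capital of France is Paris."
  ]
}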
Risks
Implementation Details
The overall reranking flow will be: 1/ gather the context needed to rerank, 2/ score each search result against that context, and 3/ re-sort the results by the new scores.
We will implement two main base classes for this work: RerankProcessor and ContextSourceFetcher.
ContextSourceFetcher: This will retrieve the context needed to rerank documents. Essentially, step 1. A particular rerank processor may make use of several of these, and they can get their context from any source.
RerankProcessor: Orchestrates the flow by combining all the context from the ContextSourceFetchers, then generates scores for the documents via an abstract score method, then does the sorting.
Extensibility
It is my hope that these interfaces are simple enough to extend and configure that we can create a rich ecosystem of rerank processors. To implement the cross-encoder reranker, all I need to do is create a NlpComparisonReranker subclass that says "score things with ml-commons", a DocumentContextSourceFetcher subclass that retrieves fields from documents, and a QueryContextSourceFetcher that retrieves context from the query ext.
If I wanted to implement the Amazon Personalize reranker of the search-processors repo, I would implement an AmazonPersonalizeSourceContextFetcher and an AmazonPersonalizeReranker, which only have to do the minimal amount of work to make the logic functional.
I also think it should be possible to incorporate some of the work from the Score Normalization and Combination feature, but that's outside the scope of this RFC.
Alternative solutions
Rerank Query Type
Another option is to implement some kind of rerank query. This would wrap another query and rerank it. For example
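One possible shape, purely as an illustration (none of these field names are settled):
POST /my-index/_search
{
  "query": {
    "rerank": {
      "query": {
        "match": { "text_representation": "Was abraham lincoln a good president?" }
      },
      "model_id": "<cross-encoder model id>",
      "context_field": "text_representation"
    }
  }
}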
Pros:
Cons: