opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[RFC] Supporting sparse semantic retrieval based on neural models #230

Closed: model-collapse closed this issue 10 months ago

model-collapse commented 1 year ago

[RFC] Supporting sparse semantic retrieval based on neural models

Background

Dense retrieval based on neural models has achieved great success in search relevance tasks. However, dense methods use k-NN to search for the most relevant docs, which consumes a large amount of memory and CPU resources and is very expensive. In recent years, there has been a lot of research on sparse retrieval based on neural models, such as DeepCT[1], SparTerm[2], and SPLADE[3,4]. Since sparse retrieval can be implemented naturally on top of an inverted index, these methods are as efficient as BM25. After fine-tuning, neural sparse retrieval can achieve search relevance on par with dense methods. Neural sparse methods also show great generalization ability: SPLADE outperforms all dense methods on the BEIR benchmark under the same settings. Thus, we propose to implement support for sparse retrieval based on neural models.

Example Comparison 1: Dense and Sparse retrieval models on MS-MARCO dataset.

| Type | Algorithm | Model | MRR@10 | Latency* |
|---|---|---|---|---|
| Dense | Encoding | BERT-tasb | 0.347 | 120ms |
| Dense | Late Interaction | ColBERT | 0.359 | 193ms |
| Sparse | BM25 | - | 0.184 | 15ms |
| Sparse | Expansion | DeepCT | 0.243 | - |
| Sparse | Expansion | SparTerm | 0.279 | - |
| Sparse | Encoding | SPLADE-max^ | 0.340 | 117ms |
| Sparse | Encoding | DistilSPLADE-max | 0.368 | - |
| Sparse | Encoding | SPLADE-doc | 0.322 | 19ms |

*: All experiments were conducted on a single OpenSearch node with 8 × 2.4GHz CPU cores and 32GB RAM.
^: SPLADE-max runs a BERT model inference for query encoding and thus has latency similar to dense methods.

Example Comparison 2: Splade vs. Others on BEIR benchmarking dataset.

| Metrics | BM25 | ColBERT | TASB | SPLADE |
|---|---|---|---|---|
| Avg. NDCG@10 | 0.440 | 0.455 | 0.435 | 0.500 |
| Winners in BEIR | 2 | 2 | 0 | 11 |

All of the above results are taken from the SPLADE v2 paper [4].

Example Comparison 3: SPLADE vs. OpenAI embeddings on a BEIR benchmark subset.

| Metrics | Ada | Babbage | Curie | Davinci | SPLADE |
|---|---|---|---|---|---|
| Avg. NDCG@10 | 0.490 | 0.505 | 0.509 | 0.528 | 0.527 |
| Winners in BEIR | 0 | 0 | 1 | 5 | 5 |

The above table is taken from the OpenAI embeddings paper [5]. The experiments were conducted on a subset of the BEIR benchmark.

What are we going to do?

Design

We are going to implement one IngestionProcessor for document sparse encoding and one QueryBuilder for sparse querying. Before ingestion or querying, the sparse encoding model will be deployed via the ml-commons plugin, and the ingestion processor will then invoke the prediction action to obtain the encoding result. If query encoding is enabled, the query builder will also encode queries via prediction actions; when query encoding is disabled, it will instead pass the query through a BERT tokenizer. Because the encoding result is a sparse vector, it is very natural to adopt a term-based Lucene index. Here is the architecture diagram.

Architecture
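As a minimal sketch of the prerequisite step (the generic ml-commons model APIs are used below; the exact registration parameters for a sparse encoding model are an assumption at this point), deploying the model through ml-commons yields the model_id that the ingestion processor and the query will reference:

POST /_plugins/_ml/models/_register
{
    "name": "neural-sparse-encoding-model",
    "version": "1.0.0",
    "model_format": "TORCH_SCRIPT"
}

POST /_plugins/_ml/models/<model_id>/_deploy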

Term Weight integration with Lucene

As suggested by the SPLADE paper, the relevance score is calculated as $r = \sum_{t \in T} w_t$, where $w_t$ is the weight of sparse term $t$ and $T$ is the set of terms shared by the query and the document. Standard Lucene indices only store TF (term frequency) and DF (document frequency), so we will implement an analyzer that interprets term weights and stores them in the payload attribute. Since the above formula is not a standard Lucene relevance scoring function, we will introduce a PayloadScorer with a sum operator in the query.
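As a small worked example (the terms and weights below are made up for illustration): suppose a tokenized query contains the terms {rain, forecast} and a document's sparse encoding stores the payload weights $w_{rain} = 1.4$, $w_{forecast} = 0.9$, $w_{sunny} = 0.3$. Then $T = \{rain, forecast\}$ and the document scores $r = 1.4 + 0.9 = 2.3$; the weight of "sunny" does not contribute because that term is absent from the query.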

The Out-of-the-Box Model

The schema of the sparse encoding model will be similar to SparTerm or SPLADE, where the input is a natural language sentence and the output is a sparse vector (in SPLADE, the sparse terms are BERT tokens). We will mainly focus on cross-domain optimization for better relevance across different scenarios. The models are planned to be released on huggingface.co under the Apache 2.0 license.

API

Ingestion Processor

The ingestion processor can be created with the following API, where the field_map field specifies the fields to be encoded and the names of the new fields produced by the encoding.

PUT /_ingest/pipeline/sparse-pipeline
{
    "description": "Calling sparse model to generate expanded tokens",
    "processors": [
        {
            "neural_sparse": {
                "model_id": "fousVokBjnSupmOha8aN",
                "field_map": {
                    "body": "body_sparse"
                }
            }
        }
    ]
}
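For example, once the pipeline exists, documents can be indexed through it in the usual way (the index name and document body below are only illustrative); the processor then adds the encoded body_sparse field according to field_map:

PUT /test-index-3/_doc/1?pipeline=sparse-pipeline
{
    "body": "what is the weather forecast for today"
}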

Sparse Search Query

Similar to dense-vector-based neural search, sparse retrieval is also bound to a query type, called neural_sparse. One can search the field body_sparse (sparse-encoded fields only) via the API below.

GET /test-index-3/_search
{
    "query": {
        "neural_sparse": {
            "body_sparse": {
                "query_text": "i be going",
                "model_id": "fousVokBjnSupmOha8aN",
                "tokenizer": "fousVokBjdfs4vb6gGdgYl"
            }
        }
    }
}

The fields model_id and tokenizer are optional: if model_id is present, the query executor will call the sparse model for query encoding, while if tokenizer is present, the executor will only encode the query via tokenization.
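For instance, a tokenizer-only query (no model inference at search time, as in the SPLADE-doc style setup) would be a sketch like the following, reusing the tokenizer id from the example above:

GET /test-index-3/_search
{
    "query": {
        "neural_sparse": {
            "body_sparse": {
                "query_text": "i be going",
                "tokenizer": "fousVokBjdfs4vb6gGdgYl"
            }
        }
    }
}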

Reference

[1] Dai et al., "Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval," arXiv, 2019.

[2] Bai et al., "SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval," arXiv, 2020.

[3] Formal et al., "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking," SIGIR, 2021.

[4] Formal et al., "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval," arXiv, 2021.

[5] Neelakantan et al., "Text and Code Embeddings by Contrastive Pre-Training," arXiv, 2022.

jmazanec15 commented 1 year ago

This is really interesting! @model-collapse Could you include the proposed interfaces you are going to add and what they would look like?

jmazanec15 commented 1 year ago

Additionally, I believe Lucene has some features to support this case. See https://github.com/apache/lucene/issues/11799.

navneet1v commented 1 year ago

Hi @model-collapse, thanks for adding this RFC. I have a couple of questions so that I understand the proposal properly.

  1. Which Lucene/OpenSearch query clause does the new neural_sparse query type get mapped to?
  2. This seems more like an analyzer use case, where the indexed text is analyzed using a model that assigns weights/preference to some words. Did we consider the alternative of modeling this as an Analyzer?
  3. During indexing, which Lucene field type will these fields be mapped to?
  4. If sparse retrieval is as good as BM25, what is the exact use case we want to target here?
  5. As suggested by @jmazanec15, I think we should look into the Lucene github issue https://github.com/apache/lucene/issues/11799. They were trying to add similar functionality in Lucene too.
model-collapse commented 1 year ago

@navneet1v

  1. This query type is mapped to a PayloadQuery with a base recall query (SpanOrTermQuery) and a PayloadScorer.
  2. The relevance score calculation is different from BM25, so it is not convenient to split the implementation into an Analyzer and a Scorer.
  3. Just like what we did in neural-search, the field_map field will cover this.
  4. Just check the performance table: sparse models can achieve better accuracy than BM25 while the efficiency is almost the same.
zhichao-aws commented 1 year ago

Hi @navneet1v , for question 4, sparse models have advantages over both KNN and BM25:

  • Compared with KNN:

    • If both are fine-tuned, sparse models can achieve relevance on par with KNN, but consume much less memory and have better search latency.

    • If neither is fine-tuned, sparse models are better in search relevance, memory usage, and latency.

  • Compared with BM25:

    • Both use an inverted index. For the search scenario, sparse models may need a few more FLOPs because they may expand the document vocabulary, but this is not a critical concern for search latency. Sparse models have better search relevance even without fine-tuning, and if they are fine-tuned, the gap over BM25 becomes much larger.

navneet1v commented 1 year ago

@model-collapse I cannot see where we are comparing the latency of BM25 with sparse retrieval.

For #4, we compared accuracy, but I would be really interested in latency.

navneet1v commented 1 year ago

@zhichao-aws, I understand that the difference in memory footprint between dense and sparse will be significant. Also, my question was never about comparing dense and sparse vectors; in terms of memory and latency, yes, sparse vectors will work well. I want to compare sparse vectors with OpenSearch text search. The three parameters I would at least like to see compared are:

  1. Search latency
  2. Relevance
  3. Memory footprint

From the proposal I can see we provided info for 2 only. Can we add details for 1 and 3 too?
zhichao-aws commented 1 year ago

@navneet1v , since the implementation details have a large impact on latency/memory metrics, it is hard to give concrete numbers at the RFC stage within the OpenSearch framework. However, the search latency with other engines that support inverted indices is also a strong indicator of their efficiency. The latency figure referenced here comes from this paper, and the first 3 items in it are SPLADE-series models. They use PISA and Anserini instead of OpenSearch. With several optimizations, SPLADE's latency is of the same order of magnitude as BM25, and SPLADE-doc's latency is within a factor of two. Note that there are hyperparameters to control the "expansion" of sparse models; the same algorithm can still have different search relevance and efficiency (usually negatively correlated), which requires us to find a balance point.

vamshin commented 1 year ago

One clarification question: do we need support for a sparse vector data type in the k-NN plugin, similar to knn_vector (dense vector), to support sparse vector indexing/search?

zhichao-aws commented 1 year ago

No, we can use the Lucene engine to index and search sparse vectors. We will implement this feature in neural-search and ml-commons.

vamshin commented 1 year ago

@zhichao-aws sorry could you explain which data type you would use for indexing sparse vectors?

zhichao-aws commented 1 year ago

Hi @vamshin , for sparse vectors we need to build a mechanism to index and search based on a term-weight map. Our initial proposal was to use a normal OpenSearch text field (i.e. TextFieldMapper) to index the terms and put the terms' weights in the payload attribute. This requires us to build a new analyzer to parse the input text and encode the payload. For searching, we build a new query clause that decodes the payload and sums the terms' weights.

After more research, we found that the Lucene FeatureField is more straightforward and extensible. If we choose FeatureField, we'll introduce a new field type like "sparse_vector" by implementing a new wrapper FieldMapper that transforms the input term-weight map into Lucene FeatureFields (the wrapper FieldMapper can live in the neural-search plugin or somewhere else; we can discuss that). For searching, we'll build a new query clause.

For both designs we have finished the POC code and proved they are workable. We'll run some benchmarking tests to examine their execution efficiency.
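For illustration only, under the FeatureField-backed design a sparse_vector field and a document carrying a pre-computed term-weight map could look roughly like the sketch below (the field type name comes from the proposal above, but the exact mapping syntax and the weights are assumptions at this stage):

PUT /test-index-3/_mapping
{
    "properties": {
        "body_sparse": {
            "type": "sparse_vector"
        }
    }
}

PUT /test-index-3/_doc/1
{
    "body_sparse": {
        "weather": 1.4,
        "sunny": 0.9,
        "forecast": 0.3
    }
}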

navneet1v commented 1 year ago

Hi @model-collapse , @zhichao-aws Putting some high level thoughts on the APIs provided in this RFC:

  1. We are trying to model the APIs just like we did for the dense vector use case. I think we should create a separate github issue where we provide all the different alternatives that were considered while coming up with the interface proposed above.

  2. Sparse vector is a new field type for users. I think we should do a deeper dive on how we want to model this field type. A few things we should answer: if a user runs a match-all query, what will the output of that query be? Will it contain a body_sparse field in the _source? If yes, what will its value be? Etc.

  3. The same goes for the query. The neural query clause was simple because it was just providing an abstraction over the dense vector query.

  4. Is the ingest pipeline the only option to ingest data into the sparse_vector field? What if a user runs the sparse model outside OpenSearch and wants to ingest the response (sparse vector) directly into the index? How can that be done?

zhichao-aws commented 1 year ago

Hi @navneet1v , these are really good questions. We did a lot of investigation into the different approaches and debated the user interface quite a bit. We will create a separate issue to list our proposal and all the alternatives we've considered.

hdhalter commented 12 months ago

Hi all, Please create a doc issue or PR ASAP if this has doc implications for 2.11. Thanks.

asfoorial commented 11 months ago

Great feature, I look forward to it and I hope it is going to be generally available on the upcoming 2.11 release in October.

asfoorial commented 11 months ago

Will the out-of-the-box models be fine-tunable? Are you going to publish the fine-tuning process for them?

zhichao-aws commented 11 months ago

Since we'll release the weights and structure of our model, users can fine-tune them using PyTorch or other frameworks with their own implementation. Sadly, the fine-tuning process is out of scope for this release. If you believe it is important for this feature, let's create a feature request issue and call for comments. This will help us decide on future work.

asfoorial commented 11 months ago

Thanks for the clarification. What I meant is just giving a fine-tuning example in the documentation or as a blog post, just like the example the team has posted for fine-tuning embedding models.

yudhiesh commented 4 months ago

Does it work with a KNN query as well? My team uses a custom inference server for all our ML models.

zhichao-aws commented 4 months ago

Hi @yudhiesh , the KNN query is used for dense vectors, while neural sparse needs sparse vectors, like {"a":1, "b":2, "c":1.2, ...}.

If your goal is to use a custom inference server to generate sparse vectors and then query with the raw sparse vectors, then the answer is yes. Querying by raw sparse vectors will be supported in 2.14 (if everything goes well, 2.14 will be released in a few days).

We have put our neural sparse model on Hugging Face (model link) and we have a demo deployment script to deploy the neural sparse model in SageMaker. It can serve as a reference here.
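For reference, a query using a raw, pre-computed sparse vector could look roughly like the sketch below (the query_tokens parameter name reflects the planned 2.14 interface and should be treated as an assumption here; the token weights are made up):

GET /test-index-3/_search
{
    "query": {
        "neural_sparse": {
            "body_sparse": {
                "query_tokens": {
                    "rain": 1.2,
                    "forecast": 0.8
                }
            }
        }
    }
}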

yudhiesh commented 4 months ago

Great thanks for the quick response!