[ ] Doing RAG? Vector search is not enough

Doing RAG? Vector search is not enough

Snippet

"I'm concerned by the number of times I've heard, \"oh, we can do RAG with retriever X, here's the vector search query.\" Yes, your retriever for a RAG flow should definitely support vector search, since that will let you find documents with similar semantics to a user's query, but vector search is not enough. Your retriever should support a full hybrid search, meaning that it can perform both a vector search and full text search, then merge and re-rank the results. That will allow your RAG flow to find both semantically similar concepts, but also find exact matches like proper names, IDs, and numbers."

Full Content

Hybrid search steps

Azure AI Search offers a full hybrid search with all those components:

It performs a vector search using a distance metric (typically cosine or dot product).
It performs a full-text search using the BM25 scoring algorithm.
It merges the results using Reciprocal Rank Fusion algorithm.
It re-ranks the results using semantic ranker, a machine learning model used by Bing, that compares each result to the original usery query and assigns a score from 0-4.

The search team even researched all the options against a standard dataset, and wrote a blog post comparing the retrieval results for full text search only, vector search only, hybrid search only, and hybrid plus ranker. Unsurprisingly, they found that the best results came from using the full stack, and that's why it's the default configuration we use in the AI Search RAG starter app.

When is hybrid search needed?

To demonstrate the importance of going beyond vector search, I'll show some queries based off the sample documents in the AI Search RAG starter app. Those documents are from a fictional company and discuss internal policies like healthcare and benefits.

Let's start by searching "what plan costs $45.00?" with a pure vector search using an AI Search index:

search_query = "what plan costs $45.00"
search_vector = get_embedding(search_query)
r = search_client.search(None, top=3, vector_queries=[
  VectorizedQuery(search_vector, k_nearest_neighbors=50, fields="embedding")])

The results for that query contain numbers and costs, like the string "The copayment for primary care visits is typically around $20, while specialist visits have a copayment of around $50.", but none of the results contain an exact cost of $45.00, what the user was looking for.

Now let's try that query with a pure full-text search:

r = search_client.search(search_query, top=3)

The top result for that query contain a table of costs for the health insurance plans, with a row containing $45.00.

Of course, we don't want to be limited to full text queries, since many user queries would be better answered by vector search, so let's try this query with hybrid:

r = search_client.search(search_query, top=15, vector_queries=[
  VectorizedQuery(search_vector, k_nearest_neighbors=10, fields="embedding")])

Once again, the top result is the table with the costs and exact string of $45.00. When the user asks that question in the context of the full RAG app, they get the answer they were hoping for.

irthomasthomas / undecidability