run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Which similarity metric is used to retrieve chunks in RAG operations #13913

Open JMT800 opened 1 month ago

JMT800 commented 1 month ago

Question

I was facing a similar issue to the one reported in #12461.

I am performing a RAG operation using the default vector store, persisted locally, and the nodes retrieved are not very relevant. For instance, even when the answer states that the query is "out of context" and no relevant documents could be found, the RAG operation is still retrieving chunks from the vector store.

I am using llama-index version 0.9.40.

Hence, I had the following questions in mind:

cheers & thanks in advance

dosubot[bot] commented 1 month ago

The default vector store in LlamaIndex uses cosine similarity as the metric for measuring the similarity between chunks and queries. You can change this metric by specifying the `metric` parameter when creating the vector store. For example, when using Pinecone as the vector store, you can set the metric to "euclidean" or another supported metric [1].
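
For illustration, a minimal sketch of pointing LlamaIndex at a Pinecone index created with a non-default metric (assuming the pinecone-client v2-style API and llama-index 0.9.x import paths; the index name and credentials are placeholders):

```python
import pinecone
from llama_index.vector_stores import PineconeVectorStore

pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")  # placeholders

# With Pinecone, the metric is fixed when the index itself is created;
# "euclidean" and "dotproduct" are alternatives to the default "cosine".
# 1536 matches text-embedding-ada-002's dimensionality.
pinecone.create_index("my-index", dimension=1536, metric="euclidean")

vector_store = PineconeVectorStore(pinecone_index=pinecone.Index("my-index"))
```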

Similarity scores might return high values for nonsensical queries due to the normalization process. Both sparse and dense similarities are normalized to a range of 0 to 1, which can result in high relative scores even for low absolute similarity values if the range of similarities is narrow. This normalization can cause nonsensical queries to receive high scores, complicating the use of the threshold_cutoff attribute within the optimizer [2][3].
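
To see why a narrow score range can inflate relative scores, consider a toy min-max normalization (illustrative only, not LlamaIndex code):

```python
# Three tightly clustered, low absolute similarities...
raw_scores = [0.68, 0.70, 0.71]

# ...min-max normalized onto the full 0..1 range.
lo, hi = min(raw_scores), max(raw_scores)
normalized = [(s - lo) / (hi - lo) for s in raw_scores]
print(normalized)  # [0.0, 0.666..., 1.0] -- the top chunk now looks "perfect"
```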

The default prompt used for evaluating the relevance of a document in a RAG operation is defined by the DEFAULT_VECTOR_STORE_QUERY_PROMPT. This prompt structures the user's query to match a request schema and includes examples to guide the formatting of the query and filters. It ensures that the query string contains only text expected to match the contents of documents and that filters only refer to attributes that exist in the data source [4].
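
The request-schema-and-filters behavior described above corresponds to LlamaIndex's auto-retrieval flow. A sketch of using it (assuming llama-index 0.9.x import paths; the metadata schema here is invented for illustration):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.retrievers import VectorIndexAutoRetriever
from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo

documents = SimpleDirectoryReader("data").load_data()  # path is a placeholder
index = VectorStoreIndex.from_documents(documents)

# Describe the data source so the LLM only emits filters over attributes
# that actually exist, as the prompt requires.
vector_store_info = VectorStoreInfo(
    content_info="technical articles",
    metadata_info=[
        MetadataInfo(name="author", type="str", description="article author"),
    ],
)

retriever = VectorIndexAutoRetriever(index, vector_store_info=vector_store_info)
nodes = retriever.retrieve("articles by Jane Doe about retrieval")
```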


logan-markewich commented 1 month ago

The default is cosine similarity. Of course, other vector db integrations support other metrics.
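
For reference, cosine similarity is just the dot product of the two embedding vectors after length normalization; a toy NumPy sketch:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product divided by the product of the vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Random stand-ins for ada-002-sized embeddings (1536 dimensions).
query_emb = np.random.rand(1536)
chunk_emb = np.random.rand(1536)
print(cosine_similarity(query_emb, chunk_emb))
```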

The scores largely depend on the embedding model being used. The default, text-embedding-ada-002, has 1536 dimensions, which, when used on a tiny piece of text (like "a"), produces a very generic embedding. I'd say 0.7 with ada-002 is the default "neutral" score in my experience.
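
One way to act on that observation is to drop retrieved nodes that score near the "neutral" range. A sketch, assuming llama-index 0.9.x import paths (the 0.75 cutoff is an arbitrary value just above the ~0.7 score mentioned above, and the data path is a placeholder):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.postprocessor import SimilarityPostprocessor

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Filter out near-"neutral" matches so they never reach the LLM.
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)
response = query_engine.query("your question here")
```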

The default query engine prompt (and other prompts) can be found here: https://github.com/run-llama/llama_index/blob/910db520492b7cc5bb7d11d9d072f09c6ba62727/llama-index-core/llama_index/core/prompts/default_prompts.py#L99
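
If you want to inspect or override those defaults, query engines expose a prompt-mixin API. A sketch, assuming llama-index 0.9.x (the custom template text is a placeholder, not the shipped default):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.prompts import PromptTemplate

documents = SimpleDirectoryReader("data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()

# List the prompt keys the engine currently uses, e.g.
# "response_synthesizer:text_qa_template".
for key in query_engine.get_prompts():
    print(key)

# Swap in a custom QA prompt.
qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using only the context above.\n"
    "Query: {query_str}\n"
    "Answer: "
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt}
)
```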