run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

Reason for using the _to_llama_similarities() normalization technique instead of raw similarity scores for retrieval through the Elasticsearch vector store #12486

Closed Subham0793 closed 8 months ago

Subham0793 commented 8 months ago


Question

Because of the _to_llama_similarities() function, the scores are normalized and the maximum score is always 1.0.

What is the reason for this normalization technique, and why is it applied by default?

The problem arises when a query that is not related to a particular document is used to retrieve chunks: the results will still contain scores starting at 1.0, even when the chunks have no relation to the query.

Code snippet:

    return VectorStoreQueryResult(
        nodes=top_k_nodes,
        ids=top_k_ids,
        similarities=_to_llama_similarities(top_k_scores),
    )

Line 623 in:

https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/llama_index/vector_stores/elasticsearch/base.py

dosubot[bot] commented 8 months ago

Hey @Subham0793, great to see you diving deep into the LlamaIndex framework again! Your keen eye for detail never ceases to amaze. 😊

The _to_llama_similarities() function is used by default in the LlamaIndex Elasticsearch vector store integration to normalize similarity scores so that the maximum score is always 1.0. The normalization subtracts the maximum score from every score and then applies the exponential function. This transforms the scores into a range that is easier to interpret while keeping the relative differences between them: the highest score is mapped to 1.0, indicating the closest match, and the other scores are adjusted accordingly to represent their similarity relative to the top result.
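For reference, a minimal sketch of that normalization (assuming it matches the helper in the linked base.py; check your installed version for the exact implementation):

```python
from typing import List

import numpy as np


def _to_llama_similarities(scores: List[float]) -> List[float]:
    """Shift every score by the maximum, then exponentiate.

    The top score maps to exp(0) = 1.0; all lower scores land in (0, 1).
    """
    if not scores:
        return []
    arr = np.array(scores)
    return np.exp(arr - arr.max()).tolist()
```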

Normalization can, however, obscure the true relevance of the results to the query. Because every score is shifted relative to the highest one, the absolute scale of the original scores is discarded: the top result is always reported as 1.0, no matter how strong or weak the underlying match is. Scores close to the maximum remain near 1.0, while scores far below it are pushed toward 0, so the normalized values reflect only the gaps between results, not how relevant any individual result actually is.
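To make this concrete, here is an illustrative comparison (the numbers are made up for the example, not taken from the issue):

```python
import numpy as np


def normalize(scores):
    # exp(score - max(score)), as described above
    arr = np.array(scores)
    return [round(float(x), 3) for x in np.exp(arr - arr.max())]


# Tightly clustered raw scores: every normalized value stays near 1.0.
print(normalize([0.90, 0.88, 0.85]))  # [1.0, 0.98, 0.951]

# Widely spread raw scores (e.g. unbounded Elasticsearch scores): everything
# below the top result is squeezed toward 0.
print(normalize([12.0, 8.0, 2.0]))  # [1.0, 0.018, 0.0]
```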

Normalization is a common technique in information retrieval and machine learning to make results more interpretable or to prepare data for further processing. However, the choice to use normalization should be informed by the specific requirements of the application and the characteristics of the data. In some cases, it might be more appropriate to work with the raw scores or to apply a different normalization technique.

Unfortunately, the provided context does not include a way to disable the default normalization behavior of the _to_llama_similarities() function directly through a parameter or configuration option. The function _to_llama_similarities(scores: List[float]) -> List[float] normalizes the scores using a specific formula without checking for any flags or parameters that could disable this behavior.

To alter this behavior, you would need to modify the function's implementation in the source code. However, I would advise against editing the installed package directly, as it could lead to unexpected behavior elsewhere in the application. Instead, you might consider post-processing the scores returned by the query in a way that suits your specific needs, keeping in mind that at that point they have already been normalized.
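If you do decide to bypass the normalization locally, one option is to monkey-patch the module-level helper rather than editing the installed package. This is only a sketch under the assumption that _to_llama_similarities is still a module-level function in llama_index.vector_stores.elasticsearch.base; verify against your installed version before relying on it:

```python
from typing import List

# Assumption: the private helper lives at module level in this path, as the
# linked base.py suggests. This is unsupported and may break between releases.
import llama_index.vector_stores.elasticsearch.base as es_base


def _raw_similarities(scores: List[float]) -> List[float]:
    # Pass the Elasticsearch scores through unchanged instead of
    # applying exp(score - max(score)).
    return list(scores) if scores else []


es_base._to_llama_similarities = _raw_similarities
```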

For more information, you can refer to the source code of the function in the LlamaIndex repository.


Subham0793 commented 8 months ago

But what if the raw similarity scores (before normalization) are very low (0.3-0.4)? This function still rescales them so that the maximum score is 1.0, which does not seem right. Retrieved docs with low scores can be of very low quality, yet after normalization we are giving them a high score.
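To illustrate with made-up numbers: raw scores of 0.40, 0.35, and 0.30 are normalized to exp(0.40 - 0.40) = 1.0, exp(-0.05) ≈ 0.95, and exp(-0.10) ≈ 0.90, so even a weak top match is reported as a perfect 1.0.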

logan-markewich commented 8 months ago

I think this was added at one point because Elasticsearch returned scores that are not in the 0 to 1 range

Feel free to make a PR to change this

Subham0793 commented 8 months ago

@logan-markewich but isn't this a bug? Why was it specifically designed to keep the max value at 1.0?

Say we are getting the top 5 docs for a query. The doc at k=1 need not have a score of 1.0; it could have a score of 0.85.

The problem with this is that we are giving high scores to documents that are not actually that similar to the query.