zilliztech / GPTCache

Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
https://gptcache.readthedocs.io
MIT License
7.21k stars 502 forks

[Bug]: order of similarity matters, WHY? #648

Open dhandhalyabhavik opened 1 month ago

dhandhalyabhavik commented 1 month ago

Current Behavior

Using the default ONNX model with the following score function:

# Assumed setup: the default ONNX similarity evaluator shipped with GPTCache.
from gptcache.similarity_evaluation.onnx import OnnxModelEvaluation

evaluation = OnnxModelEvaluation()

def get_score(a, b):
    return evaluation.evaluation(
        {'question': a},
        {'question': b},
    )

Case 1:

a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print(get_score(a, b))
print(get_score(a, c))
print(get_score(b, c))

0.7585506439208984
0.02885962650179863
0.0909486636519432

Case 2:

a = 'What is neural network?'
b = 'Explain neural network and its components.'
c = 'What are the key components of neural network?'
print(get_score(b, a))
print(get_score(c, a))
print(get_score(c, b))

0.17746654152870178
0.013074617832899094
0.8378676772117615

I only swapped the arguments from x,y to y,x when calling get_score. Why do the scores change so drastically?

Expected Behavior

No response

Steps To Reproduce

No response

Environment

No response

Anything else?

No response

SimFG commented 1 month ago

It looks like you did not actually compare swapping x,y to y,x. You should compare print(get_score(a, b)) against print(get_score(b, a)).

dhandhalyabhavik commented 1 month ago

I did; look at these lines:

print(get_score(a, b)) # in case 1
0.7585506439208984
print(get_score(b, a)) # in case 2
0.17746654152870178

SimFG commented 1 month ago

It's amazing that there is such a phenomenon!

Ali-Parandeh commented 1 month ago

Could it be that the LLM replies differently when the ranking/ordering of the content differs in a RAG application?

SimFG commented 1 month ago

I don't know much about this part. In theory, the score should come from the distance between the two embedding vectors, and swapping their positions should not affect it.
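To illustrate why swapping positions "should not" matter in a distance-based setup: a bi-encoder score such as cosine similarity is symmetric by construction, because the dot product commutes. A minimal sketch (the vectors here are made up for illustration, not real GPTCache embeddings):

```python
import math

def cosine_sim(u, v):
    # Dot product and norms are order-independent, so
    # cosine_sim(u, v) == cosine_sim(v, u) always holds.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Hypothetical embedding vectors, purely for illustration.
u = [0.2, 0.9, 0.1]
v = [0.4, 0.5, 0.7]
print(cosine_sim(u, v) == cosine_sim(v, u))  # True: order never matters
```

A cross-encoder, by contrast, does not reduce each sentence to a vector first, which is why this symmetry argument does not apply to it.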

wxywb commented 1 month ago

We trained a cross-encoder model to evaluate similarity; conceptually, the score should be invariant to the order of the pair. However, since it is BERT-based, the two sentences are concatenated into a single input sequence, so swapping them changes the input the model sees. It is a lightweight transformer trained without any constraint enforcing symmetry, which can produce this unusual behavior.
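Until the model itself is made order-invariant (e.g. by training on pairs in both orders), one common workaround at inference time is to symmetrize the scorer by averaging both argument orders. This is a sketch, not part of the GPTCache API; score_fn stands in for any pairwise scorer such as the get_score above:

```python
def symmetric_score(score_fn, a, b):
    """Force a symmetric similarity out of an asymmetric pairwise scorer.

    Averages both argument orders, so symmetric_score(f, a, b) always
    equals symmetric_score(f, b, a). Doubles the inference cost.
    """
    return (score_fn(a, b) + score_fn(b, a)) / 2

# Demo with a deliberately asymmetric toy scorer (not a real model).
toy = lambda a, b: len(a) / (len(a) + len(b))
print(symmetric_score(toy, "abc", "defgh") == symmetric_score(toy, "defgh", "abc"))  # True
```

The trade-off is two forward passes per cached query instead of one, which may or may not be acceptable for a cache lookup path.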