Open weiZhenkun opened 6 months ago
> Is it the right way? Can I use L2 to calculate the distance between two embeddings created from e5-base-v2?

Yes. For normalized embeddings, L2 distance is mathematically equivalent to cosine similarity, since ||a − b||² = 2 − 2·cos(a, b). The difference is the direction: a smaller L2 distance means a better match, while for cosine similarity a higher score means a better match.
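The identity above is easy to check numerically. A minimal sketch with random unit vectors (768 dimensions to match e5-base-v2, though any unit vectors behave the same):

```python
import math
import random

random.seed(0)

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Two hypothetical embeddings, unit-normalized as e5 outputs would be
# before comparison.
a = normalize([random.gauss(0, 1) for _ in range(768)])
b = normalize([random.gauss(0, 1) for _ in range(768)])

cos = sum(x * y for x, y in zip(a, b))           # cosine similarity
l2_sq = sum((x - y) ** 2 for x, y in zip(a, b))  # squared L2 distance

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
diff = abs(l2_sq - (2.0 - 2.0 * cos))
```

So ranking neighbors by ascending L2 distance gives exactly the same order as ranking by descending cosine similarity.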
> If we use cosine similarity, do I need to normalize the embeddings?

No, you do not. The cosine similarity computation already contains a normalization step (division by the vector norms), so pre-normalizing changes nothing.
> If the scores of e5-base-v2 fall in [0.7, 1.0], is there a suitable sub-range for "relatively similar" pairs?

It really depends on your application. It is better to determine the threshold from a validation dataset.
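One common way to pick such a threshold is to sweep candidate values over a labeled validation set and keep the one with the best F1. The scores and labels below are made up for illustration; in practice they would come from your own annotated pairs:

```python
# Hypothetical validation pairs: cosine scores and human "similar" labels.
scores = [0.95, 0.91, 0.88, 0.84, 0.80, 0.76, 0.72]
labels = [1,    1,    1,    0,    1,    0,    0]

best_t, best_f1 = None, -1.0
for t in sorted(set(scores)):
    pred = [s >= t for s in scores]                       # predict "similar"
    tp = sum(p and l == 1 for p, l in zip(pred, labels))  # true positives
    prec = tp / max(sum(pred), 1)
    rec = tp / sum(labels)
    f1 = 2 * prec * rec / max(prec + rec, 1e-12)
    if f1 > best_f1:
        best_t, best_f1 = t, f1
```

On this toy data the sweep settles on a threshold of 0.80; with real validation pairs the chosen cutoff would reflect your application's actual score distribution.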
@intfloat Thanks for your quick response. 2 more questions:
- If it is used to compare text similarity, is it recommended to use L2 or cosine distance on the embeddings generated by e5-base-v2?
- Is there a meaningful range for the L2 distance?
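For what it's worth, if the embeddings are normalized the two metrics produce identical rankings (only the direction flips), and the L2 distance is bounded: cosine similarity in [−1, 1] maps to L2 distance sqrt(2 − 2·cos) in [0, 2]. A quick sketch with toy vectors (not real e5 outputs):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

query = normalize([0.2, 0.9, -0.4])
docs = [normalize(v) for v in
        ([0.1, 1.0, -0.3], [-0.8, 0.1, 0.6], [0.5, 0.5, 0.0])]

cos = [sum(q * d for q, d in zip(query, doc)) for doc in docs]
l2 = [math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))
      for doc in docs]

# Descending cosine and ascending L2 distance order the docs identically.
by_cos = sorted(range(len(docs)), key=lambda i: -cos[i])
by_l2 = sorted(range(len(docs)), key=lambda i: l2[i])
```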
@intfloat Can you help me? Thanks.
Describe: I am using the e5-base-v2 model. I have read the doc at https://huggingface.co/intfloat/e5-base-v2, which says the cosine similarity scores distribute around 0.7 to 1.0.
How should I use the e5-base-v2 model?
My questions:
@intfloat Can you help me?