Closed: YichiRockyZhang closed this issue 1 year ago.
Hi! You can do ds.get_index("embeddings").faiss_index.metric_type
to get the metric type and then match the result with the FAISS metric enum (should be L2).
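For reference, FAISS reports the metric as an integer from its MetricType enum, where METRIC_INNER_PRODUCT is 0 and METRIC_L2 is 1. A minimal sketch for decoding that value (the dataset call at the end is a hypothetical usage, not run here):

```python
# Decode a raw FAISS metric_type integer into its MetricType enum name.
# In FAISS, METRIC_INNER_PRODUCT = 0 and METRIC_L2 = 1.
FAISS_METRIC_NAMES = {0: "METRIC_INNER_PRODUCT", 1: "METRIC_L2"}

def metric_name(metric_type: int) -> str:
    """Return the FAISS enum name for a raw metric_type value."""
    return FAISS_METRIC_NAMES.get(metric_type, f"unknown ({metric_type})")

# Hypothetical usage against a loaded dataset index:
# print(metric_name(ds.get_index("embeddings").faiss_index.metric_type))
```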
Ah! Thank you for pointing this out. FYI: the enum indicates it's using the inner product. Using torch.inner or torch.dot still produces a discrepancy compared to the built-in score. I think this is because of the compression/quantization that occurs with the FAISS index.
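As a toy illustration of that effect (this is not the actual wiki_dpr index or FAISS product quantization, just crude uniform quantization of one vector), compressing an embedding already shifts the inner product away from the exact score:

```python
def quantize(v, levels=16):
    """Crude uniform quantization of each component to `levels` steps in [-1, 1]."""
    step = 2.0 / (levels - 1)
    return [round(x / step) * step for x in v]

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy query/passage embeddings (hypothetical values for illustration).
q = [0.31, -0.77, 0.52]
p = [0.29, -0.81, 0.44]

exact = dot(q, p)             # score on the original embeddings
approx = dot(q, quantize(p))  # score after compressing the passage vector

# `exact` and `approx` differ slightly, analogous to the gap between a
# manually computed inner product and the score from a compressed index.
```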
Describe the bug
After loading wiki_dpr using:
the index does not have a defined metric_type. This is an issue because I do not know how the scores are being computed for get_nearest_examples().
Steps to reproduce the bug
System: Python 3.9.16, Transformers 4.30.2, WSL
.The FAISS documentation suggests the metric is usually L2 distance (without the square root) or the inner product. I compute both for the sample query:
Here, I get a dot product of 80.6020 and an L2 distance of 77.6616.
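For anyone reproducing this comparison, a self-contained sketch of the two candidate scores with toy vectors (the 80.60 / 77.66 values above come from the real wiki_dpr embeddings, which are not reproduced here):

```python
def dot(a, b):
    """Inner product, as FAISS METRIC_INNER_PRODUCT would score it."""
    return sum(x * y for x, y in zip(a, b))

def l2_sq(a, b):
    """Squared L2 distance; FAISS METRIC_L2 omits the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy query/passage embeddings (hypothetical values for illustration).
q = [1.0, 2.0, 3.0]
p = [2.0, 2.0, 1.0]

dot(q, p)    # → 9.0
l2_sq(q, p)  # → 5.0
```

Note the two metrics rank in opposite directions: higher is better for the inner product, lower is better for L2 distance.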
Doing k=1 indicates that the higher the outputted number, the better the match, so the metric should not be L2 distance. However, my manually computed inner product (80.6) has a discrepancy with the reported score (76.2). Perhaps this has to do with me using the compressed embeddings?
Expected behavior
Environment info
datasets version: 2.12.0