Open jmazanec15 opened 3 weeks ago
we added an ingest pipeline to normalize the data for faiss engine who want to Cosine similarity
I prefer option 2. It is more simple as we keep everything as is and only pass normalized data to faiss engine. Could you tell why we need exact scoring lazily compute norms and normalize
for option 2?
For option 2, for re-scoring (exact search), because the vectors are not normalized, we would need to compute norms, as we do now.
we added an ingest pipeline to normalize the data for faiss engine who want to Cosine similarity
@luyuncheng interesting. What are your thoughts on having this support in k-NN plugin VS Ingest pipeline approach?
For option 2, for re-scoring (exact search), because the vectors are not normalized, we would need to compute norms, as we do now.
For re-scoring we can just use cosine distance calculation instead of (normalization + innerproduct)?
cosine distance calculation is implemented as normalization + innerproduct I believe
we added an ingest pipeline to normalize the data for faiss engine who want to Cosine similarity
@luyuncheng interesting. What are your thoughts on having this support in k-NN plugin VS Ingest pipeline approach?
@vamshin @jmazanec15 @heemin32 as @jmazanec15 says
cosine distance calculation is implemented as normalization + innerproduct I believe
cosine distance calculation is normalization + innerproduct also says in faiss#wiki
before i read this issues, and i do need cosine distance in faiss.
firstly i introduced normalization
at faiss_wrapper#CreateIndex
and do normalize before faiss::write_index
like @jmazanec15 's option 1. then i found, every merge would do normalize
again. it increased the refresh time.
so in the end, i introduced a new pipeline doing normalize,
PROS:
CONS:
Description
Cosine similarity is one of the more popular space types. faiss does not support it directly. Instead, they prefer to have data be normalized and then use the inner product (which is equivalent to cosine --> <u,v>/||u||||v|| = cos(theta) (https://github.com/facebookresearch/faiss/blob/2c961cc308ade8a85b3aa10a550728ce3387f625/README.md?plain=1#L11). We should figure out how to add cosine for faiss now that it is default (#2163)
We have a couple different options:
I prefer option 3 mainly because it will allow us to use use knn vector values as synthetic source (see #1571), but also let us be efficient on search.
We would need to investigate how best to do this. One simple way to do it would be to add one extra dimension and store the normalized value there.