evaluating and extending hnsw for SL2G

patelprateek commented 2 years ago

came across this interesting paper : http://research.baidu.com/Public/uploads/5e88947b04a4e.pdf IIUC the method proposes to use l2 for building hnsw graphs and searching using some NN function. Curious about any such experiments or results you are aware of. This seems like to be quite practical scenario so wondering how scalable this approach is specially if we need to run a part of neural net as a distance function and how does this approach compare tp simple distance measures for example , would it work better if we build graph using l2 and search using cosine as opposed to building and searching using cosine ?

yurymalkov commented 2 years ago

Hi @patelprateek, thanks for sharing!

I thought a good way to build a graph when you cannot measure the distance to the between the data elements (but can to queries) is to project the elements to vector space of distances to some (random) queries. That induces the distance with the regard to the queries (thus can, e.g. ignore low importance features) which I can imagine works better than generic L2. This has been described e.g. in https://arxiv.org/pdf/1908.06887.pdf or https://cyberleninka.ru/article/n/algoritm-maksimizatsii-relevantnosti-ispolzuyuschiy-grafovye-modeli-dannyh/viewer (in Russian)

searchivarius commented 2 years ago

Hi @patelprateek and @yurymalkov I did not know about the Russian paper that @yurymalkov co-authored. However, I am not sure I fully understand how exactly it is carrying out retrieval. My apologies for potential misunderstanding, but the paper is not very specific about it. It is, however, clear that the graph is constructed using the pivot-induced metric, but which similarity is used for retrieval??? @yurymalkov my reading is that retrieval is done using induced metric, please, correct me if I am wrong.

Regarding the pivot-induced metric, actually generic pivots are not very good. In particular, what our joint co-author David Novak figured out was that you could select much better pivots by generating their components in a special way. I also think that nowadays a better solution would be train some massive BI-ENCODER neural network that "compares" query-document embeddings using the inner product. This is your typical RecSys setup.

In any case here is a paper that uses a better pivot selection: https://arxiv.org/abs/1610.10001

Moreover, IF @yurymalkov and co-authors in their paper did NOT search using the original similarity metric, a similarity-guided retrieval was used in my thesis (in particular in the last) section and in a follow-up paper. That is, it uses a simpler (or different distance/similarity) to construct a graph and a different (possibly more expensive similarity) to carry out retrieval. The idea isn't that groundbreaking, but my personal achievement was to show it could work well (and sometimes better than filter-and-generate) in this case:

https://arxiv.org/abs/1910.03534
http://boytsov.info/pubs/thesis_boytsov.pdf (section 3.3)

BTW, the above mentioned paper by Babenko and Morozov did a very similar thing, which they actually admitted in the final version of the paper. Their paper has an interesting twist (missing in my work): They should that direct inner product search worked better than reduction to L2-search.

yurymalkov commented 2 years ago

Hi @searchivarius, thank you for the references.

The paper is indeed very hard to understand and was written only for Ekaterina's PhD defense. The similarity search is done with the original distance. The problem that is discussed is when there is only a distance between queries and documents, which can be, say, a neural network on top of the some user and item features of different nature (e.g. different counting features and content and collaborative features with incompatible semantics). So, the metric between the items is not defined and the papers address the problem of how to induce some metric to build a graph that can be used for the target metric to search. Using pivots seems like the most trivial thing to do to induce a metric between the items (and as you've mentioned, it probably is not the best way) and build at least some graph. The same idea is also studied in the paper by Stanislav and Artem (but much more clearly expressed and better studied).

nmslib / hnswlib

evaluating and extending hnsw for SL2G #413