yahoojapan / NGT

Nearest Neighbor Search with Neighborhood Graph and Tree for High-dimensional Data
Apache License 2.0
1.24k stars 114 forks source link

About graph construction #73

Closed Gtavexir closed 4 years ago

Gtavexir commented 4 years ago

Hello, I would like to ask a question about graph construction. I recently started studying about NNG with your ONNG paper. While studying ONNG, I found myself lacking in graph construction. So can you tell me if you have anything to refer to when I study?

And also, Your ANNG paper(Proximity search in metric spaces using approximate k nearest neighbor graph) is in Japanese. Can I get some paper or information related to it in English?

masajiro commented 4 years ago

Hello, thank you for your interest in NGT. Although there are various kinds of methods to construct graphs, for NGT, the graph construction is described in Algorithm 9 "Construct ANNG" of the ONNG paper. The ONNG and PANNG are derived from the ANNG.

I was not able to find any documents in English about ANNG. However, the ONNG and PANNG papers include most of the ANNG paper. And, I think that the ANNG can be understood from the algorithm figures in the paper even in Japanese.

Gtavexir commented 4 years ago

Thank you for your answer. It will be a great help to my study. 👍

Gtavexir commented 4 years ago

Hello, first of all, thank you for your hard work. I would like to ask a question about where can I check graph construction code from yours? I tried to find it by myself, but I couldn't, so I'm asking. If possible could you tell me the whole process of doing graph construction in relation to your code?

masajiro commented 4 years ago

Hello. The main function to build a graph is createIndex(). If parallel processing is not specified, another createIndex() is called here.

If you just want to know the processing flow, you may want to trace createIndex(). insert() in the latter createIndex() is the main part of the building a graph. However, the flow is seldom used, because it is quite slow.

The processing flow with multi threads of the thread pool in the former createIndex() is used as default setting, but it is so complicated. searchMultipleQueryForCreation() in the createIndex() prepares jobs for the thread pool to search the graph in a batch. insertMultipleSearchResults() inserts nodes and edges into the graph from the result of the search.

Gtavexir commented 4 years ago

Thank you very much for your answer. Thank you again for your hard work. Your answer will always help me a lot! :)