jelmerk / hnswlib

Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Apache License 2.0
260 stars 56 forks source link

Illustrative Example #24

Closed ashfaq92 closed 2 years ago

ashfaq92 commented 4 years ago

Hi, thank you for the great work (especially adding license :)

We are conducting research for a software testing method that works on the top of your HNSW implementation. I know this seems like an odd request, but, I was wondering if you could please show the inner working of the algorithm via an illustrative example. For example: When an HNSW index is initialized:

HnswIndex<Integer, double[], Point, Double> hnswIndex = HnswIndex
                .newBuilder(DistanceFunctions.DOUBLE_EUCLIDEAN_DISTANCE, graphSize).withM(givenM).withEf(ef)
                .withEfConstruction(efConst).build();

We would be really grateful if you provide what is being initialised under the hood. We have tried to use debug and print statements to print parameters in your HnswIndex.java file, but each time, we add a node to it, it gets refreshed and previous record is gone. Also, how the hnswIndex.add(node) and hnswIndex.findNearest(node.vector(), 1).get(0).distance() work ? It would be nice if you add some nodes to HnswIndex (say up to 3) and then perform a nearest neighbor search for a query point and reveal the internal working of the algorithm.

Note: The paper proposing the HNSW describes the theory and the algorithm, but there is not practical illustration on how the data flows between algorithms for adding/searching nodes to the graph

jelmerk commented 4 years ago

I agree that some documentation on the inner workings of the algorithm would be helpful for some people. I may do this at some point but it's a fair bit of work.

In the meantime if you want to understand the inner workings the easiest thing to do is to open the project in a Java IDE ( I recommend intellij idea with the scala plugin) and single step through one of the unit tests in a debugger. Put a breakpoint in the add method to understand how adding a node works. etc

You should be able to do that in 5 minutes

jelmerk commented 2 years ago

I came across this resource which does a better job at explaining than i ever could

https://www.pinecone.io/learn/hnsw/