jbellis / jvector

JVector: the most advanced embedded vector search engine
Apache License 2.0
1.47k stars 110 forks source link

Reduce tendency of reconnectOrphanedNodes to leave orphaned nodes #335

Closed jkni closed 4 months ago

jkni commented 4 months ago

This PR proposes three changes, each addressing a different source of disconnected graphs.

  1. Update the entry node before reconnecting orphaned nodes. If the entry node is updated after, orphaned nodes that were reconnected to the new entry node tend to be disconnected by improvements to the new entry node.
  2. Preserve connection targets map across loop iterations. When reset on each iteration of the loop, it's not exceedingly rare to see disconnection cycles where on run 0, node A is reconnected (disconnecting B) and node C is reconnected to node D in its neighbor list. Then, if D is the best neighbor of B, C/B will alternate being reconnected across future iterations.
  3. Exclude connectionTargets when searching for new connection points. If all of a node's existing neighbors have been considered and a search is performed, it is not unusual to see that the existing neighbors are many of the search results. This is particularly important if approach 2 is used, as connection targets will grow across loop iterations.

Testing on real data sets looks positive. This approach has also been successful at reconnecting graphs build from random vectors.