nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0

Question: Single vs. Multiple HNSW Graphs for Nearest Neighbor Searches Across Multiple Companies? #545

Open suil0109 opened 6 months ago

suil0109 commented 6 months ago

Hello,

I'm working on a project that uses HNSW (hnswlib) for nearest neighbor searches over data from about 100 different companies (Company A, B, C, D, etc.). Since each company's data is "unique", I'm trying to figure out the best way to organize it into HNSW graphs so that searches are both accurate and fast.

I want to know:

  1. Is it better to put all 100 companies' data into one big HNSW graph? Would that still give good results?
  2. Should I create 100 separate graphs, one per company, to keep the search efficient and accurate?
  3. Is there a better way?

Thank you!

suil0109 commented 6 months ago

I ran some experiments on single vs. multiple HNSW graphs by building an HNSW manager, and also looked at how the parameters affect HNSW search. The write-up is here if anyone is interested:

https://cool-brick-a38.notion.site/Single-and-Multiple-HNSW-Graphs-for-Efficient-Data-Search-e9319449d56b45a9af7b2044199eb561?pvs=25
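
For readers who want a concrete picture of the two layouts, here is a minimal sketch. It is not the manager from the write-up above: `IndexManager` and `BruteForceIndex` are hypothetical names, and the brute-force class is an exact stand-in used only so the example runs without dependencies. In a real setup each slot would presumably wrap an hnswlib index instead, and its results would be approximate.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class BruteForceIndex:
    """Exact stand-in for an ANN index (assumption: replaces a real HNSW index)."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, Vector]] = []

    def add_items(self, vectors: Sequence[Vector], ids: Sequence[int]) -> None:
        self._items.extend(zip(ids, vectors))

    def knn_query(self, query: Vector, k: int) -> List[Tuple[int, float]]:
        # Score every stored vector by Euclidean distance and keep the k closest.
        scored = [(item_id, math.dist(query, vec)) for item_id, vec in self._items]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


class IndexManager:
    """Hypothetical manager that keeps one independent index per company."""

    def __init__(self) -> None:
        self._indexes: Dict[str, BruteForceIndex] = {}

    def add(self, company: str, vectors: Sequence[Vector], ids: Sequence[int]) -> None:
        # Create the company's index lazily on first insert.
        if company not in self._indexes:
            self._indexes[company] = BruteForceIndex()
        self._indexes[company].add_items(vectors, ids)

    def search(self, company: str, query: Vector, k: int = 1) -> List[Tuple[int, float]]:
        # Only the requested company's index is searched.
        return self._indexes[company].knn_query(query, k)


company_a = ([[0.0, 0.0], [1.0, 1.0]], [1, 2])
company_b = ([[0.1, 0.1], [5.0, 5.0]], [10, 11])
query = [0.2, 0.2]

# Layout 2: one index per company.
manager = IndexManager()
manager.add("A", *company_a)
manager.add("B", *company_b)
result_a = manager.search("A", query, k=1)
result_b = manager.search("B", query, k=1)

# Layout 1: everything in one shared index.
shared = BruteForceIndex()
shared.add_items(*company_a)
shared.add_items(*company_b)
shared_result = shared.knn_query(query, k=1)

print(result_a[0][0], result_b[0][0], shared_result[0][0])
```

In this toy data the shared index returns a Company B vector (id 10) even when the caller only cares about Company A, so a single graph needs some form of post-filtering or over-fetching if results must stay within one company. Per-company indexes avoid that at the cost of managing many smaller graphs.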