microsoft / SPTAG

A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search scenarios.
MIT License
4.77k stars 581 forks

How to use the distributed server? #401

Closed by suppersam1 9 months ago

suppersam1 commented 9 months ago

I see that the aggregator can aggregate the indexes of multiple machines, but there seems to be no method for distributing the indexes across different machines. Is the user expected to partition the data, build an index on each partition, place each index on a different machine, and then aggregate them through the aggregator? Or should one index be built for all the data and then distributed across the machines?

suppersam1 commented 9 months ago

I resolved it. The steps are:

  1. ./balanceddatapartition -d 10 -v float -i test_index_input.txt -f TXT -c 3
  2. ./balanceddatapartition -d 10 -v float -i test_index_input.txt -f TXT -c 3 -g LocalPartition -o test_partition
  3. Move files from these partitions to different machines.
  4. Run indexbuilder on each machine to index its partition of the data.
  5. On each machine, start the server and load the index files.
  6. Launch the aggregator and connect it to the servers on the various machines.
  7. Use the client to connect to the aggregator and start querying.
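
The overall flow in the steps above (partition the data, index each partition separately, then merge per-partition results) can be illustrated with a small self-contained sketch. This is not SPTAG code and does not call any SPTAG API: `partition`, `shard_search`, and `aggregate` are hypothetical names, the split is a simple round-robin rather than the balanced clustering that balanceddatapartition performs, and each "server" does exact brute-force search instead of using a real index. The 10 dimensions and 3 shards only echo the `-d 10` and `-c 3` values in the commands, on the assumption that those flags mean dimension and partition count.

```python
import heapq
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition(vectors, num_parts):
    """Toy stand-in for the partition step: round-robin split by vector id.
    (Assumption: the real tool clusters the data; this does not.)"""
    shards = [[] for _ in range(num_parts)]
    for vid, vec in enumerate(vectors):
        shards[vid % num_parts].append((vid, vec))
    return shards


def shard_search(shard, query, k):
    """What one per-machine server would return: its local top-k
    as (distance, id) pairs, here computed by exact brute force."""
    return heapq.nsmallest(k, ((l2(vec, query), vid) for vid, vec in shard))


def aggregate(per_shard_results, k):
    """What an aggregator conceptually does: merge the local top-k lists
    from every shard and keep the k closest hits overall."""
    all_hits = (hit for hits in per_shard_results for hit in hits)
    return heapq.nsmallest(k, all_hits)


random.seed(0)
dim, n, k = 10, 300, 5
vectors = [[random.random() for _ in range(dim)] for _ in range(n)]
query = [random.random() for _ in range(dim)]

shards = partition(vectors, 3)
merged = aggregate([shard_search(s, query, k) for s in shards], k)

# Reference answer: exact top-k over the unpartitioned data.
exact = heapq.nsmallest(k, ((l2(v, query), i) for i, v in enumerate(vectors)))
print(merged == exact)
```

Because each shard search here is exact, the merged result matches a search over the full data set. With approximate per-shard indexes, the merged list is only as good as what each shard returns.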

Funlxy commented 3 months ago

Hello, I have the same problem. What is the difference between step 1 and step 2?