microsoft / DiskANN

Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

Add simplified functions for product quantization #514

Closed michael-popov closed 6 months ago

michael-popov commented 6 months ago

Added the following functions: generate_pq_pivots_simplified() and generate_pq_data_from_pivots_simplified().

These are new functions for product quantization that neither read their input from files nor write any output to files; everything is passed in memory. They can be used by systems that rely on a different type of data storage.
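To illustrate what a file-free encoding step can look like, here is a minimal sketch of product-quantization encoding that works purely on in-memory buffers. It is not the code from this PR: the function name encode_pq_in_memory, its parameters, and the row-major layouts are assumptions made for the example, and details such as centering the data before encoding are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical in-memory PQ encoder (illustrative only, not DiskANN's API).
// data:          num_points x dim floats, row-major (assumed layout)
// pivots:        num_centers x dim floats, row-major (assumed layout);
//                chunk c uses columns [chunk_offsets[c], chunk_offsets[c + 1])
// chunk_offsets: num_chunks + 1 boundaries, starting at 0 and ending at dim
// Returns num_points x num_chunks one-byte codes, row-major.
std::vector<std::uint8_t> encode_pq_in_memory(
    const std::vector<float>& data, std::size_t num_points, std::size_t dim,
    const std::vector<float>& pivots, std::size_t num_centers,
    const std::vector<std::size_t>& chunk_offsets) {
    assert(data.size() == num_points * dim);
    assert(pivots.size() == num_centers * dim);
    assert(num_centers >= 1 && num_centers <= 256);  // codes must fit in a byte
    assert(chunk_offsets.size() >= 2);
    assert(chunk_offsets.front() == 0 && chunk_offsets.back() == dim);

    const std::size_t num_chunks = chunk_offsets.size() - 1;
    std::vector<std::uint8_t> codes(num_points * num_chunks);

    for (std::size_t p = 0; p < num_points; ++p) {
        for (std::size_t c = 0; c < num_chunks; ++c) {
            float best_dist = std::numeric_limits<float>::max();
            std::size_t best_center = 0;
            // Pick the center closest to this point within the chunk's columns.
            for (std::size_t k = 0; k < num_centers; ++k) {
                float dist = 0.0f;
                for (std::size_t d = chunk_offsets[c]; d < chunk_offsets[c + 1]; ++d) {
                    const float diff = data[p * dim + d] - pivots[k * dim + d];
                    dist += diff * diff;
                }
                if (dist < best_dist) {
                    best_dist = dist;
                    best_center = k;
                }
            }
            codes[p * num_chunks + c] = static_cast<std::uint8_t>(best_center);
        }
    }
    return codes;
}
```

Because the caller owns both the input and the returned buffer, the surrounding system is free to load vectors from, and store codes in, whatever storage it uses.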

Tests: I instrumented the existing functions that produce pivot data and generate PQ data. Given identical input (pivot data and full input vectors), the new function generate_pq_data_from_pivots_simplified() produces output identical to that of the original generate_pq_data_from_pivots(). Given the same input, generate_pq_pivots_simplified() produces output very similar to that of the original generate_pq_pivots(); the differences can be explained by the non-deterministic core functions used to generate pivot data.
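The two kinds of comparison described above (exact equality for PQ codes, approximate agreement for pivots) could be expressed roughly as follows. This is only a sketch: the helper names codes_identical and max_abs_diff are hypothetical, it is not the instrumentation actually used for this PR, and an element-wise difference assumes both runs return their centers in the same order, which a randomized clustering step does not guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: PQ codes from a deterministic encoder are expected to
// match byte for byte, so plain equality is the right check.
bool codes_identical(const std::vector<std::uint8_t>& a,
                     const std::vector<std::uint8_t>& b) {
    return a == b;
}

// Hypothetical helper: largest element-wise absolute difference between two
// pivot buffers of the same size. The caller chooses what counts as
// "very similar"; no particular threshold is implied here.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

A comparison harness would call codes_identical on the outputs of the two encoding functions and compare max_abs_diff of the two pivot buffers against a tolerance chosen for the data set.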