The initial Vamana build has limited features, and several items remain that would improve performance or usability. Once these items are addressed, we can consider moving it out of the "experimental" namespace. They include:
[ ] Reduce global memory footprint - this limits dataset and graph construction size. Simple approaches include batching reverse edge generation, but the build will still be limited by requiring the entire dataset and graph to be resident in device memory. Need to investigate storing the graph in host memory and the potential performance impacts as well.
[ ] Add support for datasets of any dimension (currently there are alignment issues with odd dimensions or dimensions < 16 for uint8 / int8).
[ ] Auto-select and optimize queue_size for different visited_size values. Also, add support for any visited_size value (currently only powers of 2).
[ ] Improve performance for high-degree graph build. This seems limited by GreedySearch, which becomes increasingly costly. Things to investigate include reducing shared memory/registers to improve occupancy, improving priority queue efficiency, or trying other data structures.
[ ] Add additional distance metrics - currently only L2 is supported. At least inner product and cosine are needed.
[ ] Add support in the C and Python APIs.
[ ] Create documentation and collect more extensive benchmark results.
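
Until arbitrary visited_size values are supported, one caller-side workaround is to round a requested value up to the next power of two before passing it in. A minimal sketch in Python, assuming only the power-of-two restriction described above; the helper names `is_power_of_two` and `next_power_of_two` are hypothetical and not part of any existing API:

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two."""
    # A power of two has exactly one bit set, so n & (n - 1) clears it to zero.
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Round n up to the nearest power of two (n itself if already one)."""
    if n < 1:
        raise ValueError("visited_size must be positive")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount gives the smallest power of two >= n.
    return 1 << (n - 1).bit_length()


# Hypothetical usage: a requested visited_size of 100 is not a power of two,
# so it would be rounded up before being passed to the build parameters.
requested = 100
visited_size = requested if is_power_of_two(requested) else next_power_of_two(requested)
```

Rounding up rather than down keeps the visited list at least as large as requested, at the cost of extra memory per query.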
Fixed the alignment issue for datasets with dimensions that are not a multiple of 16, but this introduced a bug that I am currently working on. The resulting delay means we need to push this issue off to the next release.