microsoft / DiskANN

Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

[BUG] Using autodetect beam width via `-W 0` segfaults #551

Open daxpryce opened 3 months ago

daxpryce commented 3 months ago

Expected Behavior

`search_disk_index` should automatically choose an optimal beam width when `-W 0` is passed.

Actual Behavior

Segmentation fault (core dumped)

Example Code


build/apps/search_disk_index --data_type float --dist_fn l2 --index_path_prefix /home/daxpryce/data/e5/2024-01-28_2024-02-03/ann --result_path /home/daxpryce/data/e5/2024-01-28_2024-02-03_kann.bin --query_file /home/daxpryce/data/e5/2024-01-28_2024-02-03/ann_mem.index.data -K 50 -L 128 -W 0 -T 128


Error

The code at https://github.com/microsoft/DiskANN/blob/main/apps/search_disk_index.cpp#L213-L214 only works if the warmup variables have been populated, which presumably happens only when `WARMUP` is `#define`d to true (it is false by default).

As a result, by the time execution reaches line 213, `warmup` is `nullptr`, `warmup_num` is 0, and `warmup_aligned_dim` is still 0.

Execution then enters https://github.com/microsoft/DiskANN/blob/main/include/percentile_stats.h#L39, which creates a vector of size 0 (because `warmup_num` is 0) and then reads the element at index 0 of that empty vector. That out-of-bounds read is undefined behavior, and here it segfaults.
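The failure mode can be reproduced outside DiskANN. Below is a minimal, hypothetical sketch (not the actual `percentile_stats.h` code; `percentile_or_throw` is an illustrative name) of a percentile helper that sizes a vector from a caller-supplied count and indexes into it, with the empty-input guard that the crashing path lacks. It assumes the percentile is given as a fraction in [0, 1]; the real helper's conventions may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper mirroring the pattern described above: copy `len`
// samples into a vector, sort them, and index at the requested percentile.
// Unlike the unguarded pattern, it refuses to index an empty vector, which
// would be undefined behavior (the segfault in this report).
double percentile_or_throw(const double *samples, uint64_t len, double percentile)
{
    if (len == 0 || samples == nullptr)
    {
        throw std::invalid_argument("percentile_or_throw: no samples (len == 0)");
    }
    if (!(percentile >= 0.0 && percentile <= 1.0))
    {
        throw std::invalid_argument("percentile_or_throw: percentile must be in [0, 1]");
    }
    std::vector<double> vals(samples, samples + len);
    std::sort(vals.begin(), vals.end());
    // Clamp so that percentile == 1.0 does not read one past the end.
    auto idx = static_cast<uint64_t>(percentile * static_cast<double>(len));
    if (idx >= len)
    {
        idx = len - 1;
    }
    assert(idx < vals.size()); // invariant: always in bounds after the checks above
    return vals[idx];
}
```

Throwing instead of returning a sentinel keeps a zero-warmup run from silently producing a meaningless "optimal" beam width.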

I'm not sure how to fix this correctly myself, but at the very least we should guard against `WARMUP == false && beam_width == 0`, so that we fail early with a message like "cannot autodetect optimal beam width if we haven't warmed up". I'm also not sure how I'm supposed to make warmup actually happen.
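The early guard proposed above could look roughly like this minimal sketch. It assumes the warmup pointer and count are in scope where `-W` is handled (as they are at line 213 per the description). `validate_beamwidth_request` is a hypothetical name, not DiskANN API, and the error text is adapted from the suggestion above; where the check belongs and how the app reports errors are left to the maintainers.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical early check (illustrative name and signature, not DiskANN API).
// beam_width == 0 requests auto-detection, which needs warmup queries to tune on;
// with no warmup data, fail fast instead of indexing an empty vector later.
void validate_beamwidth_request(uint32_t beam_width, const void *warmup, uint64_t warmup_num)
{
    if (beam_width != 0)
    {
        return; // explicit beam width: nothing to autodetect
    }
    if (warmup == nullptr || warmup_num == 0)
    {
        throw std::invalid_argument(
            "cannot autodetect optimal beam width (-W 0) without warmup queries; "
            "enable warmup or pass an explicit -W value");
    }
}
```

Checking both the pointer and the count covers the state described above (`warmup` is `nullptr` and `warmup_num` is 0) without depending on the `WARMUP` macro directly.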

Your Environment