microsoft / DiskANN

Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

index_build_prefix and build_memory_index error #119

Open daxpryce opened 2 years ago

daxpryce commented 2 years ago

If you try to build an index with build_memory_index and pass only a directory path, without a prefix fragment, the build fails at around the 99% mark.

Reproduction steps:

mkdir /tmp/myindex
chmod 777 /tmp/myindex
$BUILD/tests/build_memory_index --index_path_prefix=/tmp/myindex/ --[other arguments elided]

Results:

Starting index build with R: 32  Lbuild: 50  alpha: 1.2  #threads: 24
L2: Using AVX2 distance computation DistanceL2Float
Using only first 5841480 from file..
Starting index build with 5841480 points...
99.2899% of index build completed.Starting final cleanup..done. Link time: 300.633s
Index built with degree: max:32  avg:22.1867  min:1  count(deg<2):11166
Indexing time: 303.848
basic_ios::clear: iostream error
Index build failed.
cd /tmp/myindex
>>>  bash: cd: /tmp/myindex: No such file or directory

Note: if you provide a full prefix, such as /tmp/myindex/randomprefixthingherewhatever, the build works fine. The failure occurs only when you give a directory without an index prefix.

Assertion: No matter what, we should not delete the myindex folder from the example above when attempting to write. If we want to make the file prefix fragment a strict requirement, rather than allowing just an output folder, that is fine, but we should detect when the user has given us only a directory and report an error, ideally before any index-building work begins.
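A minimal pre-flight check along these lines can be sketched as a bash function. It is hypothetical, not DiskANN code: the name `validate_index_prefix` is illustrative, and the rules it applies (reject an empty value, a trailing `/`, or an existing directory, and require an existing, writable parent directory) are assumptions about what "only a directory" should mean.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of DiskANN): validate an
# --index_path_prefix value before starting a long index build.
validate_index_prefix() {
  local prefix="$1"

  # Reject an empty value outright.
  if [ -z "$prefix" ]; then
    echo "error: index path prefix is empty" >&2
    return 1
  fi

  # A trailing slash means only a directory was given, with no file prefix.
  case "$prefix" in
    */)
      echo "error: '$prefix' is a directory; add a file prefix, e.g. ${prefix}myindex" >&2
      return 1
      ;;
  esac

  # An existing directory without a trailing slash is the same mistake.
  if [ -d "$prefix" ]; then
    echo "error: '$prefix' is an existing directory; add a file prefix" >&2
    return 1
  fi

  # The directory part of the prefix must already exist and be writable.
  local dir
  dir=$(dirname -- "$prefix")
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "error: output directory '$dir' does not exist or is not writable" >&2
    return 1
  fi

  return 0
}
```

For example, a wrapper could run `validate_index_prefix "/tmp/myindex/" || exit 1` before invoking `$BUILD/tests/build_memory_index`, so the failing case from the reproduction is rejected immediately instead of after a roughly 300-second build.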

An even better approach may be to split --index_path_prefix into two required, non-empty arguments: --index_output_directory and --index_prefix. That would make the missing-prefix case easy to detect and make providing a non-empty file prefix an explicit call to action for the user.
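The proposed split can be sketched as a small bash helper that checks the two values and joins them into the single path prefix the tool expects today. This is illustrative only: `--index_output_directory` and `--index_prefix` are the flags proposed above, not existing DiskANN options, and the helper name `join_index_prefix` and its rules (prefix must not contain `/`, directory must already exist) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed split. --index_output_directory and
# --index_prefix are NOT existing DiskANN flags; this only shows how two
# required, non-empty values could be checked and joined into one prefix.
join_index_prefix() {
  local out_dir="$1" prefix="$2"

  # Both values are required and must be non-empty.
  if [ -z "$out_dir" ] || [ -z "$prefix" ]; then
    echo "error: --index_output_directory and --index_prefix are both required and must be non-empty" >&2
    return 1
  fi

  # The prefix is a file-name fragment, so it must not contain a slash.
  case "$prefix" in
    */*)
      echo "error: --index_prefix must not contain '/'" >&2
      return 1
      ;;
  esac

  # The output directory must already exist.
  if [ ! -d "$out_dir" ]; then
    echo "error: output directory '$out_dir' does not exist" >&2
    return 1
  fi

  # Drop one trailing slash so "dir/" and "dir" produce the same result.
  printf '%s/%s\n' "${out_dir%/}" "$prefix"
}
```

With this shape, `join_index_prefix /tmp/myindex ""` fails up front, while `join_index_prefix /tmp/myindex/ idx` prints `/tmp/myindex/idx`, which could then be passed as the existing `--index_path_prefix`.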

harsha-simhadri commented 1 year ago

@daxpryce Does this bug still exist? If not, can we close?