Open infrawhispers opened 1 year ago
@infrawhispers You are right, there is no single entry point for this yet. Also, merge_shards is different from the merge in the FreshDiskANN paper; it is the method described in the original DiskANN paper. The procedure to merge an in-memory index into an SSD index and create a new SSD index is not yet in main. There is an outdated version in #11, which needs to be redone for the latest main. Once that is done, we can attempt a single entry point. You are most welcome to contribute any of these.
Hi @harsha-simhadri, I'm also interested in the FreshDiskANN implementation. Is there a roadmap for that?
By the way, what is the difference between #11 and the current code behind apps/test_insert_deletes_consolidate?
It seems that the tests are all in memory?
Hi!
The FreshDiskANN paper outlines the StreamingMerge procedure. In combing through the codebase (main @ f8ef303), there doesn't appear to be a singular entrypoint that allows a caller to utilize the FreshDiskANN API contract without being aware of all the types of indices.
- test_streaming_scenario.cpp outlines how to build an in-memory index that supports inserts and deletes.
- build_stitched_index.cpp outlines how to merge indices.
- search_disk_index.cpp demonstrates how to run a search across an index that is stored on disk.

Given a client that provides a memory budget and no starting list of vectors, my reading of the paper would indicate the following needs to be done in a wrapping class:

1. An in-memory index is created as in test_streaming_scenario.cpp and is the only sink for insertions.
2. A disk index is created as in build_disk_index.cpp; this index would not have a true build phase as there is nothing to add.
3. Merging uses merge_shards within disk_utils.h; during the merge process, we would have already created a new mutable in-memory index for any in-flight writes + deletes.

I would be happy to submit a patch that unifies the above so that a caller can just create an Index and not have to worry about the RO-TempIndex, RW-TempIndex, and SSD-resident index. However, I would like to confirm that my read on the current codebase is correct, in that there is no singular entrypoint for this.
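To make the proposed wrapping class concrete, here is a rough, self-contained sketch of the shape such an entry point could take. Everything in it is hypothetical: `FreshIndexSketch` and its methods do not exist in DiskANN. A brute-force scan stands in for graph search, an in-memory map stands in for the SSD-resident index, and the merge runs synchronously on the insert path rather than in the background with a frozen RO-TempIndex as the paper describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper; none of these names are part of the DiskANN codebase.
class FreshIndexSketch {
  public:
    using Id = std::uint32_t;
    using Vec = std::vector<float>;

    // max_points_in_memory is a stand-in for the caller's memory budget.
    FreshIndexSketch(std::size_t dim, std::size_t max_points_in_memory)
        : dim_(dim), budget_(max_points_in_memory) {}

    // The mutable in-memory index is the only sink for insertions.
    void insert(Id id, const Vec &v) {
        assert(v.size() == dim_);
        rw_[id] = v;
        if (rw_.size() >= budget_) merge();
    }

    // Deletes are lazy for points that already live in the long-term index.
    void remove(Id id) {
        rw_.erase(id);
        if (long_term_.count(id)) deleted_.insert(id);
    }

    // Nearest neighbour across both indices, skipping deleted or shadowed ids.
    bool nearest(const Vec &q, Id &out) const {
        assert(q.size() == dim_);
        float best = 0.f;
        bool found = false;
        for (const auto &kv : rw_) consider(kv.first, kv.second, q, best, out, found);
        for (const auto &kv : long_term_) {
            if (deleted_.count(kv.first) || rw_.count(kv.first)) continue;
            consider(kv.first, kv.second, q, best, out, found);
        }
        return found;
    }

    // Fold pending deletes and in-memory points into the long-term index.
    void merge() {
        for (Id id : deleted_) long_term_.erase(id);
        for (const auto &kv : rw_) long_term_[kv.first] = kv.second;
        rw_.clear();
        deleted_.clear();
    }

    std::size_t in_memory_size() const { return rw_.size(); }
    std::size_t long_term_size() const { return long_term_.size(); }

  private:
    // Squared L2 distance between two vectors of equal length.
    static float l2(const Vec &a, const Vec &b) {
        float s = 0.f;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const float d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Update the running best candidate if this point is closer.
    static void consider(Id id, const Vec &v, const Vec &q, float &best, Id &out, bool &found) {
        const float d = l2(v, q);
        if (!found || d < best) {
            best = d;
            out = id;
            found = true;
        }
    }

    std::size_t dim_;
    std::size_t budget_;
    std::unordered_map<Id, Vec> rw_;        // stand-in for the RW-TempIndex
    std::unordered_map<Id, Vec> long_term_; // stand-in for the SSD-resident index
    std::unordered_set<Id> deleted_;        // lazy delete list
};
```

The point of the sketch is only the contract: a caller sees insert, remove, and nearest, and never touches the individual index types. A real patch would presumably replace the two maps with the existing in-memory and disk index classes and move the merge off the insert path.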