Closed: jpleet closed this issue 5 years ago
Just saw line 674 in Command.cpp. Is there a plan to implement reconstruct-graph for shared memory? Is it even possible? Thanks
There is no plan to implement it at the moment. To judge how necessary it is, could you tell me why you use NGT with shared memory? Your information will be helpful for future development.
I'm using the NGT build with shared memory to create memory-mapped indices that are larger than RAM. Query time from an SSD is reasonable for my needs. I first created an ANNG on disk and then wanted to call reconstruct-graph to optimize it on disk, but that doesn't work. I noticed that there are now more than 3 graph_types (a, k, b) and that I can create an ONNG (o). I think it's working. If this makes sense, please close the issue. Thanks!
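For readers unfamiliar with the idea of querying a memory-mapped file that does not have to fit in RAM, here is a minimal sketch using only the Python standard library. It is not how NGT stores or searches its index: the file layout (raw little-endian float32 records), the dimensionality DIM and the brute-force scan are assumptions made purely for illustration. The point is that the operating system pages data in from disk as it is touched, so the whole file never needs to be loaded at once.

```python
import mmap
import struct

DIM = 4                            # illustrative dimensionality (assumption)
REC = struct.Struct(f"<{DIM}f")    # one little-endian float32 vector per record


def write_vectors(path, vectors):
    """Write vectors to a flat binary file, one fixed-size record each."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(REC.pack(*v))


def nearest(path, query):
    """Linear scan over a memory-mapped file.

    Returns (record index, squared L2 distance) of the closest vector.
    Only the pages that are actually read get loaded by the OS.
    """
    best = (-1, float("inf"))
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for i in range(len(mm) // REC.size):
            v = REC.unpack_from(mm, i * REC.size)
            d = sum((a - b) ** 2 for a, b in zip(v, query))
            if d < best[1]:
                best = (i, d)
    return best
```

A graph index such as an ANNG visits far fewer records per query than this linear scan does, which is why query time from an SSD can stay reasonable.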
Unfortunately, the graph that the construction mode (o) creates is less optimized than a graph produced by reconstruction. I will consider implementing reconstruction for shared memory. Another option is to build the ONNG with an NGT build that does not use shared memory, export it, and then import it with the shared-memory build of NGT. The export and import commands are not mentioned in the README:
ngt export index exported-files(directory)
ngt import index imported-files(directory)
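The export/import workaround can be sketched as a small Python helper that assembles the two command lines. It assumes two separate ngt binaries are installed, one built without shared memory and one built with it. The binary paths, index names and directory name below are hypothetical placeholders, and only the `ngt export index directory` and `ngt import index directory` argument shapes quoted above are relied on.

```python
import shlex
import subprocess


def export_import_plan(plain_ngt, shm_ngt, onng_index, export_dir, shm_index):
    """Return argv lists for moving an ONNG into a shared-memory index.

    plain_ngt: path to an ngt binary built WITHOUT shared memory (hypothetical)
    shm_ngt:   path to an ngt binary built WITH shared memory (hypothetical)
    """
    return [
        # Step 1: export the optimized graph with the regular build.
        [plain_ngt, "export", onng_index, export_dir],
        # Step 2: import the exported files with the shared-memory build.
        [shm_ngt, "import", shm_index, export_dir],
    ]


def render(plan):
    """Render the plan as shell-quoted command lines for inspection."""
    return [shlex.join(cmd) for cmd in plan]


def run(plan):
    """Execute each step in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Calling `render(...)` first to inspect the commands, and `run(...)` only once they look right, keeps the sketch safe to try on a machine where the two builds live at different paths.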
Hey, any new thoughts about implementing an optimized ONNG construction for shared memory? If NGT did have the feature, you would have an ANN algorithm that:
queries from disk to handle datasets that are larger than memory (ANN is to avoid brute-force search on big data, yet data size is most often limited to RAM size)
based on benchmarks (https://github.com/erikbern/ann-benchmarks), competes with the best-performing but memory-intensive HNSW
allows for updates with new data (don't have to rebuild an index if new data provided)
I don't think any ANN algo could possibly do all this, besides NGT. You could have a very powerful tool.
Thank you for your helpful comment.
Since SSDs are becoming cheaper and faster, I think that reading objects on the SSD does not increase the search time so much. Although I understand that this feature is very competitive with other methods and especially useful for real applications, I have no time to implement it at this moment. However, I will implement it in the near future. In addition, even if you do not use the graph reconstruction, I think that the performance is good enough for applications.
I implemented graph reconstruction for shared memory. It has not been tested sufficiently yet, so please let me know if you run into any problems.
When building with
cmake -DNGT_SHARED_MEMORY_ALLOCATOR=ON ..
the reconstruct-graph command doesn't get built. Running:
ngt reconstruct-graph
returns:
ngt::reconstructGraph: Not implemented
Aborted (core dumped)
Can the reconstruct-graph command be run on a shared-memory ANNG? Thanks! This is a cool system.