Status: Closed (xjtushilei closed this 10 months ago)
Do you have a use case where you need to build larger-than-memory indexes or are you just saying this would be cool?
I wrote up what I think would be the "right" way to add larger than memory construction: https://github.com/jbellis/jvector/issues/168
> Do you have a use case where you need to build larger-than-memory indexes or are you just saying this would be cool?
I need to build many indexes over relatively large datasets, but memory is limited. Right now we are forced to use a lot of large-memory machines, and once the build is complete those machines are no longer needed.
> I wrote up what I think would be the "right" way to add larger than memory construction: #168
I saw it, thanks for the clarification.
Although DiskANN does not need much memory at query time, index construction consumes a lot of it. As shown above, Microsoft used a machine with nearly 1.8 TB of memory to build an index over 1 billion vectors in a single pass, even though the vectors had only 128 dimensions.
As shown in the quote above, the DiskANN paper describes a way to construct one large Vamana index from several smaller Vamana indexes.
Does jvector have any plans to support something like this?
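For anyone who has not read that part of the paper, here is a rough sketch of the idea as I understand it: split the data into overlapping shards, build a graph per shard so that only one shard has to fit in memory at a time, then merge the shard graphs by taking the union of their edges. This is not jvector's API and not a real Vamana build. Every function name below is hypothetical, randomly sampled centers stand in for k-means, and a brute-force nearest-neighbor graph stands in for the per-shard Vamana construction.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_shards(vectors, centers, overlap=2):
    """Assign each vector id to its `overlap` closest centers.

    The overlap is what links the shard graphs together after merging.
    """
    shards = defaultdict(list)
    for vid, v in enumerate(vectors):
        by_distance = sorted(range(len(centers)),
                             key=lambda c: sq_dist(v, centers[c]))
        for c in by_distance[:overlap]:
            shards[c].append(vid)
    return shards


def build_shard_graph(vectors, ids, degree):
    """Stand-in for a per-shard Vamana build (assumption, not the real thing):
    connect each node to its `degree` nearest neighbors within the shard."""
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: sq_dist(vectors[i], vectors[j]))
        graph[i] = others[:degree]
    return graph


def merge_by_edge_union(shard_graphs):
    """Merge shard graphs by taking the union of each node's out-edges."""
    merged = defaultdict(set)
    for graph in shard_graphs:
        for node, neighbors in graph.items():
            merged[node].update(neighbors)
    return {node: sorted(neighbors) for node, neighbors in merged.items()}


def build_sharded(vectors, num_shards=4, overlap=2, degree=4, seed=0):
    """Build one graph over all vectors from several smaller shard graphs."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, num_shards)  # random centers, not k-means
    shards = assign_to_shards(vectors, centers, overlap)
    # In a real build each shard would be built (and spilled to disk) in turn.
    graphs = [build_shard_graph(vectors, ids, degree)
              for ids in shards.values()]
    return merge_by_edge_union(graphs)
```

With `overlap=2` every vector lands in two shards, so a node in the merged graph can end up with up to `2 * degree` neighbors. A real implementation would probably want to prune back to a fixed degree afterwards.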