microsoft / DiskANN

Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search

std::bad_alloc Error when loading PQ pivots #503

Open JingyuanHe1222 opened 8 months ago

JingyuanHe1222 commented 8 months ago

After processing the points for the MIPS metric and saving the PQ information, DiskANN tries to load the PQ pivots to build the index (before checking memory usage to decide whether to build the whole graph in shards or in one shot). I'm getting a std::bad_alloc error at this point, as shown in the log below.

I noticed in /include/tsl/sparse_map.h and /include/tsl/sparse_set.h that a bad_alloc error can be thrown when the allocator cannot allocate memory. However, I have built indices with a larger number of input vectors (e.g. 88 million * 768) on the same machine (180GB memory), and the job I used to wrap the index-building process reports that 160 of the 180GB of memory is being allocated. This confuses me, and I wonder whether the dimension of the vectors affects memory usage here.

Do you have any idea what could cause this bad_alloc error right after the PQ pivots are loaded, while the vectors are being processed? Thank you so much for your time!

input vectors: 8 million * 4096
metric: MIPS
index parameters: -R 128 -L 200 -B 128 -M 64
memory available for allocation: 180GB
CPU threads: 28

Processing chunk 511 with dimensions [4089, 4097)
Writing bin: /home/jingyuah/llama/original_0103/llama__pq_pivots.bin
bin: #pts = 256, #dims = 4097, size = 4195336B
Finished writing bin.
Writing bin: /home/jingyuah/llama/original_0103/llama__pq_pivots.bin
bin: #pts = 4097, #dims = 1, size = 16396B
Finished writing bin.
Writing bin: /home/jingyuah/llama/original_0103/llama__pq_pivots.bin
bin: #pts = 513, #dims = 1, size = 2060B
Finished writing bin.
Writing bin: /home/jingyuah/llama/original_0103/llama__pq_pivots.bin
bin: #pts = 4, #dims = 1, size = 40B
Finished writing bin.
Saved pq pivot data to /original_0103/llama__pq_pivots.bin of size 4217888B.
Opened: /original_0103/llama__prepped_base.bin, size: 144899795332, cache_size: 67108864
Reading bin file /original_0103/llama__pq_pivots.bin ...
Opening bin file /original_0103/llama__pq_pivots.bin...
Metadata: #pts = 4, #dims = 1...
done.
Reading bin file /original_0103/llama__pq_pivots.bin ...
Opening bin file /original_0103/llama__pq_pivots.bin...
Metadata: #pts = 256, #dims = 4097...
done.
Reading bin file /original_0103/llama__pq_pivots.bin ...
Opening bin file /original_0103/llama__pq_pivots.bin...
Metadata: #pts = 4097, #dims = 1...
done.
Reading bin file /original_0103/llama__pq_pivots.bin ...
Opening bin file /original_0103/llama__pq_pivots.bin...
Metadata: #pts = 513, #dims = 1...
done.
Loaded PQ pivot information
std::bad_alloc
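
As a sanity check on the sizes in this log, here is a rough Python sketch of the arithmetic. It assumes each bin file is an 8-byte header (two 32-bit integers for the point count and dimension) followed by row-major float32 data; that layout is an assumption, although it matches the pivot sizes printed above. The one-byte-per-chunk estimate for the PQ codes is also an assumption, based on there being 256 centers per chunk. All function and constant names are hypothetical and are not part of DiskANN.

```python
# Numbers copied from the log above.
BASE_FILE_BYTES = 144_899_795_332  # size of llama__prepped_base.bin
PREPPED_DIM = 4097                 # #dims reported for the pivots (input was 4096)
NUM_CENTERS = 256                  # #pts of the pivot table
NUM_CHUNK_OFFSETS = 513            # #pts of the chunk-offset array

# Assumed layout: 8-byte header, then float32 rows.
HEADER_BYTES = 8
FLOAT_BYTES = 4
GIB = 1024 ** 3


def bin_file_bytes(num_points: int, dim: int, elem_bytes: int = FLOAT_BYTES) -> int:
    """Size of a bin file with the assumed header + dense row layout."""
    return HEADER_BYTES + num_points * dim * elem_bytes


def points_in_file(file_bytes: int, dim: int, elem_bytes: int = FLOAT_BYTES) -> int:
    """Invert bin_file_bytes; fail if the size does not fit the assumed layout."""
    payload = file_bytes - HEADER_BYTES
    row_bytes = dim * elem_bytes
    if payload % row_bytes != 0:
        raise ValueError("file size does not match the assumed layout")
    return payload // row_bytes


num_points = points_in_file(BASE_FILE_BYTES, PREPPED_DIM)
num_chunks = NUM_CHUNK_OFFSETS - 1           # 513 offsets delimit 512 chunks
pivot_bytes = bin_file_bytes(NUM_CENTERS, PREPPED_DIM)
pq_code_bytes = num_points * num_chunks      # assumed: 1 byte per chunk per point

print(f"points in prepped base file: {num_points:,}")
print(f"PQ chunks:                   {num_chunks}")
print(f"pivot table:                 {pivot_bytes:,} B")
print(f"full-precision base data:    {BASE_FILE_BYTES / GIB:.1f} GiB")
print(f"estimated PQ codes:          {pq_code_bytes / GIB:.1f} GiB")
```

Under these assumptions the prepped base file divides evenly into 8,841,823 points of 4097 floats, and the computed pivot table size equals the 4195336B reported in the log. The pivot table is only about 4 MB, while the full-precision base file is roughly 135 GiB, so the pivots themselves account for a negligible share of memory. This does not identify which allocation throws; it only narrows down which buffers are large relative to the 180GB available.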