nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors
https://github.com/nmslib/hnswlib
Apache License 2.0
4.3k stars, 633 forks

Silent Fail on Building 1B Index and Searching #530

Open iamsabhoho opened 9 months ago

iamsabhoho commented 9 months ago

Hello,

I've been exploring HNSWLIB Index and playing around with it. I've successfully tested up to 500M. However, when I try running 1B, the program always gets "killed" without any other error messages. I'd like to know how to proceed. Thank you!

karol-t-wilk commented 9 months ago

Perhaps it's not a problem with the index itself, but the OS OOM killer is killing the process? Not sure, but I've run into that very thing and that was the problem. I "fixed" it simply by working with smaller data on that particular machine, but I don't know if that is applicable to your case.

searchivarius commented 9 months ago

I am pretty sure it's the OS killing the process when it uses too much memory and possibly starts to swap. @iamsabhoho did you try to estimate the amount of memory your server needs? It's roughly the memory used by the vectors (probably 4 bytes per dimension) plus the index size. For M=50 and dim=100, the index is about the same size as the data.
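
The estimate described above can be sketched in a few lines of Python. This is a back-of-envelope sketch, not hnswlib's own accounting: the function name is hypothetical, and it assumes float32 vectors plus up to 2*M four-byte neighbor ids per element on the base layer, ignoring upper layers, labels, and allocator overhead.

```python
def estimate_hnsw_bytes(num_elements, dim, M,
                        bytes_per_component=4, bytes_per_link=4):
    """Rough memory estimate (hypothetical helper, not part of hnswlib)."""
    # Raw vector storage: one float32 per dimension by default.
    data_bytes = num_elements * dim * bytes_per_component
    # Assumption: the base layer keeps up to 2*M neighbor ids per element.
    link_bytes = num_elements * 2 * M * bytes_per_link
    return data_bytes, link_bytes


data_bytes, link_bytes = estimate_hnsw_bytes(1_000_000, dim=100, M=50)
print(data_bytes / 1e6, "MB of vectors,", link_bytes / 1e6, "MB of links")
```

With M=50 and dim=100 the two terms come out equal under these assumptions, which matches the "about the same as data size" rule of thumb.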

iamsabhoho commented 9 months ago

We are testing on a machine with 800 GB of RAM, M=32, and dim=96.

searchivarius commented 9 months ago

One billion 100-dim vectors in full precision alone use about 400 GB (10^9 vectors × 100 dims × 4 bytes). Plus, there's the index. No wonder you can squeeze in 500M but not one billion.
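
Plugging in the numbers reported in this thread (dim=96, M=32, 800 GB of RAM) gives a rough sanity check. The sketch below is only an approximation under stated assumptions: float32 vectors, up to 2*M four-byte links per element on the base layer, and, as a guess not confirmed in the thread, that the source array is held in RAM alongside the index while building. The helper name `fits_in_ram` is hypothetical.

```python
GB = 10**9  # decimal gigabytes


def fits_in_ram(num_elements, dim, M, ram_gb, source_in_ram=True):
    """Return (estimated GB needed, whether it fits). Hypothetical helper."""
    vectors = num_elements * dim * 4   # float32 copy stored inside the index
    links = num_elements * 2 * M * 4   # assumed base-layer neighbor ids
    total = vectors + links
    if source_in_ram:
        # Assumption: the original dataset is also resident in memory.
        total += vectors
    return total / GB, total <= ram_gb * GB


for n in (500_000_000, 1_000_000_000):
    needed_gb, ok = fits_in_ram(n, dim=96, M=32, ram_gb=800)
    print(f"{n:>13,} elements: ~{needed_gb:.0f} GB needed, fits: {ok}")
```

Under these assumptions, 500M elements need roughly 512 GB and fit, while 1B need roughly 1,024 GB and do not. With `source_in_ram=False` the 1B estimate drops to about 640 GB, which still leaves limited headroom once upper layers, labels, and other overhead are counted.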