kakao / n2

TOROS N2 - lightweight approximate Nearest Neighbor library which runs fast even with large datasets
Apache License 2.0

Loading model with mmap lowering searching speed? #35

Closed: huhk-sysu closed this issue 3 years ago

huhk-sysu commented 3 years ago

Hi.

I'm using n2 (Python interface) for searching NNs, and I tried different ways to load the saved index (1,000,000 × 768, about 3 GB).

  1. Loading the index with use_mmap=True, which is the default. In this way, index.load() takes nearly no time, which seems abnormal. Also, when I call index.search_by_vector, the program gets stuck for a long time (about 1.5 minutes) before giving the result, and during that time it keeps low CPU and memory usage.
  2. Loading the index with use_mmap=False. In this way, index.load() takes several seconds. When I call index.search_by_vector, the program returns the result quickly.
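
A minimal timing sketch contrasting the two modes, assuming n2's Python interface as documented (`HnswIndex`, `load`, `search_by_vector`); the index path, the "angular" metric, and the dummy query are placeholders:

```python
import time
from n2 import HnswIndex

DIM = 768  # dimensionality from this issue

index = HnswIndex(DIM, "angular")

t0 = time.time()
index.load("index.n2", use_mmap=True)  # near-instant: pages are mapped, not read
print(f"load (mmap):  {time.time() - t0:.3f}s")

query = [0.0] * DIM
t0 = time.time()
index.search_by_vector(query, 10)      # first search triggers the actual disk reads
print(f"first search: {time.time() - t0:.3f}s")
```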

I'm not familiar with mmap, and I guess that in the first situation the model hasn't actually been loaded until I start searching. Could you explain it for me? And which way of loading should I use?

gony-noreply commented 3 years ago

Hi @huhk-sysu. Information about mmap can be found at the link below.

As you can see on the wiki, mmap is implemented with demand paging. When index.load() is called with use_mmap=True, the index read doesn't actually occur. Reading happens lazily, only when the index is accessed. This is why your index.load() takes no time and the first search takes a long time. But the 1.5 minutes does seem odd.
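
A small, library-independent demonstration of demand paging using Python's built-in mmap module (not n2-specific): mapping a large file is near-instant, and the real I/O only happens when the pages are first touched.

```python
import mmap, os, time

path = "big.bin"  # illustrative file, created sparse if absent
if not os.path.exists(path):
    with open(path, "wb") as f:
        f.truncate(512 * 1024 * 1024)  # 512 MB

with open(path, "rb") as f:
    t0 = time.time()
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(f"mmap:    {time.time() - t0:.4f}s")  # near-instant: nothing is read yet

    t0 = time.time()
    total = sum(mm[i] for i in range(0, len(mm), 4096))  # touch one byte per page
    print(f"page-in: {time.time() - t0:.4f}s")  # page faults do the actual reading here
    mm.close()
```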

Loading with mmap is suitable in multi-process environments because it allows the memory to be shared. If you only use a single process, it is recommended to load with use_mmap=False.
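
A sketch of the multi-process case, assuming the n2 API above; the index path is hypothetical. Each worker maps the same file, so the OS page cache keeps a single physical copy of the index shared across all processes:

```python
import multiprocessing as mp
from n2 import HnswIndex

DIM = 768
INDEX_PATH = "index.n2"  # hypothetical path

_index = None

def init_worker():
    # Each worker maps the same index file once; all mappings are backed
    # by one copy of the data in the OS page cache.
    global _index
    _index = HnswIndex(DIM, "angular")
    _index.load(INDEX_PATH, use_mmap=True)

def search(query):
    return _index.search_by_vector(query, 10)

if __name__ == "__main__":
    queries = [[0.0] * DIM for _ in range(4)]  # dummy queries
    with mp.Pool(processes=4, initializer=init_worker) as pool:
        results = pool.map(search, queries)
```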

huhk-sysu commented 3 years ago

Thanks for your detailed explanation. The 1.5 minutes may be due to the machine, though the cause isn't clear. Since I'm using a single process, I will try loading with use_mmap=False.

There's another question: in fact I have a large database (~15 million × 768), but my machine's memory is limited and I can't build a single index for it. For the moment I split it and build 15 indexes, each containing 1 million vectors. Then I load all the indexes; when a query comes, I query each of them and finally aggregate the results. I wonder if this is a suitable approach, any suggestions?
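
For reference, a sketch of the query-and-aggregate step described above, assuming the n2 API and hypothetical shard filenames; since distances are comparable across shards built with the same metric, a global top-k can be taken with a heap:

```python
import heapq
from n2 import HnswIndex

DIM = 768
N_SHARDS = 15
SHARD_SIZE = 1_000_000  # 1 million vectors per shard, as in this issue

shards = []
for i in range(N_SHARDS):
    idx = HnswIndex(DIM, "angular")
    idx.load(f"shard_{i}.n2", use_mmap=False)  # hypothetical filenames
    shards.append(idx)

def search_all(query, k=10):
    candidates = []
    for shard_id, idx in enumerate(shards):
        # include_distances=True returns (local_id, distance) pairs
        for local_id, dist in idx.search_by_vector(query, k, include_distances=True):
            # map the shard-local id back to a global id
            candidates.append((dist, shard_id * SHARD_SIZE + local_id))
    # keep the k nearest across all shards (smaller distance = closer)
    return heapq.nsmallest(k, candidates)
```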

gony-noreply commented 3 years ago

Well, if there is not enough memory, you can use it the way you described. There is no way for N2 to process data larger than the machine's memory yet, but improvements are being made to handle larger-than-memory data through Product Quantization. The improvement is expected to be available early next year.

huhk-sysu commented 3 years ago

Thanks for your advice.