stepstone-tech / hnswlib-jna

Native-Like Performance for Nearest Neighbor Search in Java Applications using Hnswlib and Java Native Access
Apache License 2.0
32 stars 8 forks source link

Batch knnQueries and Batch element insertion support? #6

Open ashfaq92 opened 4 years ago

ashfaq92 commented 4 years ago

Batch knnQueries: The method knnQuery(query_point, k) takes float[] as query point. What if we want to perform batch queries for float[][] type multiple elements from index? (This feature is particularly needed take full advantage of internal hardware SIMD and parallelism of CPU)

Batch element insertion: In the given examples, the method addItem() takes an float[] array for insertion. What if a user has float[][] of feature vectors? there should be addAll() function (like this)

hussamaa commented 4 years ago

Hi @ashfaq92,

Yeah, that is a limitation we have in our library. JNA doesn't support multi dimensional arrays and that's why it wasn't included here.

https://stackoverflow.com/questions/42042453/jna-two-dimensional-arrays/42049334

BR, Hussama

ashfaq92 commented 4 years ago

alright, Thanks. So we can use for loops.

hussamaa commented 4 years ago

Hi @ashfaq92,

I hope you're well.

Did it work fine for you using the loop for to add items? I'm looking into the possibility of implementing the batch add/queries operations (and checking SIMD). I forgot to say but in our unit tests we have an example of adding multiple items using a multi-threaded approach (which brings a good speed up). However, indeed, we could provide it within the library so that users could easily profit from it. I'll do it! :D

BR, Hussama

ashfaq92 commented 4 years ago

adding elements using for loop works but takes linear time which can be increased by using threading. I shall check out the threading examples again. However, it would be great if there is a support for batched queries. Today, I tried FAISS, it had batched queries support.

Get Outlook for Androidhttps://aka.ms/ghei36


From: Hussama Ismail notifications@github.com Sent: Monday, July 6, 2020 4:29:43 PM To: stepstone-tech/hnswlib-jna hnswlib-jna@noreply.github.com Cc: Muhammad Ashfaq ashfaq92@outlook.com; Mention mention@noreply.github.com Subject: Re: [stepstone-tech/hnswlib-jna] Batch knnQueries and Batch element insertion support? (#6)

Hi @ashfaq92https://github.com/ashfaq92,

I hope you're well.

Did it work fine for you using the loop for to add items? I'm looking into the possibility of implementing the batch add/queries operations (and checking SIMD). I forgot to say but in our unit tests we have an example of adding multiple items using a multi-threaded approach (which brings a good speed up). However, indeed, we could provide it within the library so that users could easily profit from it. I'll do it! :D

BR, Hussama

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/stepstone-tech/hnswlib-jna/issues/6#issuecomment-654093561, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AG27DQSRZQVUN5BKB3X72RDR2GDPPANCNFSM4OMJ6MUQ.

hussamaa commented 4 years ago

Thanks a lot @ashfaq92 for the remarks. It's still on my plate to think about how to implement the batch queries. Could you please point me to FAISS example? do you think their approach suits you and other use cases well? Would you have other suggestions?

The addAll is also a good option to speed up the model building but I'm still thinking if we should work with an object like the native Java version does (which contains a vector and an id) or work with separated lists like the native hnswlib does. In meanwhile, I added an example of how to build an index with multi-thread. (https://github.com/stepstone-tech/hnswlib-jna/blob/master/hnswlib-jna-example/src/main/java/com/stepstone/search/hnswlib/jna/example/App.java)

Have a great afternoon! :D

ashfaq92 commented 4 years ago

hi @hussamaa, according to my knowledge, FAISS is good for small scale nearest neighbor search. It has extremely fast indexing speed with little trade off on query accuracy and query efficiency. But if you need to make indexing more often especially when the size of index changes more often, FAISS can be the candidate. In my application domain, my nearest neighbor queries are incremental. In other words, batch queries has to be performed on data-set and then add some elements in data set. This process repeats. In this way, HNSW or Annoy may not ensure good performance. However, more experiments need to be done on that. So far, FAISS is only available in C++ on Linux platforms. So batch queries and multithreading of FAISS is only available in its native C++ (which is not a big deal because HNSW also supports it in its C++ and Python version). But, our this project i.e. HNSW with JNA comes with inherited limitations of JNA. Therefore, it will become challenging to use full features of an algorithm by calling it through JNA. I have tried this FAISS example. In terms of use cases, it depends upon person to person and application domain to decide which nearest neighbor algorithm to use. I suggest to read this and this. It will help to understand general overflow of algorithms. In terms of FAISS setup, I am available to guide if you face any problem.