lior-k / fast-elasticsearch-vector-scoring

Score documents using embedding-vectors dot-product or cosine-similarity with ES Lucene engine
Apache License 2.0
395 stars 112 forks

internal algorithm #58

Open applecv3 opened 4 years ago

applecv3 commented 4 years ago

Hi, I just want to know which type of KNN (like HNSW, LSH, and so forth) you built in this plugin.

lior-k commented 4 years ago

The plug-in uses pure cosine-similarity or dot-product to compare vectors. So the K nearest neighbors it returns are the exact K, not an approximation like LSH and others.

applecv3 commented 4 years ago

Thank you for your answer! So.. let me ask you some more. Do you mean naive KNN searching algorithm by "pure cosine-similarity"? Is it taking O(N) time complexity? (where N is the number of documents to explore when computing cosine similarity). If so, I'm not sure how your plugin works faster than the others and I saw you mentioned that "I gained this substantial speed improvement by using the lucene index directly". Does that imply all the secrets(?) about how this plugin works fast?

lior-k commented 4 years ago

Yes, it uses brute force to calculate cosine-similarity, meaning O(n). It is not faster than hnswlib, faiss, etc. It is faster than other ES plugins that did the same brute-force calculations. The only difference was using the Lucene engine directly. You can see the code :-)
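To make the brute-force idea concrete, here is a minimal Python sketch of exact top-K search by cosine similarity. It is not the plugin's code (the plugin is a Java plugin that reads vectors from the Lucene index); the names `cosine_similarity`, `exact_top_k`, and the in-memory `docs` mapping are assumptions made for illustration only.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def exact_top_k(query, docs, k):
    """Score every document once (O(n) similarity computations), keep the k best.

    `docs` is a hypothetical mapping of document id -> vector.
    Returns (score, doc_id) pairs, best first.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored)


docs = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
top = exact_top_k([1.0, 0.1], docs, k=2)
print([doc_id for _, doc_id in top])  # ['a', 'c']
```

Because every document is scored, the result is exact, and the cost grows linearly with the number of documents.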

BTW - Amazon has an hnswlib implementation on their managed ES offering. It should be much faster than this, but it has limitations.

applecv3 commented 4 years ago

Thank you so much! I really appreciate it. Have a good day!

lior-k commented 4 years ago

BTW, we use k-means with this plug-in in order to traverse only the clusters nearest to the input vector instead of the entire corpus.
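A rough Python sketch of that cluster-pruning idea follows. It assumes the centroids and cluster assignments were already computed by k-means offline; the function name `search_nearest_clusters`, the `n_probe` parameter, and the in-memory dictionaries are hypothetical, and how the clusters are actually stored and filtered in Elasticsearch is not described in this thread.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def search_nearest_clusters(query, centroids, clusters, k, n_probe=1):
    """Brute-force only the documents in the n_probe clusters closest to the query.

    centroids: cluster id -> centroid vector (assumed precomputed by k-means)
    clusters:  cluster id -> {doc id: vector}
    Returns (score, doc_id) pairs, best first.
    """
    # Rank clusters by how similar their centroid is to the query.
    nearest = sorted(
        centroids, key=lambda c: cosine(query, centroids[c]), reverse=True
    )[:n_probe]
    # Score only the documents that belong to the selected clusters.
    candidates = [
        (cosine(query, vec), doc_id)
        for cluster_id in nearest
        for doc_id, vec in clusters[cluster_id].items()
    ]
    candidates.sort(reverse=True)
    return candidates[:k]


centroids = {0: [1.0, 0.0], 1: [0.0, 1.0]}
clusters = {
    0: {"a": [1.0, 0.1], "b": [0.9, -0.1]},
    1: {"c": [0.1, 1.0], "d": [-0.1, 0.9]},
}
hits = search_nearest_clusters([1.0, 0.2], centroids, clusters, k=2)
print([doc_id for _, doc_id in hits])  # ['a', 'b']
```

The trade-off is that the result is no longer guaranteed to be exact: a true nearest neighbor that sits in a cluster that was not probed will be missed. Raising `n_probe` recovers accuracy at the cost of scoring more documents.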

sctrueew commented 3 years ago

@lior-k Hi,

What is the difference between this repo and the native ES vector scoring? Which one is faster?

Thanks

lior-k commented 3 years ago

Never tested. This plugin existed long before the official support. If you do test the performance differences, please let us all know 🙏

Shengwuyou commented 3 years ago

@lior-k Can the plug-in's algorithm be configured? Using brute force to calculate cosine similarity is not suitable for high-performance scenarios.