milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Support HNSW SQ #23232

Open xiaofan-luan opened 1 year ago

xiaofan-luan commented 1 year ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

SQ8 and PQ are widely used in ANN search. If you want to understand more about quantization, Faiss is probably one of the best code bases to explore.

HNSW is the fastest index in the open source world, so why not make it work together with SQ and PQ to accelerate it further?

Let me know if anyone is interested, and we can offer more help on it.
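For readers new to SQ8, here is a minimal sketch of 8-bit scalar quantization in plain Python. It is not the Faiss or Knowhere implementation; the function names (`train_sq8`, `encode_sq8`, `decode_sq8`) are made up for illustration, and the per-dimension min/max range is just one common way to pick the quantization range.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_sq8(vectors: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Learn a per-dimension minimum and step size from training vectors."""
    dim = len(vectors[0])
    mins = [min(v[d] for v in vectors) for d in range(dim)]
    maxs = [max(v[d] for v in vectors) for d in range(dim)]
    # A constant dimension gets step 1.0 to avoid dividing by zero.
    steps = [(hi - lo) / 255.0 if hi > lo else 1.0 for lo, hi in zip(mins, maxs)]
    return mins, steps


def encode_sq8(v: Vector, mins: List[float], steps: List[float]) -> bytes:
    """Map each float to one byte (0..255), clamping out-of-range values."""
    return bytes(
        max(0, min(255, round((x - lo) / step)))
        for x, lo, step in zip(v, mins, steps)
    )


def decode_sq8(code: bytes, mins: List[float], steps: List[float]) -> List[float]:
    """Reconstruct an approximate float vector from its byte code."""
    return [lo + c * step for c, lo, step in zip(code, mins, steps)]


def l2_squared(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two float vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Each dimension is stored in one byte instead of a 4-byte float, so the vectors take roughly a quarter of the memory. A graph index such as HNSW could then compare a float query against `decode_sq8(code, mins, steps)` using `l2_squared`, trading a small distance error for that saving.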

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

noble-8 commented 1 year ago

Hi @xiaofan-luan, I am interested in contributing and would like to know how to help. Let me know how to get started.

xiaofan-luan commented 1 year ago

Cool, man! I thought @liliu-z could offer you some help.

xiaofan-luan commented 1 year ago

Do you have any experience with C++, and any idea about the HNSW algorithm yet?

noble-8 commented 1 year ago

Hi, I took a C++ course in college and have primarily worked as a Java developer for about 3 years, so I feel that I can onboard quickly and am comfortable working in C++. I am really new to this algorithm but am getting up to speed. I am going through this documentation: https://www.pinecone.io/learn/hnsw/ Feel free to point me to other resources. I hope this is not a dealbreaker!

xiaofan-luan commented 1 year ago
  1. You can start by reading the HNSW code in knowhere: https://github.com/milvus-io/knowhere/tree/c05c8767f43eaa855c13654804d0bea9cc42c7de/src/index/hnsw
  2. Once you understand HNSW, the next step is to understand how PQ and SQ work. The Faiss documentation will give you the general idea, and Milvus includes all the Faiss code you can use: https://github.com/facebookresearch/faiss/wiki
  3. Add index parameters for Milvus to support PQ and SQ, which will be a trivial task.
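As a companion to step 2, here is a rough sketch of how PQ encoding and table-based distance lookup work, in plain Python. It assumes the codebooks are already trained (normally by running k-means on each sub-space, which is omitted here), and the names `pq_encode`, `adc_table` and `adc_distance` are made up for illustration; they are not Faiss or Knowhere APIs.

```python
from typing import List, Sequence

Codebook = List[List[float]]  # centroids for one sub-space


def l2_squared(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Sequence[float], m: int) -> List[Sequence[float]]:
    """Cut a vector into m equal-length sub-vectors."""
    if len(v) % m != 0:
        raise ValueError("dimension must be divisible by m")
    sub = len(v) // m
    return [v[i * sub:(i + 1) * sub] for i in range(m)]


def pq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Replace each sub-vector with the index of its nearest centroid."""
    return [
        min(range(len(cb)), key=lambda k: l2_squared(part, cb[k]))
        for part, cb in zip(split(v, len(codebooks)), codebooks)
    ]


def adc_table(query: Sequence[float], codebooks: List[Codebook]) -> List[List[float]]:
    """Precompute query-to-centroid distances, one row per sub-space."""
    return [
        [l2_squared(part, c) for c in cb]
        for part, cb in zip(split(query, len(codebooks)), codebooks)
    ]


def adc_distance(code: List[int], table: List[List[float]]) -> float:
    """Approximate squared L2 distance using one table lookup per sub-space."""
    return sum(row[k] for row, k in zip(table, code))
```

The table is built once per query, so comparing the query against each stored code costs only one lookup and one addition per sub-space. That is the part a graph search would call at every visited node.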
liliu-z commented 1 year ago

Hi @noble-8, this is on our roadmap, so please feel free to make a PR for https://github.com/milvus-io/knowhere . I suggest we start with SQ8, which is easier to implement. You are also more than welcome to open another issue in Knowhere for more detailed discussion.

liliu-z commented 1 year ago

/assign @liliu-z

noble-8 commented 1 year ago

Sounds good. Will do!

xiaofan-luan commented 1 year ago

@noble-8 any progress on it?

noble-8 commented 1 year ago

I could not make any progress. I shall try again; however, feel free to reassign this if I do not make a commit.

xiaofan-luan commented 1 year ago

> I could not make any progress. I shall try again; however, feel free to reassign this if I do not make a commit.

Sure, thanks anyway for the interest! I would also like to help if you are interested.

LaPetiteSouris commented 1 year ago

I just saw that https://github.com/milvus-io/knowhere is now archived. Is this issue still open?

Or should the PR be addressed to https://github.com/zilliztech/Knowhere instead from now on?

To summarize to make sure that I understand this correctly:

Am I correct?

xiaofan-luan commented 1 year ago

https://github.com/milvus-io/knowhere has been archived and moved to https://github.com/zilliztech/knowhere; sorry for the confusion.

You are correct, my man. We want to add quantization support for the HNSW index and integrate it with Milvus.

LaPetiteSouris commented 1 year ago

Thanks. How urgently do you folks need this? My C++ is rusty 😭 so it may take a while (I have Copilot, so that helps 😭).

But I love this challenge.

LaPetiteSouris commented 1 year ago

@xiaofan-luan if you folks have patience to spare, then assign this to me.

Edit: I tried to hack around, and it seems that it's a bit too much for me to take on at this time. I'll pick another good first issue to ramp up.

jiaoew1991 commented 1 year ago

@xiaofan-luan I think the issue is not easy for beginners; it needs lots of knowledge 😅

xiaofan-luan commented 1 year ago

> @xiaofan-luan I think the issue is not easy for beginners; it needs lots of knowledge 😅

Agreed, you might be correct.

SQ might be OK, though?

xiaofan-luan commented 1 year ago

But true, it requires a full understanding of Milvus.

xiaofan-luan commented 1 year ago

remove the good first issue

zaobao commented 1 year ago

I found that HNSW-SQ8 is already available on Zilliz Cloud. Is that true?

xiaofan-luan commented 1 year ago

Zilliz Cloud doesn't use HNSW. We have an internal index named Cardinal.

Monster880 commented 6 months ago

Can Milvus use an HNSW PQ index now? @xiaofan-luan

xiaofan-luan commented 6 months ago

/assign @liliu-z

xiaofan-luan commented 6 months ago

@liliu-z do we have a plan to support HNSW PQ and SQ indexes?

Monster880 commented 6 months ago

@xiaofan-luan Can you provide some guidance on where modifications are needed to support the HNSW PQ index?

xiaofan-luan commented 6 months ago

NP, I thought Li (@liliu-z) could help with that.

Monster880 commented 6 months ago

@liliu-z @xiaofan-luan emmm.... where is liliu-z

liliu-z commented 6 months ago

> @liliu-z @xiaofan-luan emmm.... where is liliu-z

Sure, there are two ways to support HNSW + Quantization:

  1. Make it a new index type for Milvus
  2. Treat it as HNSW with a special config.

We are adopting the first approach, so the work includes:

  1. Add quantization support on the algorithm side and expose it as a new index type. The code should be in Knowhere.
  2. Let Milvus know about this new index type.

Here is an example PR for the first step. It supports SQ8 for HNSW on the Knowhere side.
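To make the two options concrete, here is a purely illustrative Python sketch. It is not Milvus or Knowhere code; the index type name `HNSW_SQ8`, the `quantizer` config key, and the `IndexSpec` class are all hypothetical and exist only to contrast the two designs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IndexSpec:
    graph: str                # graph algorithm, e.g. "hnsw"
    quantizer: Optional[str]  # None means raw float vectors


# Option 1: every graph/quantizer combination gets its own index type name.
INDEX_TYPES: Dict[str, IndexSpec] = {
    "HNSW": IndexSpec("hnsw", None),
    "HNSW_SQ8": IndexSpec("hnsw", "sq8"),  # hypothetical name
}


def resolve_by_type(index_type: str) -> IndexSpec:
    """Look the index up by name; unknown names raise KeyError."""
    return INDEX_TYPES[index_type]


# Option 2: a single "HNSW" type, with the quantizer chosen by a config key.
def resolve_by_config(index_type: str, params: dict) -> IndexSpec:
    """Build the spec from one index type plus an optional config entry."""
    if index_type != "HNSW":
        raise KeyError(index_type)
    return IndexSpec("hnsw", params.get("quantizer"))  # hypothetical key
```

Both functions end up describing the same index. With option 1 the caller only needs to know a new type name, which matches the second work item above: once the algorithm side exposes the type, the serving side mainly has to register it.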

Monster880 commented 6 months ago

@liliu-z it seems HNSW_PQ is not using faiss but using hnswlib after quantization...

xiaofan-luan commented 6 months ago

We now prefer hnswlib over faiss for HNSW, so we need to backport the PQ and SQ features.

Monster880 commented 6 months ago

@liliu-z @xiaofan-luan Backporting the PQ feature to hnswlib is too hard for me, but HNSW PQ is very beneficial and important to me, so I hope Milvus can support the HNSW PQ index as soon as possible.

Monster880 commented 6 months ago

@xiaofan-luan I think maybe Milvus could temporarily support the HNSW PQ index through faiss.

liliu-z commented 5 months ago

> @xiaofan-luan I think maybe Milvus could temporarily support the HNSW PQ index through faiss.

Yes, we are discussing with the faiss team the possibility of switching to faiss's HNSW. There are still some gaps in performance, features, and APIs. We will support PQ in hnswlib if we finally decide not to go with faiss.

Will keep this post updated on any progress.

Monster880 commented 5 months ago

@liliu-z I get it. When will we be able to use the HNSW PQ index in Milvus?

xiaofan-luan commented 5 months ago

@alexanderguzhva please help with this. We want to make sure faiss HNSW has similar performance and exactly the same functionality as the current knowhere implementation.

Monster880 commented 5 months ago

@xiaofan-luan @liliu-z Since knowhere already supports the hnsw_sq index, when will Milvus support it?

alexanderguzhva commented 5 months ago

@xiaofan-luan I'm already in the middle of deprecating hnswlib in favor of faiss; the work has been in progress for some time.

Monster880 commented 5 months ago

@xiaofan-luan Actually, I want to know when Milvus will support the hnsw_sq index, since knowhere already supports it.

liliu-z commented 5 months ago

> @xiaofan-luan Actually, I want to know when Milvus will support the hnsw_sq index, since knowhere already supports it.

It has not been fully tested yet. We can try to ship it as a beta feature in the next release, which is 1-2 weeks from now. What do you think, @xiaofan-luan?

xiaofan-luan commented 5 months ago

@liliu-z please make sure the knowhere side is ready.

@tedxu could you assign someone to support HNSW PQ/SQ and do some testing?

Monster880 commented 5 months ago

Actually, an offline data pipeline can load the index faster and avoid the index training process. Is there a way for me to train the index offline and load it when the container (standalone Milvus) starts up? @xiaofan-luan

xiaofan-luan commented 5 months ago

Doing this might be our goal.

Using Milvus with more offline index nodes should help.

If you already build the index yourself, why not simply serve it with faiss or hnsw?