zhengbuqian opened this issue 7 months ago
This comment is to track the implementation progress of sparse vector support in Milvus.
pd.SparseType
Pending:
internal/core/src/storage/Util.cpp and parquet_c.h
How is this used? How do we handle sparse vectors?
Knowhere tracking issue: https://github.com/zilliztech/knowhere/issues/193
Excited about the new feature!
Basic sparse support has been added to the master branch with the merge of #30357, #30629, #30630 and pymilvus #1920.
For SDK owners:
We also need to support sparse float vectors in the C#/Node.js/Java/Go SDKs.
The accepted sparse input formats:
- `scipy.sparse` representations
- dict representations, e.g. `{30: 0.34, 78: 0.11, 22: 0.66}`
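To illustrate the two input formats, here is a minimal sketch that normalizes either one into `(index, value)` pairs sorted by index. `to_sorted_pairs` is a hypothetical helper, not a pymilvus API. It assumes a `scipy.sparse` input is passed as a single-row CSR matrix, whose `indices` and `data` attributes hold the dimension indices and the matching non-zero values; scipy itself is not imported, so the sketch runs without it.

```python
from typing import List, Tuple


def to_sorted_pairs(row) -> List[Tuple[int, float]]:
    """Convert one sparse row into (index, value) pairs sorted by index.

    Hypothetical helper. Accepts a dict {index: value} or any object that
    exposes parallel `indices` and `data` sequences (assumed here to be a
    single-row scipy.sparse.csr_matrix).
    """
    if isinstance(row, dict):
        pairs = [(int(i), float(v)) for i, v in row.items()]
    elif hasattr(row, "indices") and hasattr(row, "data"):
        # Assumption: for a 1 x N CSR row, `indices` are the column
        # (dimension) indices and `data` the stored values, in the same order.
        pairs = [(int(i), float(v)) for i, v in zip(row.indices, row.data)]
    else:
        raise TypeError(f"unsupported sparse row type: {type(row).__name__}")
    # User input may be unordered; sort by dimension index only.
    return sorted(pairs, key=lambda p: p[0])
```

For example, `to_sorted_pairs({30: 0.34, 78: 0.11, 22: 0.66})` yields `[(22, 0.66), (30, 0.34), (78, 0.11)]`, and a row taken from a `csr_matrix` as `mat[i]` would go through the second branch.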
When sending requests to Milvus (both insert and search), use one proto `bytes` value to represent a single sparse vector, and encode it as densely packed bytes: `idx, val, idx, val, ...`. Indices in the packed bytes must be in the `uint32` range and sorted in ascending order (the user input may be unordered, though). Duplicate indices are not allowed.
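A minimal sketch of that packed encoding, for SDK authors who want a reference point. It assumes each index is a little-endian `uint32` and each value a little-endian `float32`; the byte order and value width are assumptions not stated in the comment above, and `encode_sparse`/`decode_sparse` are hypothetical names, not pymilvus functions.

```python
import struct
from typing import Iterable, List, Tuple

UINT32_MAX = 2**32 - 1
# Assumed layout per non-zero element: little-endian uint32 index, then float32 value.
_PAIR = struct.Struct("<If")


def encode_sparse(pairs: Iterable[Tuple[int, float]]) -> bytes:
    """Pack one sparse vector as idx, val, idx, val, ... in ascending index order."""
    ordered = sorted(pairs, key=lambda p: p[0])  # user input may be unordered
    out = bytearray()
    prev = None
    for idx, val in ordered:
        if not 0 <= idx <= UINT32_MAX:
            raise ValueError(f"index {idx} is outside the uint32 range")
        if idx == prev:
            raise ValueError(f"duplicate index {idx}")
        out += _PAIR.pack(idx, val)
        prev = idx
    return bytes(out)


def decode_sparse(buf: bytes) -> List[Tuple[int, float]]:
    """Inverse of encode_sparse: split the buffer into (index, value) pairs."""
    if len(buf) % _PAIR.size:
        raise ValueError(f"buffer length must be a multiple of {_PAIR.size} bytes")
    return list(_PAIR.iter_unpack(buf))
```

With these assumptions each non-zero element takes 8 bytes, so `encode_sparse({30: 0.5, 78: 0.25, 22: 0.75}.items())` produces 24 bytes starting with index 22. Because values are narrowed to `float32`, a value such as `0.34` does not round-trip to exactly the same Python float.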
Note that support in those SDKs is not a must-have for the official Milvus 2.4 release. We'll be adding more sparse features and announcing GA in the next major release (2.5 or 2.6). I'll keep updating the issues as necessary.
Thanks a lot for the efforts!
Is your feature request related to a problem? Please describe.
Currently Milvus supports only dense vectors and lacks the ability to store, index, and search sparse vectors (vectors with up to millions of dimensions, of which only a handful are non-zero). We want to add sparse float vector support to Milvus so users can insert, index, and search them with ease.