milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.76k stars 2.85k forks

[Feature]: support Data-in Data-out #35856

Open zhengbuqian opened 1 month ago

zhengbuqian commented 1 month ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Data-in Data-out is a code name; the official feature name has not been finalized.

To improve the usability of Milvus and make vector-based similarity search over unstructured data (text, images, audio, etc.) more approachable to non-technical users, we want to let users specify the embedding model they want to use and insert raw data directly, without worrying about how to embed it. At search time, users can likewise provide raw data as the query instead of a query embedding.
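To make the intended flow concrete, here is a minimal, self-contained sketch in plain Python. It is not the Milvus API (the API specs have not been shared yet); `ToyCollection`, `toy_embed`, and the method names are hypothetical and exist only for illustration. The collection is configured with an embedding function once, and both `insert` and `search` then accept raw text.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

SparseVector = Dict[str, float]


def toy_embed(text: str) -> SparseVector:
    """Hypothetical stand-in for an embedding model: L2-normalized term counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product of two sparse vectors stored as dicts."""
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


class ToyCollection:
    """Illustrative only: embeds raw data on insert and on search."""

    def __init__(self, embed: Callable[[str], SparseVector]) -> None:
        self.embed = embed
        self.rows: List[Tuple[str, SparseVector]] = []

    def insert(self, docs: List[str]) -> None:
        # The caller passes raw text; the embedding happens inside.
        for doc in docs:
            self.rows.append((doc, self.embed(doc)))

    def search(self, query: str, limit: int = 3) -> List[str]:
        # The query is raw text too; it is embedded with the same function.
        q = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: dot(q, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:limit]]


collection = ToyCollection(embed=toy_embed)
collection.insert([
    "milvus is a vector database",
    "bananas are yellow fruit",
    "vector search finds similar items",
])
top_hit = collection.search("vector database", limit=1)
```

The point of the sketch is only that the caller never handles a vector: the same embedding function is applied on both the write path and the query path.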

In the first phase, only BM25-based Doc-in Doc-out (or Text-in Text-out) will be supported; this is tracked in https://github.com/milvus-io/milvus/issues/35853.
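For readers unfamiliar with BM25, the following is a rough sketch of the textbook Okapi BM25 scoring function in plain Python. It is not how Milvus implements it. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["the quick brown fox", "the lazy dog", "quick quick fox"]
scores = bm25_scores("quick fox", docs)
```

In a text-in text-out setup, scoring of this kind would run on the server side, so the user supplies only the raw query string.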

In a later phase, users may be able to bring any embedding model or service of their choice and search over any type of data, not just text.

More details and API specs will be shared shortly.

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

junjiejiangjjj commented 3 weeks ago

Data-in Data-out

Architecture

Milvus converts raw data into vectors in the proxy service.

Usage

Supported models