milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Multivector hybridsearch function is not working #34017

Open yuki-2025 opened 2 weeks ago

yuki-2025 commented 2 weeks ago

Environment

- Milvus version: 2.4.3
- Deployment mode (standalone or cluster): cluster (Zilliz)
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS): GCP Vertex AI, Ubuntu (Python 3.10)
- CPU/Memory: 8 vCPUs, 32 GB RAM
- GPU: NVIDIA L4
- Others:

Current Behavior

I followed the documentation at https://milvus.io/docs/multi-vector-search.md to perform a multi-vector hybrid search, but I hit an error at Step 3: Perform a Hybrid Search.

res = collection.hybrid_search(
    reqs, # List of AnnSearchRequests created in step 1
    rerank, # Reranking strategy specified in step 2
    limit=2 # Number of final search results to return
)

print(res)

Error

grpc RpcError: [hybrid_search], <_InactiveRpcError: StatusCode.UNIMPLEMENTED, unknown method HybridSearch for service milvus.proto.milvus.MilvusService>, <Time:{'RPC start': '2024-06-20 04:18:00.304044', 'gRPC error': '2024-06-20 04:18:00.352937'}>

It suggests that the HybridSearch method is not implemented or recognized by the Milvus service.

I checked my version of both Milvus and the Python client, and they are both 2.4.x, which should include this hybrid_search method.
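
For anyone comparing versions the same way, here is a minimal sketch of a check that asks the server what it reports instead of relying on the deployment label. `parse_version`, `supports_hybrid_search`, `report_versions` and the `(2, 4, 0)` threshold are illustrative names and assumptions, not part of pymilvus; only `utility.get_server_version` and `pymilvus.__version__` come from the SDK. The version string is only a hint, since a managed service may gate features independently of it.

```python
import re

# Assumption: hybrid search is available from the 2.4 release line onward.
MIN_HYBRID_SEARCH_VERSION = (2, 4, 0)


def parse_version(text):
    """Extract (major, minor, patch) from strings such as 'v2.4.3' or '2.3.7-dev'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_hybrid_search(server_version):
    """Return True if the reported server version is at least the assumed minimum."""
    return parse_version(server_version) >= MIN_HYBRID_SEARCH_VERSION


def report_versions(alias="default"):
    """Print client and server versions; requires an open pymilvus connection."""
    import pymilvus
    from pymilvus import utility

    server = utility.get_server_version(using=alias)
    print("pymilvus client:", pymilvus.__version__)
    print("server:", server)
    print("hybrid search expected:", supports_hybrid_search(server))
```

Calling `report_versions()` after `connections.connect(...)` prints both versions, which makes a client/server mismatch visible before `hybrid_search` is attempted.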

Is there any workaround for this issue?

Expected Behavior

Step 3 of the documentation (https://milvus.io/docs/multi-vector-search.md) should run without error and return the hybrid search results.

Steps To Reproduce

1. run the code:
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
import random

# Connect to the Zilliz Cloud endpoint
connections.connect(
   uri="https://xxxx.zillizcloud.com:443",
   token="66a")

# Create schema
fields = [
    FieldSchema(name="film_id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="filmVector", dtype=DataType.FLOAT_VECTOR, dim=5), # Vector field for film vectors
    FieldSchema(name="posterVector", dtype=DataType.FLOAT_VECTOR, dim=5)] # Vector field for poster vectors

schema = CollectionSchema(fields=fields,enable_dynamic_field=False)

# Create collection
collection = Collection(name="test_collection", schema=schema)

# Create index for each vector field
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 128},
}

collection.create_index("filmVector", index_params)
collection.create_index("posterVector", index_params)

# Generate random entities to insert
entities = []

for _ in range(1000):
    # generate random values for each field in the schema
    film_id = random.randint(1, 1000)
    film_vector = [ random.random() for _ in range(5) ]
    poster_vector = [ random.random() for _ in range(5) ]

    # create a dictionary for each entity
    entity = {
        "film_id": film_id,
        "filmVector": film_vector,
        "posterVector": poster_vector
    }

    # add the entity to the list
    entities.append(entity)

collection.insert(entities)

from pymilvus import AnnSearchRequest

# Create ANN search request 1 for filmVector
query_filmVector = [[0.8896863042430693, 0.370613100114602, 0.23779315077113428, 0.38227915951132996, 0.5997064603128835]]

search_param_1 = {
    "data": query_filmVector, # Query vector for the field being searched
    "anns_field": "filmVector", # Vector field name
    "param": {
        "metric_type": "L2", # This parameter value must be identical to the one used in the collection schema
        "params": {"nprobe": 10}
    },
    "limit": 2 # Number of search results to return in this AnnSearchRequest
}
request_1 = AnnSearchRequest(**search_param_1)

# Create ANN search request 2 for posterVector
query_posterVector = [[0.02550758562349764, 0.006085637357292062, 0.5325251250159071, 0.7676432650114147, 0.5521074424751443]]
search_param_2 = {
    "data": query_posterVector, # Query vector
    "anns_field": "posterVector", # Vector field name
    "param": {
        "metric_type": "L2", # This parameter value must be identical to the one used in the collection schema
        "params": {"nprobe": 10}
    },
    "limit": 2 # Number of search results to return in this AnnSearchRequest
}
request_2 = AnnSearchRequest(**search_param_2)

# Store these two requests as a list in `reqs`
reqs = [request_1, request_2]

from pymilvus import WeightedRanker
# Use WeightedRanker to combine results with specified weights
# Assign weights of 0.8 to text search and 0.2 to image search
rerank = WeightedRanker(0.8, 0.2)

# Before conducting hybrid search, load the collection into memory.
collection.load()

res = collection.hybrid_search(
    reqs, # List of AnnSearchRequests created in step 1
    rerank, # Reranking strategy specified in step 2
    limit=2 # Number of final search results to return
)

print(res)

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 2 weeks ago

@yuki-2025 Hi yuki, from your endpoint it seems you are using Zilliz Cloud. Zilliz Cloud will support 2.4 features next week. You will need to beta-upgrade your cluster before you can use all 2.4 features.

yuki-2025 commented 2 weeks ago

Hi Xiaofan,

Does this mean I can only use the hybrid search function in standalone Milvus (Milvus installed in Docker)? How about Milvus Lite?

Thanks!

yanliang567 commented 2 weeks ago

Milvus Lite does not support the hybrid search function yet; it is still in development. Please try the latest community release, Milvus v2.4.5 (either standalone or cluster deployment). @yuki-2025
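
Until an upgrade is possible, one stopgap is to run an ordinary `collection.search` per vector field and combine the hits on the client. The sketch below is an assumption-laden illustration, not Milvus's `WeightedRanker` algorithm: `weighted_merge` is a hypothetical helper, and the `1 / (1 + distance)` mapping is just one simple way to turn L2 distances into scores where larger is better.

```python
def weighted_merge(result_lists, weights, limit):
    """Fuse several hit lists into one ranking.

    result_lists: one list per vector field, each holding (entity_id, l2_distance) pairs.
    weights: one weight per list, in the same order.
    Returns up to `limit` (entity_id, combined_score) pairs, best first.
    """
    combined = {}
    for hits, weight in zip(result_lists, weights):
        for entity_id, distance in hits:
            # Illustrative mapping only: smaller L2 distance gives a larger score in (0, 1].
            score = 1.0 / (1.0 + distance)
            combined[entity_id] = combined.get(entity_id, 0.0) + weight * score
    ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical usage with the collection from the report (needs a live connection):
#   param = {"metric_type": "L2", "params": {"nprobe": 10}}
#   film = collection.search(query_filmVector, "filmVector", param, limit=2)[0]
#   poster = collection.search(query_posterVector, "posterVector", param, limit=2)[0]
#   film_hits = [(hit.id, hit.distance) for hit in film]
#   poster_hits = [(hit.id, hit.distance) for hit in poster]
#   print(weighted_merge([film_hits, poster_hits], [0.8, 0.2], limit=2))
```

The resulting ranking will generally differ from what a server-side `hybrid_search` returns, so it is best treated as a temporary approximation.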