milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [null & default] Group by field does not support null #36264

Open binbinlv opened 1 month ago

binbinlv commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: master-20240913-375cb44b
- Deployment mode (standalone or cluster): both
- MQ type (rocksmq, pulsar or kafka): all
- SDK version (e.g. pymilvus v2.0.0rc2): 2.5.0rc78
- OS (Ubuntu or CentOS):
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Searching with group_by_field set to a nullable field fails with an invalid-parameter error:

RPC error: [search], <MilvusException: (code=1100, message=groupBy field(nullableFid) not support nullable == true: invalid parameter)>, <Time:{'RPC start': '2024-09-13 19:23:44.507598', 'RPC error': '2024-09-13 19:23:44.557512'}>
Traceback (most recent call last):
  File "./default.py", line 50, in <module>
    res1 = collection.search(vectors[:nq], "float_vector", default_search_params, limit, group_by_field="nullableFid", output_fields=["nullableFid", "int32"])
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 801, in search
    resp = conn.search(
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 800, in search
    return self._execute_search(request, timeout, round_decimal=round_decimal, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 741, in _execute_search
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 734, in _execute_search
    check_status(response.status)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/utils.py", line 63, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1100, message=groupBy field(nullableFid) not support nullable == true: invalid parameter)>
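The error message suggests the server rejects the request from the schema alone: the check fires because the field is declared with nullable == true, regardless of what values are actually stored. The following is a minimal sketch of an equivalent client-side pre-check, assuming the schema is available as a plain dict of field properties. `check_group_by_field` and `InvalidParameter` are hypothetical names, not pymilvus APIs.

```python
from typing import Any, Dict


class InvalidParameter(ValueError):
    """Hypothetical stand-in for the server's invalid-parameter error (code 1100)."""


def check_group_by_field(schema: Dict[str, Dict[str, Any]], field_name: str) -> None:
    """Reject nullable group-by fields, mirroring the reported error (hypothetical helper).

    `schema` maps field names to their properties, e.g. {"nullable": True}.
    A missing field raises KeyError.
    """
    if schema[field_name].get("nullable", False):
        # Same condition the server reports: only the schema flag is inspected.
        raise InvalidParameter(
            f"groupBy field({field_name}) not support nullable == true"
        )


# Field properties taken from the reproduction script below.
schema = {
    "int64": {"nullable": False},
    "nullableFid": {"nullable": True},
    "int32": {"nullable": False},
}

check_group_by_field(schema, "int32")  # passes: the field is not nullable
try:
    check_group_by_field(schema, "nullableFid")
except InvalidParameter as err:
    print(err)
```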

Expected Behavior

Searching with group_by_field set to a nullable field succeeds instead of being rejected.
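The report does not say how null values should be grouped. A minimal sketch of two plausible semantics, assuming one hit is kept per group (a group size of 1), at most `limit` groups are returned, and a higher score is better (as with COSINE): "own_group" treats null as one additional group, while "skip" drops null-valued hits so that null belongs to no group. `group_hits` and `null_policy` are hypothetical names, not Milvus parameters.

```python
from typing import Any, Dict, List, Optional, Tuple

# (primary key, score, group value); a group value of None represents null
Hit = Tuple[int, float, Optional[str]]


def group_hits(hits: List[Hit], limit: int, null_policy: str = "own_group") -> List[Hit]:
    """Keep the best-scoring hit per distinct group value, up to `limit` groups."""
    if null_policy not in ("own_group", "skip"):
        raise ValueError(f"unknown null_policy: {null_policy}")
    best: Dict[Any, Hit] = {}
    for hit in sorted(hits, key=lambda h: h[1], reverse=True):
        value = hit[2]
        if value is None and null_policy == "skip":
            continue  # null does not belong to any group
        if value not in best:  # None is a valid dict key, so all nulls share one group
            best[value] = hit
            if len(best) == limit:
                break
    return list(best.values())


hits = [(1, 0.91, "a"), (2, 0.88, None), (3, 0.85, "a"), (4, 0.80, "b"), (5, 0.75, None)]
print(group_hits(hits, limit=3))                      # [(1, 0.91, 'a'), (2, 0.88, None), (4, 0.8, 'b')]
print(group_hits(hits, limit=3, null_policy="skip"))  # [(1, 0.91, 'a'), (4, 0.8, 'b')]
```

Treating null as its own group keeps every hit eligible; skipping nulls means rows with a null group value never appear in grouped results.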

Steps To Reproduce

import random

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections, utility

connections.connect()

dim = 128
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
# Nullable VARCHAR field used as the group-by field (also the partition key).
nullable_field = FieldSchema(name="nullableFid", dtype=DataType.VARCHAR, nullable=True, max_length=100, is_partition_key=True)
# Note: the field is named "int32" but is declared as INT64, with default_value=3.
int32_field = FieldSchema(name="int32", dtype=DataType.INT64, default_value=3)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim, nullable=False)
schema = CollectionSchema(fields=[int64_field, nullable_field, int32_field, float_vector])
utility.drop_collection("test")
collection = Collection("test", schema=schema)
res = collection.schema
print(res)
varchar_scalar_index = "TRIE"
scalar_index_params = {"index_type": varchar_scalar_index, "params": {}}

collection.create_index("nullableFid", scalar_index_params, index_name="index_name_0")
#index = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}

index = "HNSW"
params = {'ef': 64}
default_index = {"index_type": index, "params": params, "metric_type": "COSINE"}

nb = 5000
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
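# Columns: int64 (primary key), nullableFid (all None), int32 (empty list), float_vector
data = [[i for i in range(nb)], [None for _ in range(nb)], [], vectors]
# An empty list is equivalent to all None: data1 equals [[1, 2], [None, None], [None, None], vectors[:2]]
data1 = [[1, 2], [], [], vectors[:2]]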

collection.insert(data=data)
#collection.upsert(data=data1)
# Pass the full index definition (type, params, metric), not just the index type string.
collection.create_index("float_vector", default_index, index_name="index_name_1")
collection.flush()
collection.load()
res = collection.num_entities
print(res)
# Use the same metric type as the HNSW index.
default_search_params = {"metric_type": "COSINE", "params": {}}
limit = 1000
nq = 2
import time
start = time.time()
res1 = collection.search(vectors[:nq], "float_vector", default_search_params, limit, group_by_field="nullableFid", output_fields=["nullableFid", "int32"])
elapsed = time.time() - start
print(res1)
print("search succeeded in %f s" % elapsed)
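
The script's comment states that an empty column list is equivalent to a column of None values. A minimal sketch of that normalization, assuming (not confirmed in this issue) that a None in a field with default_value is then stored as the field's default. Both helpers are hypothetical and are not part of pymilvus.

```python
from typing import Any, List


def expand_empty_column(values: List[Any], num_rows: int) -> List[Any]:
    """Treat an empty column as `num_rows` nulls, as the script's comment describes."""
    return list(values) if values else [None] * num_rows


def apply_default(values: List[Any], default: Any) -> List[Any]:
    """Assumption: a None in a field with default_value is stored as the default."""
    return [default if v is None else v for v in values]


nullable_fid = expand_empty_column([], 2)            # [None, None]
int32 = apply_default(expand_empty_column([], 2), 3)  # [3, 3]
print(nullable_fid, int32)
```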

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22Woc%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22default-null-test-bgkie.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

Anything else?

No response

smellthemoon commented 1 month ago

This is by design.

xiaofan-luan commented 1 month ago

Maybe we can improve this? Null doesn't belong to any group.

smellthemoon commented 1 month ago

Yes, group by on nullable fields will be supported later, maybe in 2.5.x.

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.