milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [null & default] Search with expression "nullableFid == 0" on nullable field should not filter "None" #36124

Closed binbinlv closed 1 week ago

binbinlv commented 1 month ago

Environment

- Milvus version: master-20240908-208c8a23
- Deployment mode (standalone or cluster): both
- MQ type (rocksmq, pulsar or kafka): all
- SDK version (e.g. pymilvus v2.0.0rc2): 2.5.0rc78
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

A search with the expression "nullableFid == 0" on a nullable field does not filter out entities whose value is None; they are returned as hits:

data: ['["id: 1, distance: 0.0, entity: {\'nullableFid\': None, \'int32\': 10}", "id: 2, distance: 18.35403823852539, entity: {\'nullableFid\': None, \'int32\': 10}"]']
search successfully in 0.036342 s

Expected Behavior

A search with the expression "nullableFid == 0" should not return any entity whose nullableFid is None.
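
To make the expected semantics concrete, here is a minimal pure-Python sketch of a null-aware equality filter. It is not Milvus code; `matches_eq`, `filter_ids`, and the sample rows are hypothetical names and data introduced only for illustration. The assumption is simply that a missing value never satisfies `field == target`.

```python
from typing import List, Optional, Sequence


def matches_eq(value: Optional[float], target: float) -> bool:
    """Null-aware equality: a None value never satisfies `field == target`."""
    if value is None:
        return False
    return value == target


def filter_ids(rows: Sequence[dict], field: str, target: float) -> List[int]:
    """Return the ids of rows whose `field` equals `target`, skipping None values."""
    return [row["id"] for row in rows if matches_eq(row.get(field), target)]


# Hypothetical sample rows: two null values and one real zero.
rows = [
    {"id": 1, "nullableFid": None},
    {"id": 2, "nullableFid": None},
    {"id": 3, "nullableFid": 0.0},
]

print(filter_ids(rows, "nullableFid", 0))  # [3]
```

Under this reading, only a row that actually stores 0 matches the expression, and a collection containing nothing but None values yields an empty result.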

Steps To Reproduce

import random
import time

from pymilvus import (
    Collection,
    CollectionSchema,
    DataType,
    FieldSchema,
    connections,
    utility,
)

# Connect to Milvus with the default connection settings.
connections.connect()

dim = 128
# Schema: INT64 primary key, nullable DOUBLE field, a field with default value 10, float vector.
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
double_field = FieldSchema(name="nullableFid", dtype=DataType.DOUBLE, nullable=True)
# Note: this field is named "int32" but is declared as INT64.
int32_field = FieldSchema(name="int32", dtype=DataType.INT64, default_value=10)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
schema = CollectionSchema(fields=[int64_field, double_field, int32_field, float_vector])

# Start from a clean collection.
utility.drop_collection("test")
collection = Collection("test", schema=schema)
res = collection.schema
print(res)

index = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}

nb = 2
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
# Column-based data, one list per field in schema order.
data = [[1, 2], [3, None], [4, None], vectors]
# Equivalent to data1 = [[1, 2], [None, None], [None, None], vectors]
data1 = [[1, 2], [], [], vectors]

collection.insert(data=data)
# collection.upsert(data=data1)
collection.create_index("float_vector", index, index_name="index_name_1")
collection.load()
collection.flush()
res = collection.num_entities
print(res)

default_search_params = {"metric_type": "L2", "params": {}}
limit = 10
nq = 1

# Search with a filter expression on the nullable field and time the call.
start = time.time()
res1 = collection.search(
    vectors[:nq],
    "float_vector",
    default_search_params,
    limit,
    "nullableFid == 0",
    output_fields=["nullableFid", "int32"],
)
end = time.time() - start
print(res1)
print("search successfully in %f s" % end)

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&left=%7B%22datasource%22:%22Loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22default-null-test-skmkb.*%5C%22%7D%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D

Anything else?

No response

binbinlv commented 1 month ago

The result is correct when the expression is "nullableFid == 1": entities with None are not returned.

data: ['[]']
search successfully in 0.035838 s

smellthemoon commented 1 month ago

This concerns expression evaluation on null values; waiting on #35527.

binbinlv commented 1 week ago

Verified; the issue is fixed with:

milvus: master-20241021-70339820-amd64
pymilvus: 2.5.0rc99

results:

>>> res1 = collection.search(vectors[:nq], "float_vector", default_search_params, limit, "nullableFid == 0", output_fields=["nullableFid", "int32"])
>>> print(res1)
data: ['[]']
>>>
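
A check like this could be kept as a small regression helper. The sketch below works on plain dictionaries rather than pymilvus result objects, so `null_hits` and the two sample lists are hypothetical; converting real search hits into dictionaries is left out. The `before_fix` entries mirror the entities printed in the original report, and `after_fix` mirrors the empty result above.

```python
from typing import List, Sequence


def null_hits(entities: Sequence[dict], field: str) -> List[dict]:
    """Return the entities whose `field` is None.

    For a filter such as `field == 0`, this list should always be empty.
    """
    return [entity for entity in entities if entity.get(field) is None]


# Entities as printed in the original report (hits that should not have matched).
before_fix = [
    {"id": 1, "nullableFid": None, "int32": 10},
    {"id": 2, "nullableFid": None, "int32": 10},
]
# Entities returned after the fix: none.
after_fix = []

print(len(null_hits(before_fix, "nullableFid")))  # 2
print(len(null_hits(after_fix, "nullableFid")))   # 0
```

A test would then assert that `null_hits(...)` is empty for any search filtered by an equality expression on the nullable field.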