milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [null & default] Search/query returns None instead of 10.0 when None is inserted into a field with both "nullable=True" and "default_value=10.0" set #36003

Open binbinlv opened 1 week ago

binbinlv commented 1 week ago

Is there an existing issue for this?

Environment

- Milvus version: master-20240904-a32f337e
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc74
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Search/query returns None instead of 10.0 when None is inserted into a field declared with "nullable=True" and "default_value=10.0".

search results:

data: [
  id: 0, distance: 0.0, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 484, distance: 14.754112243652344, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 461, distance: 14.80323600769043, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 491, distance: 15.442129135131836, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 944, distance: 15.526017189025879, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 611, distance: 15.62874984741211, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 610, distance: 15.764344215393066, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 780, distance: 16.080350875854492, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 182, distance: 16.292282104492188, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
  id: 602, distance: 16.308732986450195, entity: {'nullableFid': None, 'int32': 10, 'string': '10'}
]

query results:

data: [
  {'int64': 0, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 1, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 2, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 3, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 4, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 5, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 6, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 7, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 8, 'nullableFid': None, 'int32': 10, 'string': '10'},
  {'int64': 9, 'nullableFid': None, 'int32': 10, 'string': '10'}
]

Expected Behavior

Search/query should return the default value (10.0 here) when the field has a default value set and None is inserted.
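The expected fill rule can be written as a small pure-Python model. This is a sketch of the behavior the report expects, not Milvus's actual implementation; `resolve_value` and its parameters are hypothetical names. The point it captures is precedence: when a field has both `nullable=True` and a `default_value`, an inserted None should take the default instead of being stored as null.

```python
from typing import Any, Optional

# Sentinel so that "no default configured" is distinguishable from default_value=None.
_MISSING = object()


def resolve_value(value: Any, nullable: bool, default_value: Any = _MISSING) -> Optional[Any]:
    """Return the value that should be stored for one inserted cell (hypothetical model)."""
    if value is not None:
        return value  # explicit values are stored unchanged
    if default_value is not _MISSING:
        return default_value  # None + default_value: the default wins, even if nullable=True
    if nullable:
        return None  # None + nullable with no default: stored as null
    raise ValueError("None inserted into a field that is neither nullable nor has a default")


# Field from this issue: nullable=True, default_value=10.0
print(resolve_value(None, nullable=True, default_value=10.0))  # 10.0
print(resolve_value(3.5, nullable=True, default_value=10.0))   # 3.5
print(resolve_value(None, nullable=True))                      # None
```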

Steps To Reproduce

from pymilvus import CollectionSchema, FieldSchema
from pymilvus import Collection
from pymilvus import connections
from pymilvus import DataType
from pymilvus import utility
import random

connections.connect()

dim = 128
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
double_field = FieldSchema(name="nullableFid", dtype=DataType.DOUBLE, nullable=True, default_value=10.0, is_primary=False)
int32_field = FieldSchema(name="int32", dtype=DataType.INT64, default_value=10)
string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=1000, default_value="10")
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim, mmap_enabled=False)
schema = CollectionSchema(fields=[int64_field, double_field, int32_field, string_field, float_vector], enable_dynamic_field=False)
utility.drop_collection("test")
collection = Collection("test", schema=schema)
res = collection.schema
print(res)

index = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}

nb = 1000
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]

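# Column order: int64, nullableFid, int32, string, float_vector.
# nullableFid and int32 get None for every row; string gets an empty list. All three have default values.
data = [[i for i in range(nb)], [None for _ in range(nb)], [None for _ in range(nb)], [], vectors]
# Alternative input (not used below): empty lists for all three default-valued fields,
# intended to be equivalent to passing None for every row of those fields.
data1 = [[i for i in range(nb)], [], [], [], vectors]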

collection.insert(data=data)
collection.create_index("float_vector", index, index_name="index_name_1")
collection.load()
collection.flush()
res = collection.num_entities
print(res)
default_search_params = {"metric_type": "L2", "params": {}}
limit = 10
nq = 1
import time
start = time.time()
res1 = collection.search(vectors[:nq], "float_vector", default_search_params, limit, "int64 >= 0", output_fields=["nullableFid", "int32", "string"])
end = time.time() - start
print(res1)
print("search succeeded in %f s" % end)
start = time.time()
res = collection.query("int64>=0", output_fields=["nullableFid","int32", "string"])
end = time.time() - start
print(res)
print("query succeeded in %f s" % end)
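To turn the repro into a pass/fail check, the query rows can be compared against the expected default. The helper below is a hypothetical sketch that works on any iterable of dicts, such as the list of row dicts that `collection.query` returns; the primary-key field name `"int64"` is taken from the repro schema.

```python
from typing import Any, Iterable, List, Mapping


def rows_missing_default(rows: Iterable[Mapping[str, Any]], field: str, expected: Any,
                         pk: str = "int64") -> List[Any]:
    """Return the primary keys of rows whose `field` does not equal `expected`."""
    return [row[pk] for row in rows if row.get(field) != expected]


# Two rows copied from the query output in this issue
sample = [
    {'int64': 0, 'nullableFid': None, 'int32': 10, 'string': '10'},
    {'int64': 1, 'nullableFid': None, 'int32': 10, 'string': '10'},
]
print(rows_missing_default(sample, "nullableFid", 10.0))  # [0, 1]
print(rows_missing_default(sample, "int32", 10))          # []
```

Called as `rows_missing_default(res, "nullableFid", 10.0)` right after the query in the repro, it would flag every row shown in the query output above, while the `int32` and `string` fields (default values, not nullable) pass.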

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22nkB%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22default-null-test-edxnm.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

Anything else?

No response

binbinlv commented 1 week ago

Search after upsert has the same issue.
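Assuming upsert replaces the whole entity for a primary key, it should go through the same default-filling path as insert. The in-memory model below illustrates that expectation. `upsert`, `store`, and `DEFAULTS` are hypothetical names, not pymilvus APIs, and the model treats a missing field the same as None.

```python
from typing import Any, Dict, Optional

# Default values taken from the repro schema (hypothetical stand-in for server-side schema metadata).
DEFAULTS: Dict[str, Any] = {"nullableFid": 10.0, "int32": 10, "string": "10"}


def upsert(store: Dict[int, Dict[str, Any]], pk: int, row: Dict[str, Optional[Any]]) -> None:
    """Replace the whole entity for `pk`, filling None or missing fields with their defaults."""
    filled: Dict[str, Any] = {}
    for field, default in DEFAULTS.items():
        value = row.get(field)
        filled[field] = default if value is None else value
    store[pk] = filled


store: Dict[int, Dict[str, Any]] = {}
upsert(store, 0, {"nullableFid": 1.5, "int32": 1, "string": "a"})
upsert(store, 0, {"nullableFid": None})  # same pk, None for nullableFid
print(store[0])  # {'nullableFid': 10.0, 'int32': 10, 'string': '10'}
```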