milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: [null & default] Insert do not report an error when json field nullable is false and json is None #36354

Status: Closed (closed by qixuan0212 1 month ago)

qixuan0212 commented 1 month ago

Environment

- Milvus version: master-20240916-dcd904d2-amd64
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc78
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

When a JSON field is declared with nullable=False and the inserted JSON value is None, the insert succeeds without an error.

Expected Behavior

When a JSON field is declared with nullable=False and the inserted JSON value is None, the insert should fail and report an error.
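
The expected check can be illustrated with a minimal, client-side sketch in plain Python. This is not the pymilvus or Milvus implementation: `FieldRule` and `check_column` are hypothetical names introduced only for illustration, and the rule that a field with a default value may receive None (so the default is used instead) is an assumption based on how the repro script below passes None to the "int32" field.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class FieldRule:
    # Hypothetical stand-in for the parts of a field schema that matter here.
    name: str
    nullable: bool = False
    has_default: bool = False


def check_column(rule: FieldRule, values: Sequence[Any]) -> None:
    """Raise ValueError if the column contains a None the field cannot accept."""
    if rule.nullable or rule.has_default:
        # Assumption: None is stored as null or replaced by the default value.
        return
    for row, value in enumerate(values):
        if value is None:
            raise ValueError(
                f"field ({rule.name}) expect not None input (row {row})"
            )
```

Under these assumptions, `check_column(FieldRule("json"), [None])` raises, while the same column passes for a field created with `nullable=True`.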

Steps To Reproduce

import random

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema
from pymilvus import connections, utility

connections.connect()

dim, nb = 128, 100
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
double_field = FieldSchema(name="nullableFid", dtype=DataType.DOUBLE, nullable=True, default_value=10.0, is_primary=False)
int32_field = FieldSchema(name="int32", dtype=DataType.INT64, default_value=10)
string_field = FieldSchema(name="string", dtype=DataType.VARCHAR, max_length=1000, nullable=True)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim, mmap_enabled=False)
# array_field = FieldSchema(name="array", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=nb,
#                           max_capacity=nb, nullable=True)
# The JSON field is explicitly non-nullable, so None values should be rejected.
json_field = FieldSchema(name="json", dtype=DataType.JSON, nullable=False)
schema = CollectionSchema(fields=[int64_field, double_field, int32_field, string_field, json_field, float_vector], enable_dynamic_field=True)
utility.drop_collection("test")
collection = Collection("test", schema=schema)
res = collection.schema
print(res)
index = {"index_type": "DISKANN", "metric_type": "L2", "params": {}}
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
# array_field = None
# Every JSON value is None, which violates nullable=False on the "json" field.
json_field = [None for _ in range(nb)]
data = [[i for i in range(nb)], [None for _ in range(nb)], [None for _ in range(nb)], [None for _ in range(nb)],
        json_field, vectors]

collection.insert(data=data)
collection.create_index("float_vector", index, index_name="index_name_1")
collection.load()
collection.flush()
res = collection.num_entities
print(res)
default_search_params = {"metric_type": "L2", "params": {}}
limit = 10
nq = 1
import time
start = time.time()
res1 = collection.search(vectors[:nq], "float_vector", default_search_params, limit, "int64 >= 0", output_fields=["nullableFid", "int32", "string", "json"])
end = time.time() - start
print(res1)
print("search successfully in %f s" % end)
# start = time.time()
# res = collection.query("int64>=0", output_fields=["nullableFid","int32", "string"])
# end = time.time() - start
# print(res)
# print("query successfully in %f s" % end)

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22XDE%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22default-none-qx-pgffh.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%221726713986504%22,%22to%22:%221726713997554%22%7D%7D%7D&schemaVersion=1

Anything else?

After the insert succeeds, the json field in the search results is None.

binbinlv commented 1 month ago

/assign @smellthemoon /unassign @yanliang567

smellthemoon commented 1 month ago

The PR has been merged. Could you please help check it? @qixuan0212

smellthemoon commented 1 month ago

/assign @qixuan0212

qixuan0212 commented 1 month ago

Verified and fixed:

pymilvus: 2.5.0rc89, milvus: master-20241010-290ceb4e-amd64

results: RPC error: [batch_insert], <ParamError: (code=1, message=field (json) expect not None input)>, <Time:{'RPC start': '2024-10-11 15:37:40.184739', 'RPC error': '2024-10-11 15:37:40.609634'}>
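
A verification like the one above could be scripted as a small regression check. The sketch below is only an illustration: `insert_is_rejected` is a hypothetical helper, not part of pymilvus, and it assumes the error text keeps the "expect not None" wording shown in the result above.

```python
from typing import Any, Callable


def insert_is_rejected(
    insert_fn: Callable[[Any], Any],
    data: Any,
    needle: str = "expect not None",
) -> bool:
    """Return True if insert_fn(data) raises an error whose text contains needle."""
    try:
        insert_fn(data)
    except Exception as exc:  # broad on purpose: the exact exception class is not assumed
        return needle in str(exc)
    # The insert succeeded, which is the buggy behavior reported in this issue.
    return False
```

With the repro script from this issue, the call would be `insert_is_rejected(collection.insert, data)`, which should return True on a fixed build and False on the affected one.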