milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
30.49k stars, 2.92k forks

[Bug]: Inserting data into a dynamic-schema collection fails because of the row data format #25414

Closed. TANGnlp0711 closed this issue 1 year ago.

TANGnlp0711 commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2.9
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.7
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Traceback (most recent call last):
  File "create_collection.py", line 106, in
    collection.insert(data_rows)
  File "/opt/conda/envs/transnet/lib/python3.7/site-packages/pymilvus/orm/collection.py", line 426, in ins
    check_insert_data_schema(self._schema, data)
  File "/opt/conda/envs/transnet/lib/python3.7/site-packages/pymilvus/orm/schema.py", line 312, in check_i
    infer_fields = parse_fields_from_data(data)
  File "/opt/conda/envs/transnet/lib/python3.7/site-packages/pymilvus/orm/schema.py", line 343, in parse_f
    fields = [FieldSchema("", infer_dtype_bydata(d[0])) for d in data]
  File "/opt/conda/envs/transnet/lib/python3.7/site-packages/pymilvus/orm/schema.py", line 343, in <listco
    fields = [FieldSchema("", infer_dtype_bydata(d[0])) for d in data]
KeyError: 0

import pymysql as mysql
from pymilvus import connections, utility
from pymilvus import CollectionSchema, FieldSchema, DataType, Collection, Partition
import sys
sys.path.append('..')
from configs.my_conf import mysql_open, mysql_close, milvus_open

COLLECTION_NAME = 'material_clips_pre_processed_test'

milvus_open('aigc-milvus')

collection = Collection(COLLECTION_NAME)

db_id = FieldSchema(
    name='id',
    dtype=DataType.VARCHAR,
    max_length=19,
    is_primary=True,
    description='ID pre-generated with the Snowflake algorithm, same as the relational database .id',  # 19 characters
)
project_id = FieldSchema(
    name='proj_id',
    dtype=DataType.VARCHAR,
    max_length=64,
    description='Project id',
)
file_hash = FieldSchema(
    name='origin_file_hash',
    dtype=DataType.VARCHAR,
    max_length=64,
    description='File content hash (sha256)',  # 64 characters
)
video_clip_path = FieldSchema(
    name='video_clip_path',
    dtype=DataType.VARCHAR,
    max_length=1000,
    description='Video clip location (CFS path or COS url)',
)
image_path = FieldSchema(
    name='image_path',
    dtype=DataType.VARCHAR,
    max_length=1000,
    description='Image location (CFS path or COS url)',
)
create_time = FieldSchema(
    name='create_time',
    dtype=DataType.VARCHAR,
    max_length=26,
    description='Same as the relational database .create_time',  # 26 characters, e.g. 1970-01-01 00:00:00.000000
)
image_features = FieldSchema(
    name='image_features',
    dtype=DataType.FLOAT_VECTOR,
    dim=512,
    description='Image vector',
)
schema = CollectionSchema(
    fields=[
        db_id,
        project_id,
        file_hash,
        video_clip_path,
        image_path,
        create_time,
        image_features,
    ],
    enable_dynamic_field=True,
    description=f'Clip information after material preprocessing, largely the same as the relational database ({COLLECTION_NAME})',
)

collection = Collection(
    name=COLLECTION_NAME,
    schema=schema,
    using='default',
    shards_num=2,
)
collection = Collection("medium_articles_with_dynamic", schema)
collection.create_index(
    field_name='image_features',
    index_params={
        'metric_type': 'L2',
        'index_type': 'IVF_FLAT',
        'params': {},
    },
)

m_r = utility.index_building_progress(COLLECTION_NAME)

collection.load()
data_rows = [{
    'title': '321',
    'image_features': [0.041732933, 0.013779674, -0.027564144, ..., 0.030096486],
    'link': 'https://medium.com/swlh/the-reported-mortality-rate-of-coronavirus-is-not-important-369989c8d912',
    'reading_time': 13,
    'publication': 'The Startup',
    'claps': 1100,
    'responses': 18,
}, {
    'title': '123',
    'image_features': [0.041732933, 0.013779674, -0.027564144, ..., 0.030096486],
    'link': 'https://medium.com/swlh/the-reported-mortality-rate-of-coronavirus-is-not-important-369989c8d912',
    'reading_time': 13,
    'publication': 'The Startup',
    'claps': 1100,
    'responses': 18,
}]
collection.insert(data_rows)
collection.flush()

print("Entity counts: ", collection.num_entities)

Error: the data_rows format only seems to support [[]] (a list of lists). Maybe I have the wrong versions of milvus and pymilvus?

milvus: 2.2.9
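The `KeyError: 0` at the bottom of the first traceback comes from `parse_fields_from_data`, which, per the traceback, evaluates `infer_dtype_bydata(d[0])` for each `d` in `data`. That expression assumes column-based input (one list per field). The plain-Python sketch below reproduces only that indexing pattern; `first_values` is a hypothetical stand-in, not pymilvus code, and no Milvus server is needed:

```python
def first_values(data):
    """Hypothetical stand-in for the indexing done in parse_fields_from_data:
    take element 0 of every item, assuming each item is a column (a list)."""
    return [d[0] for d in data]

# Column-based data: one list per field, so d[0] is the first value of a column.
columns = [["321", "123"], [13, 13]]
print(first_values(columns))  # ['321', 13]

# Row-based data: each item is a dict, so d[0] looks up the key 0 and fails.
rows = [{"title": "321", "reading_time": 13}]
try:
    first_values(rows)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 0
```

In other words, the error is consistent with that pymilvus release treating a list of dicts as if it were a list of columns.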

xiaofan-luan commented 1 year ago

Could you share your code?

yanliang567 commented 1 year ago

/assign @TANGnlp0711

TANGnlp0711 commented 1 year ago

could you share your code?


milvus: 2.2.9, pymilvus: 2.2.7
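If the installed pymilvus only accepts column-based data, as the 2.2.7 traceback suggests, the dict rows could be transposed into one list per schema field. This is a sketch with a hypothetical helper `rows_to_columns` (not a pymilvus API); it assumes the column order must follow the schema's field order and that every row carries every listed field. Keys not in `field_names`, such as the dynamic `title` or `claps`, are simply ignored by this sketch.

```python
from typing import Any, Dict, List, Sequence

# Field order of the CollectionSchema in this issue (assumed to be the order
# a column-based insert expects).
FIELD_NAMES = ["id", "proj_id", "origin_file_hash", "video_clip_path",
               "image_path", "create_time", "image_features"]

def rows_to_columns(rows: Sequence[Dict[str, Any]],
                    field_names: Sequence[str] = FIELD_NAMES) -> List[List[Any]]:
    """Transpose row dicts into one list per field, in field_names order.

    Hypothetical helper. Raises KeyError(name) for the first missing field;
    keys not listed in field_names are dropped.
    """
    columns: List[List[Any]] = [[] for _ in field_names]
    for row in rows:
        for column, name in zip(columns, field_names):
            column.append(row[name])
    return columns

# Possible usage, not run here because it needs a Milvus server:
# collection.insert(rows_to_columns(data_rows))
```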

yanliang567 commented 1 year ago

/assign @NicoYuan1986 Could you please help reproduce this issue using the code snippet above?

NicoYuan1986 commented 1 year ago

@yanliang567 Reproduced, but my error message is a little different. Milvus version: v2.2.11, pymilvus version: 2.2.13.dev3

Error message:

>>> collection.insert(data_rows)
RPC error: [insert_rows], <ParamError: (code=1, message=Field id don't match in entities[0])>, <Time:{'RPC start': '2023-07-12 14:35:41.247933', 'RPC error': '2023-07-12 14:35:41.248637'}>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 448, in insert
    res = conn.insert_rows(self._name, data, partition_name,
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 109, in handler
    raise e
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 105, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 136, in handler
    ret = func(self, *args, **kwargs)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 85, in handler
    raise e
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 50, in handler
    return func(self, *args, **kwargs)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 447, in insert_rows
    raise err
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 436, in insert_rows
    request = self._prepare_row_insert_or_upsert_request(
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 407, in _prepare_row_insert_or_upsert_request
    request = Prepare.row_insert_or_upsert_param(collection_name, rows, partition_name, fields_info, is_insert,
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 295, in row_insert_or_upsert_param
    _, _, auto_id_loc = traverse_rows_info(fields_info, entities)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/utils.py", line 195, in traverse_rows_info
    raise ParamError(
pymilvus.exceptions.ParamError: <ParamError: (code=1, message=Field id don't match in entities[0])>
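This `ParamError` appears to name the first declared schema field that the row lacks. The rows in `data_rows` carry only `image_features` from the schema plus dynamic keys (`title`, `link`, ...), so every other declared field is missing. A hypothetical pre-check like the one below (not a pymilvus function) reports all missing fields at once instead of one per failed insert; it assumes the field names are copied from the schema in the issue and that no field uses `auto_id`.

```python
from typing import Any, Dict, List, Sequence

# Fields declared in the CollectionSchema from this issue; none use auto_id.
SCHEMA_FIELDS = ["id", "proj_id", "origin_file_hash", "video_clip_path",
                 "image_path", "create_time", "image_features"]

def missing_fields(rows: Sequence[Dict[str, Any]],
                   required: Sequence[str] = SCHEMA_FIELDS) -> Dict[int, List[str]]:
    """Map each row index to the schema fields that row lacks (hypothetical helper)."""
    report: Dict[int, List[str]] = {}
    for i, row in enumerate(rows):
        absent = [name for name in required if name not in row]
        if absent:
            report[i] = absent
    return report

row = {"title": "321", "image_features": [0.0] * 512, "claps": 1100}
print(missing_fields([row]))
# {0: ['id', 'proj_id', 'origin_file_hash', 'video_clip_path', 'image_path', 'create_time']}
```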

After I added the primary key field 'id', it still reported an error:

>>> collection.insert(data_rows)
RPC error: [insert_rows], <ParamError: (code=1, message=Field proj_id don't match in entities[0])>, <Time:{'RPC start': '2023-07-12 14:34:54.130080', 'RPC error': '2023-07-12 14:34:54.131136'}>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 448, in insert
    res = conn.insert_rows(self._name, data, partition_name,
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 109, in handler
    raise e
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 105, in handler
    return func(*args, **kwargs)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 136, in handler
    ret = func(self, *args, **kwargs)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 85, in handler
    raise e
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/decorators.py", line 50, in handler
    return func(self, *args, **kwargs)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 447, in insert_rows
    raise err
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 436, in insert_rows
    request = self._prepare_row_insert_or_upsert_request(
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 407, in _prepare_row_insert_or_upsert_request
    request = Prepare.row_insert_or_upsert_param(collection_name, rows, partition_name, fields_info, is_insert,
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/prepare.py", line 295, in row_insert_or_upsert_param
    _, _, auto_id_loc = traverse_rows_info(fields_info, entities)
  File "/Users/zilliz/virtual-environment/milvus/lib/python3.10/site-packages/pymilvus/client/utils.py", line 195, in traverse_rows_info
    raise ParamError(
pymilvus.exceptions.ParamError: <ParamError: (code=1, message=Field proj_id don't match in entities[0])>

pymilvus 2.2.13 behaves the same.
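For reference, a row that matches the schema in this issue has to supply all seven declared fields; with `enable_dynamic_field=True`, extra keys such as `title` or `claps` are expected to be stored in the dynamic field. The sketch below only builds such a row in plain Python. `make_row` is a hypothetical helper, the ID, paths, and bytes are made-up placeholders, and the actual `collection.insert([row])` call is left as a comment because it needs a running Milvus server and a pymilvus version that accepts dict rows.

```python
import hashlib
from datetime import datetime
from typing import Any, Dict, List

def make_row(row_id: str, proj_id: str, file_bytes: bytes, clip_path: str,
             image_path: str, vector: List[float], created: datetime,
             **dynamic: Any) -> Dict[str, Any]:
    """Build one row dict for the schema in this issue (hypothetical helper).

    Keys passed via **dynamic are not declared in the schema; with
    enable_dynamic_field=True they are expected to land in the dynamic field.
    """
    if len(row_id) > 19:
        raise ValueError("id exceeds max_length=19")
    if len(vector) != 512:
        raise ValueError("image_features must have dim=512")
    row: Dict[str, Any] = {
        "id": row_id,
        "proj_id": proj_id,
        # A sha256 hex digest is always 64 characters, matching max_length=64.
        "origin_file_hash": hashlib.sha256(file_bytes).hexdigest(),
        "video_clip_path": clip_path,
        "image_path": image_path,
        # This format yields 26 characters for 4-digit years, matching max_length=26.
        "create_time": created.strftime("%Y-%m-%d %H:%M:%S.%f"),
        "image_features": vector,
    }
    row.update(dynamic)  # title, link, claps, ... as in data_rows
    return row

# Placeholder values, made up for illustration only.
row = make_row("1678901234567890123", "proj-1", b"example bytes",
               "/cfs/clips/0001.mp4", "/cfs/images/0001.jpg",
               [0.0] * 512, datetime(2023, 7, 12, 14, 35, 41),
               title="321", claps=1100)
# collection.insert([row])  # needs a running Milvus and a pymilvus that accepts dict rows
```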

xiaofan-luan commented 1 year ago

/assign @czs007

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.