milvus-io / pymilvus

Python SDK for Milvus.
Apache License 2.0

[Bug]: bulkwriter does not support add row for new json datatype #2080

Open zhuwenxing opened 5 months ago

zhuwenxing commented 5 months ago

Is there an existing issue for this?

Describe the bug

The new JSON data type can hold an int, float, varchar, or array value, but BulkWriter still validates the field as a key-value pair, so `append_row` fails for any of these values.

[2024-05-10T09:38:20.579Z]         with RemoteBulkWriter(
[2024-05-10T09:38:20.579Z]             schema=schema,
[2024-05-10T09:38:20.579Z]             remote_path="bulk_data",
[2024-05-10T09:38:20.579Z]             connect_param=RemoteBulkWriter.ConnectParam(
[2024-05-10T09:38:20.579Z]                 bucket_name=self.bucket_name,
[2024-05-10T09:38:20.579Z]                 endpoint=self.minio_endpoint,
[2024-05-10T09:38:20.579Z]                 access_key="minioadmin",
[2024-05-10T09:38:20.579Z]                 secret_key="minioadmin",
[2024-05-10T09:38:20.579Z]             ),
[2024-05-10T09:38:20.579Z]             file_type=BulkFileType.NUMPY,
[2024-05-10T09:38:20.579Z]         ) as remote_writer:
[2024-05-10T09:38:20.579Z]             json_value = [
[2024-05-10T09:38:20.579Z]                 1,
[2024-05-10T09:38:20.579Z]                 1.0,
[2024-05-10T09:38:20.579Z]                 "1",
[2024-05-10T09:38:20.579Z]                 [1, 2, 3],
[2024-05-10T09:38:20.579Z]                 ["1", "2", "3"],
[2024-05-10T09:38:20.579Z]                 [1, 2, "3"],
[2024-05-10T09:38:20.579Z]                 {"key": "value"},
[2024-05-10T09:38:20.579Z]             ]
[2024-05-10T09:38:20.579Z]             for i in range(entities):
[2024-05-10T09:38:20.579Z]                 row = {
[2024-05-10T09:38:20.579Z]                     df.pk_field: i,
[2024-05-10T09:38:20.579Z]                     df.int_field: 1,
[2024-05-10T09:38:20.579Z]                     df.float_field: 1.0,
[2024-05-10T09:38:20.579Z]                     df.string_field: "string",
[2024-05-10T09:38:20.579Z]                     df.json_field: json_value[i%len(json_value)],
[2024-05-10T09:38:20.579Z]                     df.vec_field: cf.gen_vectors(1, dim)[0]
[2024-05-10T09:38:20.579Z]                 }
[2024-05-10T09:38:20.579Z]                 if auto_id:
[2024-05-10T09:38:20.579Z]                     row.pop(df.pk_field)
[2024-05-10T09:38:20.579Z]                 if enable_dynamic_field:
[2024-05-10T09:38:20.579Z]                     row["name"] = fake.name()
[2024-05-10T09:38:20.579Z]                     row["address"] = fake.address()
[2024-05-10T09:38:20.579Z] >               remote_writer.append_row(row)
[2024-05-10T09:38:20.579Z] 
[2024-05-10T09:38:20.579Z] /home/jenkins/agent/workspace/tests/python_client/testcases/test_bulk_insert.py:1121: 
[2024-05-10T09:38:20.579Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/remote_bulk_writer.py:259: in append_row
[2024-05-10T09:38:20.579Z]     super().append_row(row, **kwargs)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/local_bulk_writer.py:90: in append_row
[2024-05-10T09:38:20.579Z]     super().append_row(row, **kwargs)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/bulk_writer.py:89: in append_row
[2024-05-10T09:38:20.579Z]     self._verify_row(row)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/bulk_writer.py:198: in _verify_row
[2024-05-10T09:38:20.579Z]     row[field.name], size = self._verify_json(row[field.name], field)
[2024-05-10T09:38:20.579Z] /usr/local/lib/python3.8/site-packages/pymilvus/bulk_writer/bulk_writer.py:137: in _verify_json
[2024-05-10T09:38:20.579Z]     self._throw(f"Illegal JSON value for field '{field.name}', type mismatch")
[2024-05-10T09:38:20.579Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2024-05-10T09:38:20.579Z] 
[2024-05-10T09:38:20.579Z] self = <pymilvus.bulk_writer.remote_bulk_writer.RemoteBulkWriter object at 0x7f3a40438160>
[2024-05-10T09:38:20.579Z] msg = "Illegal JSON value for field 'json', type mismatch"
[2024-05-10T09:38:20.579Z] 
[2024-05-10T09:38:20.579Z]     def _throw(self, msg: str):
[2024-05-10T09:38:20.579Z]         logger.error(msg)
[2024-05-10T09:38:20.579Z] >       raise MilvusException(message=msg)
[2024-05-10T09:38:20.579Z] E       pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Illegal JSON value for field 'json', type mismatch)>

Expected Behavior

Rows whose JSON field holds an int, float, varchar, or array value can be appended with BulkWriter.
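For illustration, here is a minimal sketch of a check that would accept every value in the `json_value` list from the traceback. `verify_json_value` is a hypothetical standalone helper, not pymilvus's `_verify_json`; it assumes that "valid JSON" simply means "serializable by the standard `json` module", and it returns the value together with its encoded length as a stand-in for the size that `_verify_row` expects.

```python
import json


def verify_json_value(value):
    """Hypothetical check: accept any JSON-serializable value, not only dicts.

    Returns the value and the length of its JSON encoding.
    Raises ValueError if the value cannot be encoded as JSON.
    """
    try:
        # allow_nan=False rejects NaN/Infinity, which are not valid JSON
        encoded = json.dumps(value, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"Illegal JSON value: {exc}") from exc
    return value, len(encoded)


# The same values used in the failing test case
json_values = [1, 1.0, "1", [1, 2, 3], ["1", "2", "3"], [1, 2, "3"], {"key": "value"}]
results = [verify_json_value(v) for v in json_values]
```

With a check along these lines, scalars and arrays pass alongside dicts, while values that cannot be serialized (for example a Python `set`) are still rejected.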

Steps/Code To Reproduce behavior

No response

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0):
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response

zhuwenxing commented 5 months ago

/assign @yhmo