NasonZ closed this issue 1 year ago
I updated the code as below, and it works well:
import numpy as np
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections
# Connect to the Milvus server (placeholder credentials and host)
connections.connect(
    alias="default",
    user='username',
    password='password',
    host='1xx.xx.x.x',
    port='19530'
)
# Create the fields
_id = FieldSchema(
    name="id",
    dtype=DataType.INT64,
    is_primary=True,
)
page_title = FieldSchema(
    name="title",
    dtype=DataType.VARCHAR,  # STRING gives SchemaNotReadyException error
    max_length=200
)
page_url = FieldSchema(
    name="url",
    dtype=DataType.VARCHAR,
    max_length=200
)
snippets = FieldSchema(
    name="snippets",
    dtype=DataType.VARCHAR,
    max_length=200
)
embedding = FieldSchema(
    name="embedding",
    dtype=DataType.FLOAT_VECTOR,
    dim=768
)
# Create the collection schema
schema = CollectionSchema(
    fields=[_id, page_title, page_url, snippets, embedding],
    description="Dummy Dataset Collection",
    enable_dynamic_field=True
)
# Create the collection
collection_name = "dummy_dataset"
dummy_collection = Collection(
    name=collection_name,
    schema=schema,
    using='default',
    shards_num=2
)
# Column-based data: one inner list per field, in schema order
dummy_data2 = [
    [1, 2, 3],
    ['Varied jokes', 'Varied jokes', 'General jokes'],
    ['https://jokesRus.com', 'https://jokesRus.com', 'https://Fuknee.com'],
    ["Why don't scientists trust atoms?", "I'm reading a book about anti-gravity.", "Did you hear about the mathematician"],
    [np.random.rand(768).tolist(), np.random.rand(768).tolist(), np.random.rand(768).tolist()]
]
# insert() returns a MutationResult; primary_keys holds the inserted IDs
insert_result = dummy_collection.insert(dummy_data2)
print(insert_result.primary_keys)
Thanks for the response. I want to embed each snippet from a JSON document and store it in Milvus along with its metadata. Just to confirm I'm on the right path: based on your example, would the correct approach be the following?
Source:
{
"title": "Varied jokes",
"url": "https://jokesRus.com",
"type": "page",
"app": "safari",
"id_in_app": "14680104",
"parent_id_in_app": "Root",
"date_created": "2023-03-18T18:57:49.635Z",
"date_last_edit": "2023-03-20T17:48:29.821Z",
"snippets": [
{
"topic": "-",
"content": "Why don't scientists trust atoms? Because they make up everything!",
"references": []
},
{
"topic": "-",
"content": "I'm reading a book about anti-gravity. It's impossible to put down!",
"references": []
},
{
"topic": "-",
"content": "Why don't skeletons fight each other? They don't have the guts!",
"references": []
},
{
"topic": "-",
"content": "Why did the scarecrow win an award? Because he was outstanding in his field!",
"references": []
},
{
"topic": "-",
"content": "Did you hear about the mathematician who's afraid of negative numbers? He'll stop at nothing to avoid them!",
"references": []
},...
Milvus entry:
[
[1, 2, 3, 4, 5], # unique ID for each snippet
['Varied jokes', 'Varied jokes', 'Varied jokes', 'Varied jokes', 'Varied jokes'], # metadata from the source JSON is repeated once per snippet
['https://jokesRus.com', 'https://jokesRus.com', 'https://jokesRus.com', 'https://jokesRus.com', 'https://jokesRus.com'],
["Why don't scientists trust atoms?", "I'm reading a book about anti-gravity.","Why don't skeletons fight each other?", "Why did the scarecrow win an award?", "Did you hear about the mathematician"],
[np.random.rand(768).tolist(), np.random.rand(768).tolist(), np.random.rand(768).tolist(), np.random.rand(768).tolist(), np.random.rand(768).tolist()]
]
I want to be sure I understand how to insert document chunks and their metadata.
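The mapping described above (one row per snippet, with the document-level metadata repeated) can be sketched in plain Python. This is a minimal sketch, not a confirmed Milvus recipe: `document_to_columns` and `fake_embed` are hypothetical names introduced here, the column order is assumed to match the id, title, url, snippets, embedding schema from this thread, and the embedding function stands in for whatever model actually produces the 768-dimensional vectors.

```python
from typing import Callable, Dict, List, Sequence


def document_to_columns(
    doc: Dict,
    embed: Callable[[str], Sequence[float]],
    start_id: int = 1,
) -> List[list]:
    """Flatten one source document into column-ordered lists.

    Assumed column order: id, title, url, snippets, embedding.
    """
    contents = [snippet["content"] for snippet in doc["snippets"]]
    n = len(contents)
    ids = list(range(start_id, start_id + n))  # one unique ID per snippet
    titles = [doc["title"]] * n                # metadata repeated per snippet
    urls = [doc["url"]] * n
    embeddings = [list(embed(text)) for text in contents]
    return [ids, titles, urls, contents, embeddings]


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for a real embedding model (dim=768 as in the thread).
    return [float(len(text))] * 768


doc = {
    "title": "Varied jokes",
    "url": "https://jokesRus.com",
    "snippets": [
        {"topic": "-", "content": "Why don't scientists trust atoms?", "references": []},
        {"topic": "-", "content": "I'm reading a book about anti-gravity.", "references": []},
    ],
}

columns = document_to_columns(doc, fake_embed)
```

The resulting `columns` list has the same shape as the column-based data passed to `insert` earlier in the thread. Note that the `snippets` field was declared with `max_length=200`, so longer snippet text would need a larger limit or truncation.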
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Current Behavior
When I try to add data to my collection I get this error:
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The data in the same column must be of the same type.)>
Expected Behavior
I expect my data to be added to my collection as shown in the documentation: https://milvus.io/docs/insert_data.md
Steps To Reproduce
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The data in the same column must be of the same type.)>, <Time:{'RPC start': '2023-07-25 21:13:58.810130', 'RPC error': '2023-07-25 21:13:58.810288'}>
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The data in the same column must be of the same type.)>, <Time:{'RPC start': '2023-07-25 21:13:58.811117', 'RPC error': '2023-07-25 21:13:58.811193'}>
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The data in the same column must be of the same type.)>, <Time:{'RPC start': '2023-07-25 21:13:58.811776', 'RPC error': '2023-07-25 21:13:58.811878'}>
RPC error: [insert_rows], <DataNotMatchException: (code=1, message=The data in the same column must be of the same type.)>, <Time:{'RPC start': '2023-07-25 21:13:58.812349', 'RPC error': '2023-07-25 21:13:58.813201'}>
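The error message suggests that at least one column holds values of more than one type. One way to narrow this down before calling `insert` is to scan the data locally. The following is a minimal sketch, assuming column-based input (a list of lists, one per field); `find_mixed_type_columns` is a hypothetical helper written for this illustration and is not part of pymilvus, and it only compares top-level Python types, so it may not catch every case the client rejects.

```python
from typing import Dict, List, Set


def find_mixed_type_columns(columns: List[list]) -> Dict[int, Set[str]]:
    """Return {column_index: type_names} for columns holding more than one Python type."""
    mixed = {}
    for index, column in enumerate(columns):
        type_names = {type(value).__name__ for value in column}
        if len(type_names) > 1:
            mixed[index] = type_names
    return mixed


# Every column is homogeneous, so nothing is reported.
good = [[1, 2, 3], ["a", "b", "c"]]
good_report = find_mixed_type_columns(good)

# Column 0 mixes int and str; column 1 mixes str and None.
bad = [[1, "2", 3], ["a", "b", None]]
bad_report = find_mixed_type_columns(bad)
```

An empty report does not guarantee the insert will succeed, but a non-empty one points at the column indices worth inspecting first.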