milvus-io / pymilvus

Python SDK for Milvus.
Apache License 2.0

[Enhancement]: Add a property to ExtraList for extracting data from the results of the search, query, get, ... functions of the MilvusClient class #2362

Open CaoHaiNam opened 2 days ago

CaoHaiNam commented 2 days ago

Is there an existing issue for this?

What would you like to be added?

I would like a property to be added to the ExtraList class that allows users to extract its data. This property should make it easy to use the underlying data without any additional structure or formatting.

Why is this needed?

Currently, on the latest master branch, the hybrid_search, search, query and get functions of the MilvusClient class return an ExtraList object. While this is helpful for carrying additional metadata, it makes the underlying data harder to access directly for further processing or calculations.

Adding this feature will improve usability and make the class more intuitive for users who need direct access to the data.

Anything else?

Here’s an example of the current issue when using ExtraList with the Milvus query functionality:

from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, MilvusClient
import random

connections.connect("default", host="localhost", port="19530")
milvus_client = MilvusClient("http://localhost:19530")

# Define schema and collection
schema = CollectionSchema([
    FieldSchema("film_id", DataType.INT64, is_primary=True),
    FieldSchema("film_date", DataType.INT64),
    FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
])
collection = Collection("test_collection_query", schema)

# Insert sample data
data = [
    [i for i in range(10)],
    [i + 2000 for i in range(10)],
    [[random.random() for _ in range(2)] for _ in range(10)],
]
collection.insert(data)

# Create index and load collection
index_param = {"index_type": "FLAT", "metric_type": "L2", "params": {}}
collection.create_index("films", index_param)
collection.load()

# Perform query
expr = "film_id == 3"

res = milvus_client.query(
    collection_name='test_collection_query',
    filter=expr,
    output_fields=["film_date"]
)

# Current behavior
print(res)
# Output: data: ["{'film_id': 3, 'film_date': 2003}"], cannot access data directly

# Desired behavior with a new property
print(res.data)
# Output: [{'film_id': 3, 'film_date': 2003}], a plain list object accessible for further calculations.

In the example above, querying the Milvus collection returns an ExtraList. This structure makes it difficult to extract the query result directly for further processing. Adding a .data property or a similar method would give users direct access to the query results, making them easier to work with for calculations, transformations, or other downstream processes.
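A minimal sketch of what the proposed property could look like. It uses a hypothetical stand-in class, ExtraListSketch, rather than the real pymilvus ExtraList, and assumes the real class is a list subclass that carries an extra metadata dict and prints its rows as strings, as the output above suggests. The names ExtraListSketch and data are illustrative only, not the library's actual API.

```python
from typing import Any, Dict, List, Optional


class ExtraListSketch(list):
    """Hypothetical stand-in for ExtraList: a list that also carries metadata."""

    def __init__(self, *args: Any, extra: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(*args)
        # Assumed metadata container; the real class may store this differently.
        self.extra = extra or {}

    def __str__(self) -> str:
        # Mimics the printed form shown in the issue: each row is stringified.
        return f"data: {[str(row) for row in self]}"

    @property
    def data(self) -> List[Any]:
        # Proposed accessor: return the rows as a plain built-in list (shallow copy).
        return list(self)


res = ExtraListSketch([{"film_id": 3, "film_date": 2003}], extra={"cost": 0})
print(res)       # data: ["{'film_id': 3, 'film_date': 2003}"]
print(res.data)  # [{'film_id': 3, 'film_date': 2003}]
```

Under the same assumption (that ExtraList subclasses list), calling list(res) or indexing with res[0] would already yield the underlying rows, so a .data property would mainly be a more discoverable spelling of the same operation.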

CaoHaiNam commented 2 days ago

/assign @CaoHaiNam