milvus-io / pymilvus

Python SDK for Milvus.
Apache License 2.0

[Bug]: MilvusClient isn't setting nlist while creating IVF_FLAT IP index #2009

Open afzalIbnSH opened 6 months ago

afzalIbnSH commented 6 months ago

Is there an existing issue for this?

Describe the bug

Below is my code:

    fields = [
        FieldSchema(
            name=Columns.ID.value,
            dtype=DataType.VARCHAR,
            max_length=50,
            is_primary=True,
            auto_id=False,
        ),
        FieldSchema(name=Columns.IS_ENABLED.value, dtype=DataType.BOOL),
        FieldSchema(name=Columns.COUNTRY_CODES.value, dtype=DataType.JSON),
        FieldSchema(name=Columns.APPLICABLE_FOR.value, dtype=DataType.JSON),
        FieldSchema(name=Columns.EMBEDDING.value, dtype=DataType.FLOAT_VECTOR, dim=MODEL_DIM),
    ]
    schema = CollectionSchema(
        fields=fields, description="search recipes", enable_dynamic_field=True
    )

    index_params = MilvusClient.prepare_index_params()
    index_params.add_index(
        field_name=Columns.EMBEDDING.value,
        index_type=IndexType.IVF_FLAT,
        metric_type=MetricType.IP,
        params={"nlist": 1536}
    )

    client.create_collection(
        collection_name,
        id_type="string",
        vector_field_name=Columns.EMBEDDING.value,
        metric_type=MetricType.IP,
        schema=schema,
        index_params=index_params
    )

This does create an IVF_FLAT index with the IP metric, but nlist isn't set.

[Screenshot 2024-03-29 at 2:51:39 AM]

If I use the `Collection` class instead, it works:

    collection = Collection(collection_name, schema=schema, using=client._using)

    index_params = {
        "metric_type": MetricType.IP,
        "index_type": IndexType.IVF_FLAT,
        "params": {"nlist": 1536},
    }

    collection.create_index(
        field_name=Columns.EMBEDDING.value, index_params=index_params
    )

However, as you can see, this forces me to use `_using`, a private attribute of the `MilvusClient` instance. Please fix this.
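One way to avoid reaching into `client._using` is to open a separate ORM connection under an alias you choose with `connections.connect`, then pass that alias to `Collection`. This is a minimal sketch, assuming a Zilliz Cloud `uri` and `token`; the helper names and the alias `orm_conn` are hypothetical. It only moves the working ORM path shown above onto its own public connection; it does not fix the `MilvusClient` behavior.

```python
def ivf_flat_index_params(nlist: int, metric_type: str = "IP") -> dict:
    """Build the ORM-style index_params dict passed to Collection.create_index()."""
    # Milvus documents nlist for IVF_FLAT as an integer in [1, 65536].
    if not 1 <= nlist <= 65536:
        raise ValueError(f"nlist must be in [1, 65536], got {nlist}")
    return {
        "metric_type": metric_type,
        "index_type": "IVF_FLAT",
        "params": {"nlist": nlist},
    }


def create_collection_with_orm(uri: str, token: str, collection_name: str,
                               schema, vector_field: str, nlist: int,
                               alias: str = "orm_conn"):
    """Create the collection and its IVF_FLAT index through the ORM on its own alias.

    `alias` is a hypothetical name; any string not used by another connection works.
    """
    # Imported here so ivf_flat_index_params() can be used without pymilvus installed.
    from pymilvus import Collection, connections

    # A dedicated, public connection alias replaces the private client._using.
    connections.connect(alias=alias, uri=uri, token=token)
    collection = Collection(collection_name, schema=schema, using=alias)
    collection.create_index(
        field_name=vector_field, index_params=ivf_flat_index_params(nlist)
    )
    return collection
```

The design choice here is to keep the ORM connection independent of the `MilvusClient` instance, so nothing depends on its internal attributes.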

Expected Behavior

The index should have nlist set (to 1536 in this example).

[Screenshot 2024-03-29 at 2:52:36 AM]
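The expected behavior can also be checked in code instead of through the console. This sketch assumes the flat dict format that `MilvusClient.describe_index()` returns in the maintainer's output later in this thread, where params such as nlist appear as top-level keys with string values (`'nlist': '1536'`). The helper `index_param_mismatches` is hypothetical, not part of pymilvus.

```python
def index_param_mismatches(index_info: dict, expected: dict) -> list:
    """List every expected index setting that is missing or has a different value.

    index_info is assumed to be a flat dict as returned by
    MilvusClient.describe_index(); values are compared as strings because
    params such as nlist come back as strings (e.g. '1536').
    """
    problems = []
    for key, want in expected.items():
        if key not in index_info:
            problems.append(f"{key}: missing (expected {want!r})")
        elif str(index_info[key]) != str(want):
            problems.append(f"{key}: got {index_info[key]!r}, expected {want!r}")
    return problems
```

For example, pass it the result of `client.describe_index(collection_name, index_name=...)` together with `{"index_type": "IVF_FLAT", "metric_type": "IP", "nlist": 1536}`; an empty list means nlist was applied.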

Steps/Code To Reproduce behavior

Use the code from my bug description.

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory): macOS Monterey (12.1)
- Method of installation (Docker, or from source): Zilliz Cloud
- Milvus version (v0.3.1, or v0.4.0): Not sure. Zilliz says `Compatible With Milvus 2.3.x`. PyMilvus version is 2.4.0
- Milvus configuration (Settings you made in `server_config.yaml`): -

Anything else?

No response

afzalIbnSH commented 6 months ago

Can someone look at this?

czs007 commented 5 months ago

@afzalIbnSH sorry for the late response.

I adapted your example so it runs locally, and I was unable to reproduce the issue with versions 2.3.7 and 2.4.0.

    from pymilvus import (
        FieldSchema, CollectionSchema, DataType,
        MilvusClient,
    )

    dim = 8
    collection_name = "hello_milvus"
    milvus_client = MilvusClient("http://localhost:19530")

    fields = [
        FieldSchema(
            name="pk",
            dtype=DataType.VARCHAR,
            max_length=50,
            is_primary=True,
            auto_id=False,
        ),
        FieldSchema(name="bool", dtype=DataType.BOOL),
        FieldSchema(name="codes", dtype=DataType.JSON),
        FieldSchema(name="json", dtype=DataType.JSON),
        FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ]
    schema = CollectionSchema(
        fields=fields, description="search recipes", enable_dynamic_field=True
    )

    # Same index definition as in the report: IVF_FLAT with the IP metric and nlist=1536.
    index_params = milvus_client.prepare_index_params()
    index_params.add_index(
        field_name="embeddings",
        index_type="IVF_FLAT",
        metric_type="IP",
        params={"nlist": 1536}
    )

    milvus_client.create_collection(
        collection_name,
        id_type="string",
        vector_field_name="embeddings",
        metric_type="IP",
        schema=schema,
        index_params=index_params
    )

    # Print every index on the collection with its stored params to confirm nlist was set.
    index_names = milvus_client.list_indexes(collection_name)
    print(f"index names for {collection_name}:", index_names)
    for index_name in index_names:
        index_info = milvus_client.describe_index(collection_name, index_name=index_name)
        print(f"index info for index {index_name} is:", index_info)

Output:

    index names for hello_milvus: ['embeddings']
    index info for index embeddings is: {'index_type': 'IVF_FLAT', 'metric_type': 'IP', 'nlist': '1536', 'field_name': 'embeddings', 'index_name': 'embeddings'}

binbinlv commented 5 months ago

The result on the SDK side is correct, and I could reproduce this issue on Zilliz Cloud, so I don't think this is a pymilvus issue. [screenshot]

On Zilliz Cloud, it does reproduce:

Collection and index created with the ORM: [screenshot]

Collection and index created with MilvusClient: [screenshot]
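The ORM-versus-client comparison above is done with screenshots. A hypothetical helper like the one below could make the same comparison in code by diffing the two index descriptions key by key. It assumes both inputs are flat dicts like the `describe_index()` output earlier in this thread; if one side's params are nested under a `params` key, they would need to be flattened first. `diff_index_info` is not a pymilvus function.

```python
def diff_index_info(first: dict, second: dict, ignore=("index_name",)) -> dict:
    """Return {key: (first_value, second_value)} for every key whose value differs.

    A key missing on one side is reported as None on that side. Keys listed in
    `ignore` (by default the index name, which may legitimately differ) are skipped.
    """
    keys = sorted((set(first) | set(second)) - set(ignore))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }
```

An empty result means both creation paths stored the same index settings; a result like `{'nlist': ('1536', None)}` would match the behavior reported here.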

binbinlv commented 5 months ago

So I think an issue may need to be filed with Zilliz Cloud.