[Bug]: Query error when there is no data

HantaoCai commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

Environment

- Milvus version: v2.2.11
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka) : pulsar
- SDK version(e.g. pymilvus v2.0.0 rc2): pymilvus 2.2.13
- OS (Ubuntu or CentOS) : k8s/centos
- CPU/Memory: 12/24
- GPU: None
- Others:

Current Behavior

First, here is my code:

def delete_file_embeddings(db_file: FileInfoModel):
    collection = get_collection()
    if collection.has_partition(str(db_file.partition_id)):
        try:
            expr = f"file_id == {db_file.file_id}"
            collection.load()
            res = collection.query(expr=expr, output_fields=['index_id'],
                                   partition_names=[str(db_file.partition_id)],
                                   consistency_level="Strong", timeout=None)
            del_expr = "index_id in [{}]".format(', '.join("'{}'".format(d['index_id']) for d in res))
            collection.delete(del_expr, partition_names=[str(db_file.partition_id)])
        except Exception as e:
            raise Exception("delete file embedding error: " + str(e))

This is how I created the partition:

def create_partition(partition_id: int):
    collection = get_collection()
    collection.create_partition(str(partition_id))

After I create a partition, and before any data has been inserted into it, the query reports an error:

in delete_file_embeddings
    raise Exception("delete file embedding error: " + str(e))
Exception: delete file embedding error: <MilvusException: (code=1, message=fail to query on all shard leaders, err=All attempts results:
attempt #1:code: UnexpectedError, error: fail to Query, QueryNode ID = 663, reason=stream operation failed: PartitionNotFound(partitionID=442933696221561024)
attempt #2:context canceled

Judging from the error, the partition does not exist. In reality, it does exist but contains no data, which is why my method first checks whether the partition exists. Once I manually insert data, the query works normally.

Expected Behavior

Return an empty array instead of throwing an error.
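
If the query did return an empty list, the delete step in the code above would still build the expression `index_id in []`. A minimal sketch of guarding that case, assuming `index_id` is a string primary key as the quoting in the original code suggests; `build_delete_expr` is a hypothetical helper, not part of pymilvus:

```python
def build_delete_expr(rows, pk_field="index_id"):
    """Build a delete expression from query rows, or return None if there are no rows."""
    keys = [row[pk_field] for row in rows]
    if not keys:
        # Nothing matched, so there is nothing to delete.
        return None
    quoted = ", ".join("'{}'".format(key) for key in keys)
    return "{} in [{}]".format(pk_field, quoted)


print(build_delete_expr([]))
# None
print(build_delete_expr([{"index_id": "a1"}, {"index_id": "b2"}]))
# index_id in ['a1', 'b2']
```

The caller would then invoke `collection.delete(...)` only when the helper returns an expression.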

xiaofan-luan commented 1 year ago

/assign @HantaoCai
As the error says, the partition was not found. I guess you did not load the collection or the partition.

HantaoCai commented 1 year ago

I added the existence check and loaded the collection:

 if collection.has_partition(str(db_file.partition_id)):
      .......
      collection.load()

The error occurs in the delete_file_embeddings method.

yanliang567 commented 1 year ago

Sounds like a duplicate of #26113. Could you please try collection.flush() before collection.load()? @HantaoCai

xiaofan-luan commented 1 year ago

I don't think this has anything to do with flush. We only support delete with a primary key. Is index_id the primary key?

xiaofan-luan commented 1 year ago

I guess #26113 is not an issue. We have never supported the sequence of create, load, and then create another partition in Milvus 2.x.

HantaoCai commented 1 year ago

I don't think this has anything to do with flush. We only support delete with a primary key. Is index_id the primary key?

Yes, it is.

xiaofan-luan commented 1 year ago

What's the current error after you load?

HantaoCai commented 1 year ago

Adding partitions to an existing collection works normally; however, an exception occurs when querying a partition that contains no data.

I did not modify my code. When execution reaches

res = collection.query(expr=expr, output_fields=['index_id'],
                       partition_names=[str(db_file.partition_id)],
                       consistency_level="Strong", timeout=None)

this line reports the same error I mentioned above.

xiaofan-luan commented 1 year ago

That error means the partition (db_file.partition_id) is not loaded.

Note that on 2.2, load is not idempotent. Try releasing and loading again, or use the refresh option in load.

HantaoCai commented 1 year ago

I attempted to reload the partition, but it returned an error stating that the collection has already been loaded.

xiaofan-luan commented 1 year ago

That's why you need to release and then reload.

For now, Milvus does not support loading a collection, then creating a partition, and then searching it.

To reload the partition, you can first do a release, or you can load the partition with the refresh parameter.
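
The release-then-load workaround described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the maintainers' code: `reload_collection` and `FakeCollection` are hypothetical names, `release()` and `load()` are existing pymilvus `Collection` methods, and the `_refresh` keyword is an assumption about how the refresh option is spelled in pymilvus 2.2.x, so check it against your SDK version.

```python
def reload_collection(collection, use_refresh=False):
    """Make partitions created after the initial load visible to queries (Milvus 2.2.x)."""
    if use_refresh:
        # Assumption: load() accepts a `_refresh` keyword in pymilvus 2.2.x.
        collection.load(_refresh=True)
    else:
        # load() on an already loaded collection does not pick up new partitions,
        # so release first and then load again.
        collection.release()
        collection.load()


class FakeCollection:
    """Stand-in that records calls, so the sketch runs without a Milvus server."""

    def __init__(self):
        self.calls = []

    def release(self):
        self.calls.append("release")

    def load(self, **kwargs):
        self.calls.append(("load", kwargs))


demo = FakeCollection()
reload_collection(demo)
print(demo.calls)
# ['release', ('load', {})]
```

With a real `Collection`, the collection cannot serve queries between `release()` and `load()`, so the refresh variant may be preferable where it is available.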

HantaoCai commented 1 year ago

That operation makes the service unavailable while it runs. So, when will you support loading partitions into a collection that has already been loaded? Our requirements are as follows: we have a large amount of user data and want to isolate it as much as possible, which could improve retrieval efficiency. Our initial approach was to put each user's data in its own partition, but we are currently facing some issues. Do you have any suggestions?

xiaofan-luan commented 1 year ago

The partition_key feature is probably what you are looking for, unless you have a very limited number of partitions (say, fewer than 500). Dynamic partition loading will be supported in 2.3.

HantaoCai commented 1 year ago

Regarding this issue, we are currently storing the partition_id as a business field in a collection and filtering large amounts of data within the same collection. When do you expect version 2.3 to be released? If we then need to remove this business field and use partitions, is there a data migration tool that would make the transition seamless?
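
A minimal sketch of the field-based filtering described above, assuming `partition_id` and `file_id` are integer scalar fields in the collection (as the earlier code suggests); `build_user_filter` is a hypothetical helper:

```python
def build_user_filter(partition_id, file_id=None):
    """Build a Milvus boolean expression that scopes a query to one user's data."""
    # int() rejects non-numeric input before it is interpolated into the expression.
    clauses = ["partition_id == {}".format(int(partition_id))]
    if file_id is not None:
        clauses.append("file_id == {}".format(int(file_id)))
    return " and ".join(clauses)


print(build_user_filter(42))
# partition_id == 42
print(build_user_filter(42, file_id=7))
# partition_id == 42 and file_id == 7
```

The resulting string would be passed as `expr` to `collection.query(...)` without a `partition_names` argument, so the query does not depend on a per-user partition being loaded.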

xiaofan-luan commented 1 year ago

Milvus 2.3 will be released next week

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

milvus-io / milvus