milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Delays in query post partition loading #34234

Closed by nairan-deshaw 6 days ago

nairan-deshaw commented 2 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.3.12
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2): v2.4.0
- OS(Ubuntu or CentOS): RHEL8
- CPU/Memory: 512/720G 
- GPU: NA
- Others:

Current Behavior

When a partition is loaded, it takes a few seconds before queries against that partition return results. This happens when some other partitions are already loaded into memory. The delay between the load finishing and the results becoming available is random. Reproducer code:

import time
from pymilvus import MilvusClient

# Placeholder URI; point this at your own cluster.
client = MilvusClient(uri="http://localhost:19530")

for partition in ["partition_1", "partition_2", "partition_3"]:
    load_time = time.time()
    num_failures = 0
    client.load_partitions("example_collection", [partition])
    while True:
        # Results are expected here because vector ID 5 is certainly present.
        ret = client.get("example_collection", 5, output_fields=["vector"], partition_names=[partition], timeout=5.0)
        if len(ret):
            print(f"After {num_failures} failures ({time.time() - load_time} seconds), got non-empty result.")
            break
        num_failures += 1

If each partition is released after its get call completes, the results for the next loaded partition are available immediately.
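
On affected versions, the retry loop from the reproducer could be bounded with a deadline and a short sleep so that it does not spin forever. This is only a sketch of a client-side workaround: `wait_until_visible` and its parameters are hypothetical names, not part of pymilvus, and `fetch` is assumed to wrap the `client.get(...)` call shown in the reproducer.

```python
import time
from typing import Callable, Sequence, Tuple


def wait_until_visible(
    fetch: Callable[[], Sequence],
    timeout: float = 30.0,
    interval: float = 0.2,
) -> Tuple[Sequence, int, float]:
    """Poll fetch() until it returns a non-empty result or the timeout elapses.

    Returns (result, num_failures, elapsed_seconds).
    Raises TimeoutError if nothing comes back before the deadline.
    """
    start = time.monotonic()
    num_failures = 0
    while True:
        result = fetch()
        if len(result):
            return result, num_failures, time.monotonic() - start
        num_failures += 1
        if time.monotonic() - start >= timeout:
            raise TimeoutError(f"no result after {num_failures} attempts")
        # Back off briefly instead of hammering the server.
        time.sleep(interval)
```

With the reproducer's client, the call would look like `wait_until_visible(lambda: client.get("example_collection", 5, output_fields=["vector"], partition_names=[partition], timeout=5.0))`.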

Expected Behavior

The vectors should become available immediately after the partition load is completed.

Steps To Reproduce

1. Create a collection with a few partitions
2. Run the reproducer code

Milvus Log

No response

Anything else?

We checked the logs on the K8s ingress layer as well as the pod logs, and both seem to be normal. If there are any particular logs that need to be checked, please let us know.

yanliang567 commented 2 months ago

@nairan-deshaw I think this was fixed by https://github.com/milvus-io/milvus/pull/34026, could you please upgrade to milvus 2.3.18 or 2.4.5 and retry? /assign @nairan-deshaw /unassign

nairan-deshaw commented 2 months ago

Hi @yanliang567, we upgraded the cluster to 2.4.5 but the issue still persists. Partitions with a large number of entries (over 1k) show this behaviour.

yanliang567 commented 2 months ago

/assign @congqixia please help to take a look

yanliang567 commented 2 months ago

I think I know the reason: Milvus only has two statuses, loaded and not_loaded. Once the first partition is loaded, Milvus returns immediately when asked to load the next partition. That is why this issue does not reproduce for the first partition, only for the following ones. It is also why it does not reproduce if you call release partition in every loop. I think this is kind of by design @nairan-deshaw
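
To make the explanation above concrete, here is a toy model of the described behaviour. It is not Milvus code: `ToyCollection` and all of its methods are made up for illustration, and it assumes a single collection-level loaded flag with partition data arriving in the background once that flag is already set.

```python
class ToyCollection:
    """Toy model only: one visible loaded flag, per-partition readiness hidden."""

    def __init__(self):
        self.loaded = False   # the only status a caller can observe
        self.ready = set()    # partitions whose data is actually queryable
        self.pending = set()  # partitions still loading in the background

    def load_partition(self, name):
        if self.loaded:
            # Flag already says "loaded": return at once, data arrives later.
            self.pending.add(name)
            return
        # First load: wait for the data, then flip the flag.
        self.ready.add(name)
        self.loaded = True

    def release_partition(self, name):
        self.ready.discard(name)
        self.pending.discard(name)
        if not self.ready and not self.pending:
            self.loaded = False

    def finish_background_loads(self):
        # Stand-in for the delay observed in the reproducer.
        self.ready |= self.pending
        self.pending.clear()

    def get(self, name):
        return ["row"] if name in self.ready else []


c = ToyCollection()
c.load_partition("partition_1")
first = c.get("partition_1")    # visible right away
c.load_partition("partition_2")
second = c.get("partition_2")   # empty until the background load finishes
c.finish_background_loads()
later = c.get("partition_2")
```

In this model, releasing every partition before the next load resets the flag, so each load takes the blocking path and its data is visible immediately, which matches the release-in-every-loop observation.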

congqixia commented 2 months ago

@nairan-deshaw thanks for the feedback. There was a bug where a newly loaded partition was not guaranteed to be visible right after loading if the collection or some of its other partitions had been loaded before. The patch is merged into the 2.3 and 2.4 branches; you could wait for the latest release. FYI.

nairan-deshaw commented 1 month ago

@congqixia I do see that the bug is addressed in 2.3.19 release. Is it also addressed in the 2.4.6 release?

congqixia commented 1 month ago

@nairan-deshaw I just checked the commits in v2.4.6, and the fix for this issue is not included, since v2.4.6 is a hotfix release for some other critical bugs. The fix will be released with v2.4.7. FYI.

stale[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

nairan-deshaw commented 6 days ago

This has been fixed from 2.4.7 onwards. Thanks for the help!
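
For anyone landing here later, the versions mentioned in this thread can be summarized in a small check. This is a sketch only: `has_partition_visibility_fix` is a hypothetical helper, and it covers just the 2.3.x and 2.4.x lines discussed above (fixed from 2.3.19 and from 2.4.7). Any other release line returns `None` because the thread does not say.

```python
from typing import Optional

# First patch release carrying the fix, per the comments in this thread.
FIRST_FIXED = {(2, 3): 19, (2, 4): 7}


def has_partition_visibility_fix(version: str) -> Optional[bool]:
    """Return True/False for 2.3.x and 2.4.x, None for lines not covered here."""
    major, minor, patch = (int(p) for p in version.lstrip("v").split(".")[:3])
    first = FIRST_FIXED.get((major, minor))
    if first is None:
        return None  # unknown: not discussed in this issue
    return patch >= first
```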

Closing