milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Query with count(*) reports error "failed to query: target not ready: collection not fully loaded" after release collection and partition load #36538

Open binbinlv opened 2 months ago

binbinlv commented 2 months ago

Environment

- Milvus version: master-latest
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.0rc81
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

A count(*) query fails with the error "failed to query: target not ready: collection not fully loaded" after the collection is released and a single partition is loaded again.

[2024-09-26 11:37:59 - DEBUG - ci_test]: (api_request)  : [Collection.query] args: ['', ['count(*)'], None, 180], kwargs: {} (api_request.py:62)
[2024-09-26 11:37:59 - ERROR - pymilvus.decorators]: RPC error: [query], <MilvusException: (code=503, message=failed to query: target not ready: collection not fully loaded[collection=452675705376924944]: channel not available[channel=by-dev-rootcoord-dml_15_452675705376924944v0])>, <Time:{'RPC start': '2024-09-26 11:37:59.395257', 'RPC error': '2024-09-26 11:37:59.444936'}> (decorators.py:140)
[2024-09-26 11:37:59 - ERROR - ci_test]: Traceback (most recent call last):
  File "/Users/binbin/zillizProjects/milvus_latest_runnable/tests/python_client/utils/api_request.py", line 32, in inner_wrapper
    res = func(*args, **_kwargs)
  File "/Users/binbin/zillizProjects/milvus_latest_runnable/tests/python_client/utils/api_request.py", line 63, in api_request
    return func(*arg, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/orm/collection.py", line 1076, in query
    return conn.query(
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 141, in handler
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 137, in handler
    return func(*args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 176, in handler
    return func(self, *args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 116, in handler
    raise e from e
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/decorators.py", line 86, in handler
    return func(*args, **kwargs)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/grpc_handler.py", line 1536, in query
    check_status(response.status)
  File "/Users/binbin/milvus_latest/lib/python3.8/site-packages/pymilvus/client/utils.py", line 63, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=503, message=failed to query: target not ready: collection not fully loaded[collection=452675705376924944]: channel not available[channel=by-dev-rootcoord-dml_15_452675705376924944v0])>
 (api_request.py:45)
[2024-09-26 11:37:59 - ERROR - ci_test]: (api_response) : <MilvusException: (code=503, message=failed to query: target not ready: collection not fully loaded[collection=452675705376924944]: channel not available[channel=by-dev-rootcoord-dml_15_452675705376924944v0])> (api_request.py:46)

Expected Behavior

The count(*) query returns successfully.

Steps To Reproduce

    @pytest.mark.tags(CaseLabel.L1)
    @pytest.mark.repeat(3)
    def test_count_query_search_after_release_partition_load(self):
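        """
        target: test query count(*) after release collection and load partition
        method: 1. create a collection and 2 partitions with nullable and default value fields
                2. insert data
                3. release the collection and load one partition
                4. delete half data in each partition
                5. release the collection and load the same partition again
                6. query count(*) on the collection and the loaded partition, then search the unloaded partition
        expected: count(*) returns 50; search on the unloaded partition fails with 'not loaded'
        """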
        # insert data
        collection_w = self.init_collection_general(prefix, True, 200, partition_num=1, is_index=True)[0]
        collection_w.query(expr='', output_fields=[ct.default_count_output],
                          check_task=CheckTasks.check_query_results,
                          check_items={"exp_res": [{ct.default_count_output: 200}]})
        collection_w.release()
        partition_w1, partition_w2 = collection_w.partitions
        # load
        partition_w1.load()
        # delete data
        delete_ids = list(range(50, 150))
        collection_w.delete(f"int64 in {delete_ids}")
        # release
        collection_w.release()
        # partition_w1.load()
        collection_w.load(partition_names=[partition_w1.name])
        # query count(*) on the collection and partition1, then search partition2 (not loaded)
        collection_w.query(expr='', output_fields=[ct.default_count_output],
                           check_task=CheckTasks.check_query_results,
                           check_items={"exp_res": [{ct.default_count_output: 50}]})
        partition_w1.query(expr='', output_fields=[ct.default_count_output],
                           check_task=CheckTasks.check_query_results,
                           check_items={"exp_res": [{ct.default_count_output: 50}]})
        vectors = [[random.random() for _ in range(ct.default_dim)] for _ in range(ct.default_nq)]
        collection_w.search(vectors[:1], ct.default_float_vec_field_name, ct.default_search_params, 200,
                            partition_names=[partition_w2.name],
                            check_task=CheckTasks.err_res,
                            check_items={ct.err_code: 1, ct.err_msg: 'not loaded'})
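
The test above depends on the Milvus `tests/python_client` helpers. For anyone trying to reproduce this outside that harness, the same call sequence can be sketched as a plain function. This is only a sketch: `reproduce` is a hypothetical name, `collection` and `partition` are assumed to be a `pymilvus` `Collection` and `Partition` (or objects with the same methods) that already hold indexed data, and the delete expression is passed in by the caller.

```python
def reproduce(collection, partition, delete_expr, count_field="count(*)"):
    """Replay the sequence from the test case and return the count(*) result.

    Assumption: `collection` behaves like pymilvus.Collection and `partition`
    like pymilvus.Partition; data and index were created beforehand.
    """
    collection.release()
    partition.load()                       # first load: partition only
    collection.delete(delete_expr)         # e.g. "int64 in [50, 51, ...]"
    collection.release()
    collection.load(partition_names=[partition.name])  # second load via the collection
    # This is the call that intermittently raises code=503 in the report.
    return collection.query(expr="", output_fields=[count_field])
```

Since the failure is intermittent, calling this in a loop (the test uses `@pytest.mark.repeat(3)`) is probably needed to hit it.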

Milvus Log

https://grafana-4am.zilliz.cc/explore?orgId=1&panes=%7B%22BZB%22:%7B%22datasource%22:%22vhI6Vw67k%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D%5C%22devops%5C%22,namespace%3D%5C%22chaos-testing%5C%22,pod%3D~%5C%22default-null-test-tszos.%2A%5C%22%7D%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22vhI6Vw67k%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1

Anything else?

No response

binbinlv commented 2 months ago

This issue is intermittent; it does not reproduce on every run.

yanliang567 commented 2 months ago

/assign @congqixia /unassign

yanliang567 commented 3 weeks ago

@binbinlv @congqixia Are there any workarounds when this happens?
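
No workaround has been confirmed in this thread. One client-side option, shown here only as an unverified sketch, is to retry the query with backoff when the error text matches the message reported above. `query_with_retry` and `TRANSIENT_MARKERS` are hypothetical names, and treating these messages as transient is an assumption, not documented Milvus behavior.

```python
import time

# Substrings copied from the error in this report. Treating them as
# transient is an assumption.
TRANSIENT_MARKERS = ("collection not fully loaded", "channel not available")


def query_with_retry(run_query, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call run_query(), retrying only when the error looks like the one above."""
    for attempt in range(attempts):
        try:
            return run_query()
        except Exception as exc:  # pymilvus raises MilvusException for RPC errors
            transient = any(marker in str(exc) for marker in TRANSIENT_MARKERS)
            if not transient or attempt == attempts - 1:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
```

With the test case above this might be used as `query_with_retry(lambda: collection_w.query(expr='', output_fields=['count(*)']))`. It only hides the symptom on the client; whether the target eventually becomes ready without another release/load is not established here.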