milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Load timeout after reinstall and upgrade #28830

Closed zhuwenxing closed 10 months ago

zhuwenxing commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: master-20231128-881a166b
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023-11-28T10:36:54.167Z]         # load if not loaded
[2023-11-28T10:36:54.167Z]         if replicas_loaded == 0:
[2023-11-28T10:36:54.167Z]             # create index for vector if not exist before load
[2023-11-28T10:36:54.167Z]             is_vector_indexed = False
[2023-11-28T10:36:54.167Z]             index_infos = [index.to_dict() for index in collection_w.indexes]
[2023-11-28T10:36:54.167Z]             for index_info in index_infos:
[2023-11-28T10:36:54.167Z]                 if "metric_type" in index_info.keys() or "metric_type" in index_info["index_param"]:
[2023-11-28T10:36:54.167Z]                     is_vector_indexed = True
[2023-11-28T10:36:54.167Z]                     break
[2023-11-28T10:36:54.167Z]             if is_vector_indexed is False:
[2023-11-28T10:36:54.167Z]                 default_index_param = gen_index_param(vector_index_type)
[2023-11-28T10:36:54.167Z]                 self.create_index(collection_w, default_index_field, default_index_param)
[2023-11-28T10:36:54.167Z] >           collection_w.load()
[2023-11-28T10:36:54.167Z] 
[2023-11-28T10:36:54.167Z] testcases/test_action_second_deployment.py:142: 
[2023-11-28T10:36:54.167Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py:419: in load
[2023-11-28T10:36:54.167Z]     conn.load_collection(
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:129: in handler
[2023-11-28T10:36:54.167Z]     raise e from e
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:125: in handler
[2023-11-28T10:36:54.167Z]     return func(*args, **kwargs)
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:164: in handler
[2023-11-28T10:36:54.167Z]     return func(self, *args, **kwargs)
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:104: in handler
[2023-11-28T10:36:54.167Z]     raise e from e
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:68: in handler
[2023-11-28T10:36:54.167Z]     return func(*args, **kwargs)
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:1028: in load_collection
[2023-11-28T10:36:54.167Z]     self.wait_for_loading_collection(collection_name, timeout, is_refresh=_refresh)
[2023-11-28T10:36:54.167Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:129: in handler
[2023-11-28T10:36:54.168Z]     raise e from e
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:125: in handler
[2023-11-28T10:36:54.168Z]     return func(*args, **kwargs)
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:164: in handler
[2023-11-28T10:36:54.168Z]     return func(self, *args, **kwargs)
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:104: in handler
[2023-11-28T10:36:54.168Z]     raise e from e
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:68: in handler
[2023-11-28T10:36:54.168Z]     return func(*args, **kwargs)
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:1048: in wait_for_loading_collection
[2023-11-28T10:36:54.168Z]     progress = self.get_loading_progress(
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:129: in handler
[2023-11-28T10:36:54.168Z]     raise e from e
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:125: in handler
[2023-11-28T10:36:54.168Z]     return func(*args, **kwargs)
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:164: in handler
[2023-11-28T10:36:54.168Z]     return func(self, *args, **kwargs)
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:104: in handler
[2023-11-28T10:36:54.168Z]     raise e from e
[2023-11-28T10:36:54.168Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:68: in handler
[2023-11-28T10:36:54.168Z]     return func(*args, **kwargs)
[2023-11-28T10:36:54.168Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2023-11-28T10:36:54.168Z] 
[2023-11-28T10:36:54.168Z] self = <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f632856de20>
[2023-11-28T10:36:54.168Z] collection_name = 'deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000'
[2023-11-28T10:36:54.168Z] partition_names = None, timeout = None, is_refresh = False
[2023-11-28T10:36:54.168Z] 
[2023-11-28T10:36:54.168Z]     @retry_on_rpc_failure()
[2023-11-28T10:36:54.168Z]     def get_loading_progress(
[2023-11-28T10:36:54.168Z]         self,
[2023-11-28T10:36:54.168Z]         collection_name: str,
[2023-11-28T10:36:54.168Z]         partition_names: Optional[List[str]] = None,
[2023-11-28T10:36:54.168Z]         timeout: Optional[float] = None,
[2023-11-28T10:36:54.168Z]         is_refresh: bool = False,
[2023-11-28T10:36:54.168Z]     ):
[2023-11-28T10:36:54.168Z]         request = Prepare.get_loading_progress(collection_name, partition_names)
[2023-11-28T10:36:54.168Z]         response = self._stub.GetLoadingProgress.future(request, timeout=timeout).result()
[2023-11-28T10:36:54.168Z]         if response.status.code != 0:
[2023-11-28T10:36:54.168Z] >           raise MilvusException(
[2023-11-28T10:36:54.168Z]                 response.status.code, response.status.reason, response.status.error_code
[2023-11-28T10:36:54.168Z]             )
[2023-11-28T10:36:54.168Z] E           pymilvus.exceptions.MilvusException: <MilvusException: (code=101, message=collection not loaded[collection=445949849405690021])>
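
In the traceback above, `load()` is called with `timeout = None` and the failure surfaces from `get_loading_progress` as `code=101, collection not loaded`. When reproducing this, a bounded polling loop that remembers the last error can make the failure easier to read. The following is only a minimal sketch, not pymilvus code: `wait_until_loaded`, `LoadNotFinished`, and the `get_progress` callable are hypothetical names, and `get_progress` is assumed to return the load percentage as a number (in a real test it would wrap whatever progress call the SDK provides).

```python
import time


class LoadNotFinished(Exception):
    """Raised when loading does not reach 100% before the deadline."""


def wait_until_loaded(get_progress, timeout_s=120.0, interval_s=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll get_progress() until it returns >= 100 or timeout_s elapses.

    get_progress is a hypothetical callable returning a number in [0, 100].
    Exceptions it raises are kept and reported if the deadline is reached.
    """
    deadline = clock() + timeout_s
    last_error = None
    while True:
        try:
            progress = get_progress()
            if progress >= 100:
                return progress
            last_error = None
        except Exception as exc:  # e.g. a "collection not loaded" error
            last_error = exc
        if clock() >= deadline:
            raise LoadNotFinished(
                f"not loaded within {timeout_s}s; last error: {last_error!r}"
            )
        sleep(interval_s)


# Usage with a fake progress source: one error, then 40%, then 100%.
_steps = iter([RuntimeError("collection not loaded"), 40, 100])


def fake_progress():
    step = next(_steps)
    if isinstance(step, Exception):
        raise step
    return step


result = wait_until_loaded(fake_progress, timeout_s=5, interval_s=0,
                           sleep=lambda _: None)
print(result)
```

This does not address the server-side cause. It only bounds the wait and keeps the last error visible when the deadline passes.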

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/1652/pipeline

log: artifacts-rocksmq-standalone-upgrade-1652-pytest-logs.tar.gz

artifacts-rocksmq-standalone-upgrade-1652-server-logs.tar.gz

Anything else?

4am chaos-testing


[2023-11-28T09:49:09.312Z] + kubectl get pods -o wide
[2023-11-28T09:49:09.313Z] + grep rocksmq-standalone-upgrade-1652
[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-etcd-0                            1/1     Running     0                2m4s    10.104.15.111   4am-node20   <none>           <none>
[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-etcd-1                            1/1     Running     0                2m4s    10.104.16.99    4am-node21   <none>           <none>
[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-etcd-2                            1/1     Running     0                2m4s    10.104.20.91    4am-node22   <none>           <none>
[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-milvus-standalone-6c96dfd6gqmzn   1/1     Running     2 (111s ago)     2m4s    10.104.17.32    4am-node23   <none>           <none>
[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-minio-55c9494bc-cw62f             1/1     Running     0                2m3s    10.104.24.106   4am-node29   <none>           <none>
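
The pod listing above shows the standalone pod with a RESTARTS value of `2 (111s ago)` while the other pods show `0`. As a rough sketch for picking such rows out of a saved Jenkins log, the snippet below parses rows in that layout and reports pods with a non-zero restart count. The regex and the `restarted_pods` helper are illustrative assumptions based only on the lines pasted here, not part of any Milvus tooling.

```python
import re

# One `kubectl get pods -o wide` row, with an optional "[timestamp] " prefix.
POD_ROW = re.compile(
    r"^(?:\[[^\]]+\]\s+)?"                    # optional Jenkins timestamp
    r"(?P<name>\S+)\s+"                       # NAME
    r"(?P<ready>\d+/\d+)\s+"                  # READY, e.g. 1/1
    r"(?P<status>\S+)\s+"                     # STATUS
    r"(?P<restarts>\d+)(?:\s+\([^)]*\))?\s+"  # RESTARTS, e.g. "2 (111s ago)"
    r"(?P<age>\S+)"                           # AGE
)


def restarted_pods(lines):
    """Return {pod_name: restart_count} for rows with RESTARTS > 0."""
    found = {}
    for line in lines:
        match = POD_ROW.match(line.strip())
        if match and int(match.group("restarts")) > 0:
            found[match.group("name")] = int(match.group("restarts"))
    return found


sample = [
    "[2023-11-28T09:49:09.312Z] + kubectl get pods -o wide",
    "[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-etcd-0                            1/1     Running     0                2m4s    10.104.15.111   4am-node20   <none>           <none>",
    "[2023-11-28T09:49:09.569Z] rocksmq-standalone-upgrade-1652-milvus-standalone-6c96dfd6gqmzn   1/1     Running     2 (111s ago)     2m4s    10.104.17.32    4am-node23   <none>           <none>",
]

restarts = restarted_pods(sample)
print(restarts)
```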

yanliang567 commented 1 year ago

/assign @congqixia /unassign

congqixia commented 12 months ago

This looks the same as #28857. Could you please verify? /assign @zhuwenxing

stale[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.