milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.53k stars 2.83k forks source link

[Bug]: Search failed with error `_InactiveRpcError of RPC that terminated with` after reinstall or upgrade #22605

Closed zhuwenxing closed 1 year ago

zhuwenxing commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version:2.2.0-20230307-07357e90
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):pulsar    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

collection name: sift_128_euclidean_HNSW

[2023-03-07T06:52:15.862Z] + python3 scripts/second_recall_test.py --host 10.101.15.107

[2023-03-07T06:52:16.809Z] 2023-03-07 06:52:16.532 | INFO     | __main__:search_test:53 - recall test for index type HNSW

[2023-03-07T06:52:18.710Z] 2023-03-07 06:52:18.234 | INFO     | __main__:search_test:63 - 

[2023-03-07T06:52:18.710Z] Search...

[2023-03-07T06:55:40.151Z] RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:

[2023-03-07T06:55:40.151Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-07T06:55:40.151Z]  details = "Deadline Exceeded"

[2023-03-07T06:55:40.151Z]  debug_error_string = "{"created":"@1678172138.410772622","description":"Error received from peer ipv4:10.101.15.107:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-07T06:55:40.151Z] >)>, <Time:{'RPC start': '2023-03-07 06:52:18.234705', 'RPC error': '2023-03-07 06:55:38.412127'}>

[2023-03-07T06:55:40.151Z] Traceback (most recent call last):

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-03-07T06:55:40.152Z]     return func(self, *args, **kwargs)

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search

[2023-03-07T06:55:40.152Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests

[2023-03-07T06:55:40.152Z]     raise pre_err

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 475, in _execute_search_requests

[2023-03-07T06:55:40.152Z]     response = self._stub.Search(request, timeout=timeout)

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__

[2023-03-07T06:55:40.152Z]     return _end_unary_response_blocking(state, call, False, None)

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking

[2023-03-07T06:55:40.152Z]     raise _InactiveRpcError(state)

[2023-03-07T06:55:40.152Z] grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

[2023-03-07T06:55:40.152Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-07T06:55:40.152Z]  details = "Deadline Exceeded"

[2023-03-07T06:55:40.152Z]  debug_error_string = "{"created":"@1678172138.410772622","description":"Error received from peer ipv4:10.101.15.107:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-07T06:55:40.152Z] >

[2023-03-07T06:55:40.152Z] 

[2023-03-07T06:55:40.152Z] The above exception was the direct cause of the following exception:

[2023-03-07T06:55:40.152Z] 

[2023-03-07T06:55:40.152Z] Traceback (most recent call last):

[2023-03-07T06:55:40.152Z]   File "scripts/second_recall_test.py", line 103, in <module>

[2023-03-07T06:55:40.152Z]     search_test(host, index_type)

[2023-03-07T06:55:40.152Z]   File "scripts/second_recall_test.py", line 65, in search_test

[2023-03-07T06:55:40.152Z]     res = collection.search(

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search

[2023-03-07T06:55:40.152Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-03-07T06:55:40.152Z]     raise e

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-03-07T06:55:40.152Z]     return func(*args, **kwargs)

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-03-07T06:55:40.152Z]     ret = func(self, *args, **kwargs)

[2023-03-07T06:55:40.152Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 58, in handler

[2023-03-07T06:55:40.152Z]     raise MilvusException(message=str(e)) from e

[2023-03-07T06:55:40.152Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:

[2023-03-07T06:55:40.152Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-07T06:55:40.152Z]  details = "Deadline Exceeded"

[2023-03-07T06:55:40.152Z]  debug_error_string = "{"created":"@1678172138.410772622","description":"Error received from peer ipv4:10.101.15.107:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023/03/07 07:29:11.945 +00:00] [WARN] [grpcclient/client.go:297] ["ClientBase Call grpc first call get error"] [role=querynode] [error="err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:296 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:254 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:513 github.com/milvus-io/milvus/internal/proxy.(searchTask).searchShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_policies.go:131 github.com/milvus-io/milvus/internal/proxy.mergeRoundRobinPolicy.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [2023/03/07 07:29:11.945 +00:00] [WARN] [proxy/task_search.go:516] ["QueryNode search return error"] [traceID=fc3d844b8a9e9c2] [msgID=439923135112544257] [nodeID=15] [channels="[by-dev-rootcoord-dml_112_439922264092151365v0]"] [error="err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:296 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:254 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:513 github.com/milvus-io/milvus/internal/proxy.(searchTask).searchShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_policies.go:131 github.com/milvus-io/milvus/internal/proxy.mergeRoundRobinPolicy.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [2023/03/07 07:29:11.945 +00:00] [WARN] [proxy/task_policies.go:132] ["failed to do query with node"] [traceID=fc3d844b8a9e9c2] [nodeID=15] [dmlChannels="[by-dev-rootcoord-dml_112_439922264092151365v0]"] [error="code: NotReadyServe, error: err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:296 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:254 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:513 github.com/milvus-io/milvus/internal/proxy.(searchTask).searchShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_policies.go:131 github.com/milvus-io/milvus/internal/proxy.mergeRoundRobinPolicy.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [2023/03/07 07:29:11.945 +00:00] [WARN] [grpcclient/client.go:297] ["ClientBase Call grpc first call get error"] [role=querynode] [error="err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:296 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:254 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:513 github.com/milvus-io/milvus/internal/proxy.(searchTask).searchShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_policies.go:131 github.com/milvus-io/milvus/internal/proxy.mergeRoundRobinPolicy.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [2023/03/07 07:29:11.945 +00:00] [WARN] [proxy/task_search.go:516] ["QueryNode search return error"] [traceID=fc3d844b8a9e9c2] [msgID=439923135112544257] [nodeID=22] [channels="[by-dev-rootcoord-dml_113_439922264092151365v1]"] [error="err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:296 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:254 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:513 github.com/milvus-io/milvus/internal/proxy.(searchTask).searchShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_policies.go:131 github.com/milvus-io/milvus/internal/proxy.mergeRoundRobinPolicy.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [2023/03/07 07:29:11.945 +00:00] [WARN] [proxy/task_policies.go:132] ["failed to do query with node"] [traceID=fc3d844b8a9e9c2] [nodeID=22] [dmlChannels="[by-dev-rootcoord-dml_113_439922264092151365v1]"] [error="code: NotReadyServe, error: err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:296 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:254 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:513 github.com/milvus-io/milvus/internal/proxy.(searchTask).searchShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_policies.go:131 github.com/milvus-io/milvus/internal/proxy.mergeRoundRobinPolicy.func1\n/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit\n"] [2023/03/07 07:29:11.945 +00:00] [WARN] [proxy/task_policies.go:159] ["retry another query node with round robin"] [traceID=fc3d844b8a9e9c2] [Nexts="{\"by-dev-rootcoord-dml_112_439922264092151365v0\":22,\"by-dev-rootcoord-dml_113_439922264092151365v1\":16}"] [2023/03/07 07:29:11.945 +00:00] [WARN] [proxy/impl.go:2839] ["Search failed to WaitToFinish"] [traceID=fc3d844b8a9e9c2] [error="proxy TaskCondition context Done"] [role=proxy] [msgID=439923135112544257] [db=] [collection=deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] [partitions="[]"] [dsl="int64 >= 0"] [len(PlaceholderGroup)=44] [OutputFields="[int64]"] [search_params="[{\"key\":\"anns_field\",\"value\":\"binary_vector\"},{\"key\":\"topk\",\"value\":\"10\"},{\"key\":\"metric_type\",\"value\":\"HAMMING\"},{\"key\":\"params\",\"value\":\"{\\"nprobe\\":10}\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"offset\",\"value\":\"0\"}]"] [travel_timestamp=0] [guarantee_timestamp=0]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job; https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/492/pipeline

log: artifacts-pulsar-cluster-reinstall-492-server-logs.tar.gz artifacts-pulsar-cluster-reinstall-492-pytest-logs.tar.gz

Anything else?

No response

yanliang567 commented 1 year ago

/assign @jiaoew1991 /unassign

yah01 commented 1 year ago

There still exists another deadlock(s) in QueryNode /assign @congqixia

zhuwenxing commented 1 year ago

https://github.com/milvus-io/milvus/issues/22469 There is a same issue created last week, not sure whether it is caused by the same root cause

jiaoew1991 commented 1 year ago

/unassign /assign @zhuwenxing

plz verify it with #22622

aoiasd commented 1 year ago

not fix, tickler not start default only limit that watch event must complete in 30s or continuous retry because timeout, but this case not delay for watch event retry, watch event success.

zhuwenxing commented 1 year ago

It still reproduced in 2.2.0-20230308-69f4afe4 failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/524/pipeline/331/

[2023-03-08T10:13:04.159Z] + python3 scripts/second_recall_test.py --host 10.101.90.144

[2023-03-08T10:13:05.081Z] 2023-03-08 10:13:05.000 | INFO     | __main__:search_test:53 - recall test for index type HNSW

[2023-03-08T10:13:05.640Z] 2023-03-08 10:13:05.560 | INFO     | __main__:search_test:63 - 

[2023-03-08T10:13:05.640Z] Search...

[2023-03-08T10:14:56.979Z] 2023-03-08 10:14:43.866 | INFO     | __main__:search_test:69 - search cost  98.3061 seconds

[2023-03-08T10:14:57.901Z] 2023-03-08 10:14:57.898 | INFO     | __main__:search_test:86 - recall=0.974

[2023-03-08T10:14:57.901Z] 2023-03-08 10:14:57.898 | INFO     | __main__:search_test:63 - 

[2023-03-08T10:14:57.901Z] Search...

[2023-03-08T10:18:19.323Z] RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:

[2023-03-08T10:18:19.323Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-08T10:18:19.323Z]  details = "Deadline Exceeded"

[2023-03-08T10:18:19.323Z]  debug_error_string = "{"created":"@1678270698.101074385","description":"Error received from peer ipv4:10.101.90.144:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-08T10:18:19.323Z] >)>, <Time:{'RPC start': '2023-03-08 10:14:57.898721', 'RPC error': '2023-03-08 10:18:18.102119'}>

[2023-03-08T10:18:19.323Z] Traceback (most recent call last):

[2023-03-08T10:18:19.323Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-03-08T10:18:19.323Z]     return func(self, *args, **kwargs)

[2023-03-08T10:18:19.323Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search

[2023-03-08T10:18:19.323Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-03-08T10:18:19.323Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests

[2023-03-08T10:18:19.323Z]     raise pre_err

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 475, in _execute_search_requests

[2023-03-08T10:18:19.324Z]     response = self._stub.Search(request, timeout=timeout)

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__

[2023-03-08T10:18:19.324Z]     return _end_unary_response_blocking(state, call, False, None)

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking

[2023-03-08T10:18:19.324Z]     raise _InactiveRpcError(state)

[2023-03-08T10:18:19.324Z] grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

[2023-03-08T10:18:19.324Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-08T10:18:19.324Z]  details = "Deadline Exceeded"

[2023-03-08T10:18:19.324Z]  debug_error_string = "{"created":"@1678270698.101074385","description":"Error received from peer ipv4:10.101.90.144:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-08T10:18:19.324Z] >

[2023-03-08T10:18:19.324Z] 

[2023-03-08T10:18:19.324Z] The above exception was the direct cause of the following exception:

[2023-03-08T10:18:19.324Z] 

[2023-03-08T10:18:19.324Z] Traceback (most recent call last):

[2023-03-08T10:18:19.324Z]   File "scripts/second_recall_test.py", line 103, in <module>

[2023-03-08T10:18:19.324Z]     search_test(host, index_type)

[2023-03-08T10:18:19.324Z]   File "scripts/second_recall_test.py", line 65, in search_test

[2023-03-08T10:18:19.324Z]     res = collection.search(

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search

[2023-03-08T10:18:19.324Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-03-08T10:18:19.324Z]     raise e

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-03-08T10:18:19.324Z]     return func(*args, **kwargs)

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-03-08T10:18:19.324Z]     ret = func(self, *args, **kwargs)

[2023-03-08T10:18:19.324Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 58, in handler

[2023-03-08T10:18:19.324Z]     raise MilvusException(message=str(e)) from e

[2023-03-08T10:18:19.324Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:

[2023-03-08T10:18:19.324Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-08T10:18:19.324Z]  details = "Deadline Exceeded"

[2023-03-08T10:18:19.324Z]  debug_error_string = "{"created":"@1678270698.101074385","description":"Error received from peer ipv4:10.101.90.144:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-08T10:18:19.324Z] >)>

script returned exit code 1

The first search success but the latency was high, then the second search failed.

log:

artifacts-rocksmq-standalone-reinstall-524-server-logs.tar.gz artifacts-rocksmq-standalone-reinstall-524-pytest-logs.tar.gz

yah01 commented 1 year ago

not fix, tickler not start default only limit that watch event must complete in 30s or continuous retry because timeout, but this case not delay for watch event retry, watch event success.

any coming patch?

zhuwenxing commented 1 year ago

/unassign

zhuwenxing commented 1 year ago

image tag: 2.2.0-20230309-130ab6da failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/552/pipeline log: artifacts-pulsar-cluster-reinstall-552-server-logs.tar.gz artifacts-pulsar-cluster-reinstall-552-pytest-logs.tar.gz

inkinworld commented 1 year ago

same issue in this enviroment

Environment

zhuwenxing commented 1 year ago

It also reproduced in the upgrade.


[2023-03-13T03:13:20.802Z] + python3 scripts/second_recall_test.py --host 10.101.128.189

[2023-03-13T03:13:21.744Z] 2023-03-13 03:13:21.407 | INFO     | __main__:search_test:53 - recall test for index type HNSW

[2023-03-13T03:13:21.997Z] 2023-03-13 03:13:21.869 | INFO     | __main__:search_test:63 - 

[2023-03-13T03:13:21.997Z] Search...

[2023-03-13T03:16:43.368Z] RPC error: [search], <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:

[2023-03-13T03:16:43.368Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-13T03:16:43.368Z]  details = "Deadline Exceeded"

[2023-03-13T03:16:43.368Z]  debug_error_string = "{"created":"@1678677402.050056252","description":"Error received from peer ipv4:10.101.128.189:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-13T03:16:43.368Z] >)>, <Time:{'RPC start': '2023-03-13 03:13:21.869299', 'RPC error': '2023-03-13 03:16:42.051956'}>

[2023-03-13T03:16:43.368Z] Traceback (most recent call last):

[2023-03-13T03:16:43.368Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2023-03-13T03:16:43.368Z]     return func(self, *args, **kwargs)

[2023-03-13T03:16:43.368Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search

[2023-03-13T03:16:43.368Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2023-03-13T03:16:43.368Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests

[2023-03-13T03:16:43.368Z]     raise pre_err

[2023-03-13T03:16:43.368Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 475, in _execute_search_requests

[2023-03-13T03:16:43.368Z]     response = self._stub.Search(request, timeout=timeout)

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 946, in __call__

[2023-03-13T03:16:43.369Z]     return _end_unary_response_blocking(state, call, False, None)

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking

[2023-03-13T03:16:43.369Z]     raise _InactiveRpcError(state)

[2023-03-13T03:16:43.369Z] grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

[2023-03-13T03:16:43.369Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-13T03:16:43.369Z]  details = "Deadline Exceeded"

[2023-03-13T03:16:43.369Z]  debug_error_string = "{"created":"@1678677402.050056252","description":"Error received from peer ipv4:10.101.128.189:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-13T03:16:43.369Z] >

[2023-03-13T03:16:43.369Z] 

[2023-03-13T03:16:43.369Z] The above exception was the direct cause of the following exception:

[2023-03-13T03:16:43.369Z] 

[2023-03-13T03:16:43.369Z] Traceback (most recent call last):

[2023-03-13T03:16:43.369Z]   File "scripts/second_recall_test.py", line 103, in <module>

[2023-03-13T03:16:43.369Z]     search_test(host, index_type)

[2023-03-13T03:16:43.369Z]   File "scripts/second_recall_test.py", line 65, in search_test

[2023-03-13T03:16:43.369Z]     res = collection.search(

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search

[2023-03-13T03:16:43.369Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2023-03-13T03:16:43.369Z]     raise e

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2023-03-13T03:16:43.369Z]     return func(*args, **kwargs)

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2023-03-13T03:16:43.369Z]     ret = func(self, *args, **kwargs)

[2023-03-13T03:16:43.369Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 58, in handler

[2023-03-13T03:16:43.369Z]     raise MilvusException(message=str(e)) from e

[2023-03-13T03:16:43.369Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=<_InactiveRpcError of RPC that terminated with:

[2023-03-13T03:16:43.369Z]  status = StatusCode.DEADLINE_EXCEEDED

[2023-03-13T03:16:43.369Z]  details = "Deadline Exceeded"

[2023-03-13T03:16:43.369Z]  debug_error_string = "{"created":"@1678677402.050056252","description":"Error received from peer ipv4:10.101.128.189:19530","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Deadline Exceeded","grpc_status":4}"

[2023-03-13T03:16:43.369Z] >)>

script returned exit code 1

milvus mode: cluster deploy task: upgrade old image tag: v2.2.2 new image tag: 2.2.0-20230310-b2ece6a5 failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/579/pipeline/331/ log: artifacts-pulsar-cluster-upgrade-579-server-logs.tar.gz artifacts-pulsar-cluster-upgrade-579-pytest-logs.tar.gz

congqixia commented 1 year ago

image time tick lag is huge during the test

congqixia commented 1 year ago

@zhuwenxing does this issue reproduce in latest 2.2.0 branch after these deadlock issue fixed?

zhuwenxing commented 1 year ago

Not reproduced in 2.2.0-20230426-05d317fe