milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Search failed with error `reason=target node id not match target id = 3, node id = 12` after pulsar pod kill chaos test #21027

Closed zhuwenxing closed 1 year ago

zhuwenxing commented 1 year ago

Environment

- Milvus version: master-20221206-f8cff798
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.3.0.dev15
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2022-12-06T22:16:38.598Z] [2022-12-06 22:16:29 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)

[2022-12-06T22:16:38.598Z] [2022-12-06 22:16:29 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2022-12-06T22:16:38.598Z] [2022-12-06 22:16:29 - INFO - ci_test]: [test][2022-12-06T22:16:29Z] [0.00333510s] DeleteChecker__huJONjuF load -> None (wrapper.py:30)

[2022-12-06T22:16:38.598Z] [2022-12-06 22:16:29 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.01373642304506658, 0.09407661457501135, 0.037831905386391126, 0.028200136389675192, 0.1333814968391419, 0.11025818933976621, 0.10980963426000147, 0.13031918810532903, 0.03308945619420152, 0.11283760831918727, 0.023766451770019223, 0.019642186799281227, 0.12117970130462996, 0.06948829826502975, ......, kwargs: {} (api_request.py:56)

[2022-12-06T22:16:38.598Z] [2022-12-06 22:16:29 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=3, reason=target node id not match target id = 3, node id = 12)>, <Time:{'RPC start': '2022-12-06 22:16:29.318095', 'RPC error': '2022-12-06 22:16:29.439993'}> (decorators.py:108)

[2022-12-06T22:16:38.598Z] [2022-12-06 22:16:29 - ERROR - ci_test]: Traceback (most recent call last):

[2022-12-06T22:16:38.598Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2022-12-06T22:16:38.598Z]     res = func(*args, **_kwargs)

[2022-12-06T22:16:38.598Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2022-12-06T22:16:38.598Z]     return func(*arg, **kwargs)

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 610, in search

[2022-12-06T22:16:38.598Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2022-12-06T22:16:38.598Z]     raise e

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2022-12-06T22:16:38.598Z]     return func(*args, **kwargs)

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2022-12-06T22:16:38.598Z]     ret = func(self, *args, **kwargs)

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2022-12-06T22:16:38.598Z]     raise e

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2022-12-06T22:16:38.598Z]     return func(self, *args, **kwargs)

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 469, in search

[2022-12-06T22:16:38.598Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 438, in _execute_search_requests

[2022-12-06T22:16:38.598Z]     raise pre_err

[2022-12-06T22:16:38.598Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 429, in _execute_search_requests

[2022-12-06T22:16:38.598Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2022-12-06T22:16:38.599Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=3, reason=target node id not match target id = 3, node id = 12)>

[2022-12-06T22:16:38.599Z]  (api_request.py:39)

[2022-12-06T22:16:38.599Z] [2022-12-06 22:16:29 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=3, reason=target node id not match target id = 3, node id = 12)> (api_request.py:40)
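
When triaging many chaos-test logs, it can help to pull the two IDs out of the error text automatically. The following is a minimal sketch, not part of pymilvus or the Milvus test suite: the helper name `parse_node_id_mismatch` and the `NodeIdMismatch` type are hypothetical, and the field meanings in the comments (the ID the request was addressed to versus the ID the answering node reports) are my reading of the message, not something confirmed in this thread.

```python
import re
from typing import NamedTuple, Optional

# Matches the mismatch text exactly as it appears in the log above.
_MISMATCH = re.compile(
    r"target node id not match target id = (\d+), node id = (\d+)"
)


class NodeIdMismatch(NamedTuple):
    target_id: int  # assumed: the ID the request was addressed to
    node_id: int    # assumed: the ID reported by the node that answered


def parse_node_id_mismatch(message: str) -> Optional[NodeIdMismatch]:
    """Return the two IDs from a mismatch error message, or None if absent."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    return NodeIdMismatch(int(match.group(1)), int(match.group(2)))
```

Because the pattern is searched for anywhere in the string, the helper can be fed either the bare `reason=` text or the full `str()` of the exception.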

Expected Behavior

all test cases passed

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/369/pipeline
log:
artifacts-pulsar-pod-kill-369-server-logs.tar.gz
artifacts-pulsar-pod-kill-369-pytest-logs.tar.gz

Anything else?

No response

zhuwenxing commented 1 year ago

This may be related to https://github.com/milvus-io/milvus/issues/20970

yah01 commented 1 year ago

The unsubscribe failed with a Pulsar "connection closed" error. We need a strategy for handling the case where reconnecting to Pulsar fails. @congqixia

yah01 commented 1 year ago

/assign

yah01 commented 1 year ago

https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test/detail/chaos-test/3286/pipeline/
Pulsar recovery takes over 1 min. I set the idle time to 3 min, and the chaos test passed. @zhuwenxing

yah01 commented 1 year ago

Also, Milvus's etcd lease timeout is 1 min, so the idle time should be longer than that to make sure the system has recovered after chaos. /cc @xiaofan-luan /cc @congqixia
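
The reasoning above (idle for longer than both the 1 min etcd lease timeout and the Pulsar recovery time before checking the system) could be expressed in a test harness roughly as follows. This is only a sketch under stated assumptions: `wait_for_recovery` and `probe` are hypothetical names, not part of the Milvus chaos tests, and apart from the 3 min idle time taken from this thread, the timeout and poll interval defaults are illustrative.

```python
import time
from typing import Callable


def wait_for_recovery(
    probe: Callable[[], bool],
    min_idle_s: float = 180.0,     # 3 min idle, as chosen in this thread
    timeout_s: float = 600.0,      # illustrative upper bound, an assumption
    poll_interval_s: float = 5.0,  # illustrative, an assumption
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Idle for min_idle_s, then poll probe() until it passes or time runs out.

    Returns True as soon as probe() returns True, and False once timeout_s
    (measured from the start of the idle period) has elapsed without success.
    """
    start = clock()
    sleep(min_idle_s)  # always wait out the fixed idle period first
    while True:
        if probe():
            return True
        if clock() - start >= timeout_s:
            return False
        sleep(poll_interval_s)
```

Here `probe` would be any zero-argument health check supplied by the caller, for example a small search against a known collection that returns `True` on success. The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.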

yanliang567 commented 1 year ago

So do we need a suitable idle timeout for Milvus? Any suggestions? @yah01

yah01 commented 1 year ago

So do we need a suitable idle timeout for Milvus? Any suggestions? @yah01

2 minutes should be enough to recover

zhuwenxing commented 1 year ago

I have changed the idle time to 3 min

yanliang567 commented 1 year ago

/assign @zhuwenxing Please help verify this now that the idle time has been updated.

/unassign

zhuwenxing commented 1 year ago

It still reproduces, but the error message has changed.

[2022-12-12T04:59:17.064Z] [2022-12-12 04:59:16 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.10011674241569275, 0.026484448898056547, 0.1174656922074973, 0.14838325856782653, 0.037432259910172905, 0.0819583382143737, 0.044623756846596356, 0.08764375063613665, 0.033402024552142126, 0.07062440934806637, 0.05836438474772751, 0.06159803316382118, 0.11255150425240625, 0.10276470323752888, 0......, kwargs: {} (api_request.py:56)

[2022-12-12T04:59:17.064Z] [2022-12-12 04:59:16 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=3, reason=QueryNode 12 can't serve, recovering: target node id not match target id = 3, node id = 12)>, <Time:{'RPC start': '2022-12-12 04:59:16.418005', 'RPC error': '2022-12-12 04:59:16.783941'}> (decorators.py:108)

[2022-12-12T04:59:17.064Z] [2022-12-12 04:59:16 - ERROR - ci_test]: Traceback (most recent call last):

[2022-12-12T04:59:17.065Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2022-12-12T04:59:17.065Z]     res = func(*args, **_kwargs)

[2022-12-12T04:59:17.065Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2022-12-12T04:59:17.065Z]     return func(*arg, **kwargs)

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 610, in search

[2022-12-12T04:59:17.065Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2022-12-12T04:59:17.065Z]     raise e

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2022-12-12T04:59:17.065Z]     return func(*args, **kwargs)

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2022-12-12T04:59:17.065Z]     ret = func(self, *args, **kwargs)

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2022-12-12T04:59:17.065Z]     raise e

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2022-12-12T04:59:17.065Z]     return func(self, *args, **kwargs)

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 469, in search

[2022-12-12T04:59:17.065Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 438, in _execute_search_requests

[2022-12-12T04:59:17.065Z]     raise pre_err

[2022-12-12T04:59:17.065Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 429, in _execute_search_requests

[2022-12-12T04:59:17.065Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2022-12-12T04:59:17.065Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=3, reason=QueryNode 12 can't serve, recovering: target node id not match target id = 3, node id = 12)>

[2022-12-12T04:59:17.065Z]  (api_request.py:39)

[2022-12-12T04:59:17.065Z] [2022-12-12 04:59:16 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=3, reason=QueryNode 12 can't serve, recovering: target node id not match target id = 3, node id = 12)> (api_request.py:40)

chaos type: pod-kill
image tag: master-20221212-e977e014
target pod: pulsar
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/467/pipeline
log:
artifacts-pulsar-pod-kill-467-server-logs.tar.gz
artifacts-pulsar-pod-kill-467-pytest-logs.tar.gz
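
The new message carries a "can't serve, recovering" prefix, which a test client could use to tell a node that is still recovering apart from other search failures. The sketch below retries only in that case. It is a hypothetical helper, not part of pymilvus, and it assumes the marker text appears in `str(exc)`, as it does in the exception printed in the log above. It does not address the server-side problem tracked in this issue; it only illustrates one way a harness could classify the failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_recovery_retry(
    func: Callable[[], T],
    attempts: int = 5,    # illustrative default, an assumption
    delay_s: float = 2.0,  # illustrative default, an assumption
    marker: str = "can't serve, recovering",
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying only when the error text contains marker."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Unrelated errors, and the final "recovering" error, propagate.
            if marker not in str(exc) or attempt == attempts:
                raise
            sleep(delay_s)
    raise AssertionError("unreachable")
```

In a test, `func` might be something like `lambda: collection.search(...)` with the arguments the test already uses.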

zhuwenxing commented 1 year ago

/unassign /assign @yah01

zhuwenxing commented 1 year ago

It is also reproduced in the 2.2 branch.

chaos type: pod-failure
image tag: 2.2.0-20221212-184d3c35
target pod: querynode
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/405/pipeline
log:
artifacts-querynode-pod-failure-405-server-logs.tar.gz
artifacts-querynode-pod-failure-405-pytest-logs.tar.gz

zhuwenxing commented 1 year ago

chaos type: pod-kill
image tag: 2.2.0-20221212-184d3c35
target pod: etcd
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/395/pipeline
log:
artifacts-etcd-pod-kill-395-server-logs.tar.gz
artifacts-etcd-pod-kill-395-pytest-logs.tar.gz

yanliang567 commented 1 year ago

@zhuwenxing does this still reproduce?

zhuwenxing commented 1 year ago

does this still reproduce?

It does not reproduce on the master branch but still reproduces on the 2.2 branch. @yah01 @congqixia Is there a cherry-pick PR for 2.2?

zhuwenxing commented 1 year ago

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:15 - INFO - ci_test]: index info: [{'collection': 'Hello_Milvus', 'field': 'varchar', 'index_name': 'test_SKUVzBYh', 'index_param': {'index_type': 'Trie'}}, {'collection': 'Hello_Milvus', 'field': 'float_vector', 'index_name': 'test_dOcvvIOR', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}] (test_data_persistence.py:64)

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:15 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:15 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:15 - INFO - ci_test]: [test][2022-12-19T03:21:15Z] [0.00528159s] Hello_Milvus load -> None (wrapper.py:30)

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:15 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.12213514804040805, 0.1380733812164912, 0.09017636293140217, 0.1075027498853586, 0.028142317306075682, 0.01582499737221483, 0.12412341708114152, 0.03861188170183319, 0.036823224957653694, 0.06920357325807211, 0.14432174786304863, 0.0016837899327997307, 0.11182058393475187, 0.1409779737635982, 0......., kwargs: {} (api_request.py:56)

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:16 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=7, reason=QueryNode 12 can't serve, recovering: target node id not match target id = 7, node id = 12)>, <Time:{'RPC start': '2022-12-19 03:21:15.734553', 'RPC error': '2022-12-19 03:21:16.129307'}> (decorators.py:108)

[2022-12-19T03:21:16.450Z] [2022-12-19 03:21:16 - ERROR - ci_test]: Traceback (most recent call last):

[2022-12-19T03:21:16.450Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2022-12-19T03:21:16.450Z]     res = func(*args, **_kwargs)

[2022-12-19T03:21:16.450Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2022-12-19T03:21:16.450Z]     return func(*arg, **kwargs)

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 609, in search

[2022-12-19T03:21:16.451Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2022-12-19T03:21:16.451Z]     raise e

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2022-12-19T03:21:16.451Z]     return func(*args, **kwargs)

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2022-12-19T03:21:16.451Z]     ret = func(self, *args, **kwargs)

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2022-12-19T03:21:16.451Z]     raise e

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2022-12-19T03:21:16.451Z]     return func(self, *args, **kwargs)

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 470, in search

[2022-12-19T03:21:16.451Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 439, in _execute_search_requests

[2022-12-19T03:21:16.451Z]     raise pre_err

[2022-12-19T03:21:16.451Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 430, in _execute_search_requests

[2022-12-19T03:21:16.451Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2022-12-19T03:21:16.451Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=7, reason=QueryNode 12 can't serve, recovering: target node id not match target id = 7, node id = 12)>

[2022-12-19T03:21:16.451Z]  (api_request.py:39)

[2022-12-19T03:21:16.451Z] [2022-12-19 03:21:16 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=7, reason=QueryNode 12 can't serve, recovering: target node id not match target id = 7, node id = 12)> (api_request.py:40)

[2022-12-19T03:21:16.451Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------

[2022-12-19T03:21:16.451Z] =========================== short test summary info ============================

[2022-12-19T03:21:16.451Z] FAILED testcases/test_data_persistence.py::TestDataPersistence::test_milvus_default - AssertionError

[2022-12-19T03:21:16.451Z] ============================== 1 failed in 4.26s ===============================

chaos type: pod-failure
image tag: 2.2.0-20221216-1aa7a9a8
target pod: querynode
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-for-release-cron/detail/chaos-test-for-release-cron/550/pipeline
log:
artifacts-querynode-pod-failure-550-server-logs.tar.gz
artifacts-querynode-pod-failure-550-pytest-logs.tar.gz

yah01 commented 1 year ago

/assign @zhuwenxing
The fix for 2.2 has been merged.

zhuwenxing commented 1 year ago

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:34 - DEBUG - ci_test]: (api_request)  : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:34 - DEBUG - ci_test]: (api_response) : None  (api_request.py:31)

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:34 - INFO - ci_test]: [test][2022-12-22T21:22:34Z] [0.00430286s] Checker__MAieYr3G load -> None (wrapper.py:30)

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:34 - DEBUG - ci_test]: (api_request)  : [Collection.search] args: [[[0.1034426524979644, 0.052573964560210795, 0.09321532694772285, 0.13742976129167736, 0.0874793833159194, 0.14936034785769273, 0.11336207724346294, 0.0998346228709992, 0.10479114113346255, 0.11088699606972784, 0.13533603435245778, 0.0014681036676781197, 0.0717271918451279, 0.04021748196965942, 0.03......, kwargs: {} (api_request.py:56)

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:35 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=2, reason=QueryNode 0 can't serve, recovering: target node id not match target id = 2, node id = 0)>, <Time:{'RPC start': '2022-12-22 21:22:34.321138', 'RPC error': '2022-12-22 21:22:35.130073'}> (decorators.py:108)

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:35 - ERROR - ci_test]: Traceback (most recent call last):

[2022-12-22T21:25:06.617Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper

[2022-12-22T21:25:06.617Z]     res = func(*args, **_kwargs)

[2022-12-22T21:25:06.617Z]   File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request

[2022-12-22T21:25:06.617Z]     return func(*arg, **kwargs)

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 609, in search

[2022-12-22T21:25:06.617Z]     res = conn.search(self._name, data, anns_field, param, limit, expr,

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler

[2022-12-22T21:25:06.617Z]     raise e

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler

[2022-12-22T21:25:06.617Z]     return func(*args, **kwargs)

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler

[2022-12-22T21:25:06.617Z]     ret = func(self, *args, **kwargs)

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler

[2022-12-22T21:25:06.617Z]     raise e

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler

[2022-12-22T21:25:06.617Z]     return func(self, *args, **kwargs)

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 470, in search

[2022-12-22T21:25:06.617Z]     return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 439, in _execute_search_requests

[2022-12-22T21:25:06.617Z]     raise pre_err

[2022-12-22T21:25:06.617Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 430, in _execute_search_requests

[2022-12-22T21:25:06.617Z]     raise MilvusException(response.status.error_code, response.status.reason)

[2022-12-22T21:25:06.617Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=2, reason=QueryNode 0 can't serve, recovering: target node id not match target id = 2, node id = 0)>

[2022-12-22T21:25:06.617Z]  (api_request.py:39)

[2022-12-22T21:25:06.617Z] [2022-12-22 21:22:35 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=2, reason=QueryNode 0 can't serve, recovering: target node id not match target id = 2, node id = 0)> (api_request.py:40)

chaos type: pod-failure
image tag: master-20221222-98088e3b
target pod: pulsar
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/734/pipeline
log:
artifacts-pulsar-pod-failure-734-server-logs.tar.gz
artifacts-pulsar-pod-failure-734-pytest-logs.tar.gz

zhuwenxing commented 1 year ago

@yah01 Please take a look. It was still reproduced in master

yah01 commented 1 year ago

@yah01 Please take a look. It was still reproduced in master

The QueryNode reports that the subscription was re-subscribed, and then it panics. Upgrading the Pulsar SDK may help: https://github.com/milvus-io/milvus/pull/21456

zhuwenxing commented 1 year ago

Verified and passed with 2.2.0-20230202-161725a6