Closed by wangting0128 1 year ago
same as #24904
/unassign
client argo task: fouramf-lb-op-rolling-upgrade, rolling upgrade argo task: fouramf-8xx7s
image: master-20230628-31122a68 -> master-20230629-b30517d3
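The upgrade is driven by patching the Milvus CR; as a sketch, this is the patch reconstructed from the `[Base] upgrade configs` log line (keys and values taken verbatim from that log entry):

```yaml
apiVersion: milvus.io/v1beta1
kind: Milvus
spec:
  mode: cluster
  components:
    enableRollingUpdate: true            # let the operator roll pods one deployment at a time
    imageUpdateMode: rollingUpgrade      # upgrade in dependency order instead of all at once
    image: harbor.milvus.io/milvus/milvus:master-20230629-b30517d3
```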
server:
[2023-06-30 03:59:31,849 - INFO - fouram]: [get_pods] pod details of release(lb-op-rolling-upgrade): (operator_client.py:301)
[2023-06-30 03:59:31,849 - INFO - fouram]:
NAME STATUS RESTARTS AGE IP NODE
lb-op-rolling-upgrade-milvus-datacoord-79765db9cb-wbndw Running 0 45m 10.104.24.124 4am-node29
lb-op-rolling-upgrade-milvus-datanode-7778d54d85-p7mfd Running 0 45m 10.104.20.84 4am-node22
lb-op-rolling-upgrade-milvus-indexcoord-6b65cb6cd8-wnfxp Running 0 45m 10.104.20.83 4am-node22
lb-op-rolling-upgrade-milvus-indexnode-57f7ffd6b5-7n6gl Running 0 45m 10.104.15.33 4am-node20
lb-op-rolling-upgrade-milvus-proxy-66cc5ff56c-7fq6f Running 0 45m 10.104.15.31 4am-node20
lb-op-rolling-upgrade-milvus-querycoord-7d87d8f5d8-2tddv Running 0 45m 10.104.24.123 4am-node29
lb-op-rolling-upgrade-milvus-querynode-9ff6c45c7-2k4cf Running 0 45m 10.104.24.125 4am-node29
lb-op-rolling-upgrade-milvus-querynode-9ff6c45c7-6jbt2 Running 0 45m 10.104.18.206 4am-node25
lb-op-rolling-upgrade-milvus-querynode-9ff6c45c7-lpw5c Running 0 45m 10.104.15.34 4am-node20
lb-op-rolling-upgrade-milvus-rootcoord-5478f587c6-bb7pp Running 0 45m 10.104.20.90 4am-node22
lb-op-rolling-upgrade-etcd-0 Running 0 49m 10.104.15.27 4am-node20
lb-op-rolling-upgrade-etcd-1 Running 0 49m 10.104.20.74 4am-node22
lb-op-rolling-upgrade-etcd-2 Running 0 49m 10.104.6.49 4am-node13
lb-op-rolling-upgrade-pulsar-bookie-0 Running 0 49m 10.104.6.51 4am-node13
lb-op-rolling-upgrade-pulsar-bookie-1 Running 0 49m 10.104.20.79 4am-node22
lb-op-rolling-upgrade-pulsar-bookie-2 Running 0 49m 10.104.24.118 4am-node29
lb-op-rolling-upgrade-pulsar-broker-0 Running 0 49m 10.104.15.15 4am-node20
lb-op-rolling-upgrade-pulsar-proxy-0 Running 0 49m 10.104.15.17 4am-node20
lb-op-rolling-upgrade-pulsar-recovery-0 Running 0 49m 10.104.20.70 4am-node22
lb-op-rolling-upgrade-pulsar-zookeeper-0 Running 0 49m 10.104.20.77 4am-node22
lb-op-rolling-upgrade-pulsar-zookeeper-1 Running 0 48m 10.104.15.30 4am-node20
lb-op-rolling-upgrade-pulsar-zookeeper-2 Running 0 48m 10.104.18.200 4am-node25
lb-op-rolling-upgrade-minio-0 Running 0 49m 10.104.15.24 4am-node20
lb-op-rolling-upgrade-minio-1 Running 0 49m 10.104.20.71 4am-node22
lb-op-rolling-upgrade-minio-2 Running 0 49m 10.104.24.115 4am-node29
lb-op-rolling-upgrade-minio-3 Running 0 49m 10.104.6.45 4am-node13
(common_func.py:407)
[2023-06-30 03:59:31,849 - INFO - fouram]: [Base] upgrade configs: {'spec': {'components': {'enableRollingUpdate': True, 'imageUpdateMode': 'rollingUpgrade', 'image': 'harbor.milvus.io/milvus/milvus:master-20230629-b30517d3'}, 'mode': 'cluster'}, 'apiVersion': 'milvus.io/v1beta1', 'kind': 'Milvus'} (base.py:194)
[2023-06-30 03:59:32,082 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:00:02,237 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:00:32,411 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:01:02,671 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:01:32,816 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:02:02,990 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:02:33,149 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:03:03,413 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:03:33,553 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:04:03,718 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:04:33,896 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:05:04,067 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:05:34,320 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:06:04,479 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:06:34,658 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:07:04,815 - INFO - fouram]: [wait_for_healthy] Waiting for instance:lb-op-rolling-upgrade health... (operator_client.py:188)
[2023-06-30 04:07:35,068 - INFO - fouram]: [wait_for_healthy] Instance:lb-op-rolling-upgrade is healthy. (operator_client.py:185)
[2023-06-30 04:07:35,226 - INFO - fouram]: [Base] Get pods after upgrade... (base.py:202)
[2023-06-30 04:07:35,867 - INFO - fouram]: [get_pods] pod details of release(lb-op-rolling-upgrade): (operator_client.py:301)
[2023-06-30 04:07:35,867 - INFO - fouram]:
NAME STATUS RESTARTS AGE IP NODE
lb-op-rolling-upgrade-milvus-datacoord-86d9ddb668-hqq77 Running 0 7m 10.104.18.207 4am-node25
lb-op-rolling-upgrade-milvus-datanode-5d6db89745-2zffp Running 0 3m 10.104.15.38 4am-node20
lb-op-rolling-upgrade-milvus-indexcoord-695b9699b4-dpvt6 Running 0 5m 10.104.15.36 4am-node20
lb-op-rolling-upgrade-milvus-indexnode-56658bb4c7-fhghj Running 0 3m 10.104.24.151 4am-node29
lb-op-rolling-upgrade-milvus-proxy-6554c99887-xl7ks Running 0 1m 10.104.18.209 4am-node25
lb-op-rolling-upgrade-milvus-querycoord-7bddb5fcb-2lj5c Running 0 4m 10.104.15.37 4am-node20
lb-op-rolling-upgrade-milvus-querynode-846989fb8f-5cvzs Running 0 3m 10.104.15.41 4am-node20
lb-op-rolling-upgrade-milvus-querynode-846989fb8f-g5pwf Running 0 3m 10.104.24.152 4am-node29
lb-op-rolling-upgrade-milvus-querynode-846989fb8f-n84rs Running 0 2m 10.104.20.121 4am-node22
lb-op-rolling-upgrade-milvus-rootcoord-5c6b64695b-v5k2j Running 0 8m 10.104.20.117 4am-node22
lb-op-rolling-upgrade-etcd-0 Running 0 57m 10.104.15.27 4am-node20
lb-op-rolling-upgrade-etcd-1 Running 0 57m 10.104.20.74 4am-node22
lb-op-rolling-upgrade-etcd-2 Running 0 57m 10.104.6.49 4am-node13
lb-op-rolling-upgrade-pulsar-bookie-0 Running 0 57m 10.104.6.51 4am-node13
lb-op-rolling-upgrade-pulsar-bookie-1 Running 0 57m 10.104.20.79 4am-node22
lb-op-rolling-upgrade-pulsar-bookie-2 Running 0 57m 10.104.24.118 4am-node29
lb-op-rolling-upgrade-pulsar-broker-0 Running 0 57m 10.104.15.15 4am-node20
lb-op-rolling-upgrade-pulsar-proxy-0 Running 0 57m 10.104.15.17 4am-node20
lb-op-rolling-upgrade-pulsar-recovery-0 Running 0 57m 10.104.20.70 4am-node22
lb-op-rolling-upgrade-pulsar-zookeeper-0 Running 0 57m 10.104.20.77 4am-node22
lb-op-rolling-upgrade-pulsar-zookeeper-1 Running 0 56m 10.104.15.30 4am-node20
lb-op-rolling-upgrade-pulsar-zookeeper-2 Running 0 56m 10.104.18.200 4am-node25
lb-op-rolling-upgrade-minio-0 Running 0 57m 10.104.15.24 4am-node20
lb-op-rolling-upgrade-minio-1 Running 0 57m 10.104.20.71 4am-node22
lb-op-rolling-upgrade-minio-2 Running 0 57m 10.104.24.115 4am-node29
lb-op-rolling-upgrade-minio-3 Running 0 57m 10.104.6.45 4am-node13
(common_func.py:407)
client test result:
{'server': {'deploy_tool': 'operator',
'deploy_mode': 'cluster',
'config_name': 'cluster_8c16m',
'config': {'spec': {'components': {'queryNode': {'resources': {'limits': {'cpu': '8',
'memory': '8Gi'},
'requests': {'cpu': '4',
'memory': '4Gi'}},
'replicas': 3},
'indexNode': {'resources': {'limits': {'cpu': '4.0',
'memory': '8Gi'},
'requests': {'cpu': '3.0',
'memory': '5Gi'}},
'replicas': 1},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '2Gi'},
'requests': {'cpu': '2.0',
'memory': '2Gi'}}},
'image': 'harbor.milvus.io/milvus/milvus:master-20230628-31122a68'},
'mode': 'cluster',
'dependencies': {'etcd': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'global': {'storageClass': 'local-path'},
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}}}},
'pulsar': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'bookkeeper': {'volumes': {'journal': {'storageClassName': 'local-path'},
'ledgers': {'storageClassName': 'local-path'}}},
'zookeeper': {'volumes': {'data': {'storageClassName': 'local-path'}}}}}},
'kafka': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'persistence': {'storageClass': 'local-path'}}}},
'storage': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'persistence': {'storageClass': 'local-path'},
'metrics': {'podMonitor': {'enabled': True}}}}}},
'config': {'log': {'level': 'debug'}}},
'apiVersion': 'milvus.io/v1beta1',
'kind': 'Milvus',
'metadata': {'name': 'fouram-op-16-5610'}},
'host': 'lb-op-rolling-upgrade-milvus.qa-milvus',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_custom_parameters',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'dataset_name': 'sift',
'dataset_size': '5m',
'ni_per': 50000},
'load_params': {'replica_number': 3},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 100,
'during_time': '5h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 1,
'params': {'nq': 1000,
'top_k': 10,
'search_param': {'ef': 64},
'timeout': 3600,
'random_data': True}}]},
'run_id': 2023063045896080,
'datetime': '2023-06-30 03:09:49.854644',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 909.0306},
'insert': {'total_time': 138.2387,
'VPS': 36169.3216,
'batch_time': 1.3824,
'batch': 50000},
'flush': {'RT': 2.5226},
'load': {'RT': 6.5599},
'Locust': {'Aggregated': {'Requests': 141500,
'Fails': 369,
'RPS': 7.86,
'fail_s': 0.0,
'RT_max': 19459.52,
'RT_avg': 12128.43,
'TP50': 12000.0,
'TP99': 15000.0},
'search': {'Requests': 141500,
'Fails': 369,
'RPS': 7.86,
'fail_s': 0.0,
'RT_max': 19459.52,
'RT_avg': 12128.43,
'TP50': 12000.0,
'TP99': 15000.0}}}}} (performance_template.py:141)
[2023-06-30 04:00:15,406 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2023-06-30 04:00:15,407 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2023-06-30 04:00:15,407 - INFO - fouram]: grpc search 12742 0(0.00%) | 11900 1221 14330 12000 | 8.40 0.00 (stats.py:789)
[2023-06-30 04:00:15,407 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2023-06-30 04:00:15,407 - INFO - fouram]: Aggregated 12742 0(0.00%) | 11900 1221 14330 12000 | 8.40 0.00 (stats.py:789)
[2023-06-30 04:00:15,407 - INFO - fouram]: (stats.py:790)
[2023-06-30 04:00:15,407 - INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2023-06-30 04:00:15,407 - INFO - fouram]: Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs (stats.py:819)
[2023-06-30 04:00:15,407 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2023-06-30 04:00:15,407 - INFO - fouram]: grpc search 12000 12000 12000 12000 12000 13000 13000 13000 14000 14000 14000 12742 (stats.py:819)
[2023-06-30 04:00:15,407 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2023-06-30 04:00:15,407 - INFO - fouram]: Aggregated 12000 12000 12000 12000 12000 13000 13000 13000 14000 14000 14000 12742 (stats.py:819)
[2023-06-30 04:00:15,407 - INFO - fouram]: (stats.py:820)
[2023-06-30 04:00:23,201 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=syncTimestamp Failed:context deadline exceeded)>, <Time:{'RPC start': '2023-06-30 04:00:10.232508', 'RPC error': '2023-06-30 04:00:23.200973'}> (decorators.py:108)
[2023-06-30 04:00:23,202 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=syncTimestamp Failed:context deadline exceeded)> (api_request.py:53)
[2023-06-30 04:00:23,202 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=syncTimestamp Failed:context deadline exceeded)> (func_check.py:46)
[2023-06-30 04:00:23,202 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=syncTimestamp Failed:context deadline exceeded)>, <Time:{'RPC start': '2023-06-30 04:00:10.339087', 'RPC error': '2023-06-30 04:00:23.202934'}> (decorators.py:108)
[2023-06-30 04:00:23,203 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=syncTimestamp Failed:context deadline exceeded)> (api_request.py:53)
[2023-06-30 04:00:23,203 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=syncTimestamp Failed:context deadline exceeded)> (func_check.py:46)
[2023-06-30 04:00:33,325 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-30 04:00:18.280082', 'RPC error': '2023-06-30 04:00:33.325800'}> (decorators.py:108)
[2023-06-30 04:00:33,326 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-06-30 04:00:33,326 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (func_check.py:46)
[2023-06-30 04:00:33,326 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-30 04:00:18.386501', 'RPC error': '2023-06-30 04:00:33.326694'}> (decorators.py:108)
[2023-06-30 04:00:33,326 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-06-30 04:00:33,327 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (func_check.py:46)
[2023-06-30 04:00:33,327 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-30 04:00:18.492884', 'RPC error': '2023-06-30 04:00:33.327182'}> (decorators.py:108)
[2023-06-30 04:00:33,327 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-06-30 04:00:33,327 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (func_check.py:46)
[2023-06-30 04:00:33,327 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-30 04:00:18.615418', 'RPC error': '2023-06-30 04:00:33.327668'}> (decorators.py:108)
[2023-06-30 04:00:33,327 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-06-30 04:00:33,327 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (func_check.py:46)
[2023-06-30 04:00:33,328 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-30 04:00:18.720946', 'RPC error': '2023-06-30 04:00:33.328131'}> (decorators.py:108)
[2023-06-30 04:00:33,328 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-06-30 04:01:23,818 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: err: find no available querycoord, check querycoord state
, /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:352 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:121 github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:317 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetShardLeaders
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:734 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetShards.func1
/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:40 github.com/milvus-io/milvus/pkg/util/retry.Do
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:733 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetShards
/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:187 github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:406 github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:457 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask
: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-06-30 04:01:05.902341', 'RPC error': '2023-06-30 04:01:23.818730'}> (decorators.py:108)
[2023-06-30 04:01:23,818 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: err: find no available querycoord, check querycoord state
, /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:352 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:121 github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:317 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetShardLeaders
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:734 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetShards.func1
/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:40 github.com/milvus-io/milvus/pkg/util/retry.Do
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:733 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetShards
/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:187 github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:406 github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:457 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask
: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-06-30 04:01:23,819 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: err: find no available querycoord, check querycoord state
, /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:352 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:121 github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:317 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(*Client).GetShardLeaders
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:734 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetShards.func1
/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:40 github.com/milvus-io/milvus/pkg/util/retry.Do
/go/src/github.com/milvus-io/milvus/internal/proxy/meta_cache.go:733 github.com/milvus-io/milvus/internal/proxy.(*MetaCache).GetShards
/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:187 github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:406 github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:457 github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask
: unrecoverable error: fail to search on all shard leaders)> (func_check.py:46)
[2023-06-30 04:01:37,349 - INFO - fouram]: Type Name # reqs # fails | Avg Min Max Med | req/s failures/s (stats.py:789)
[2023-06-30 04:01:37,349 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2023-06-30 04:01:37,349 - INFO - fouram]: grpc search 13348 369(2.76%) | 11948 1221 19459 12000 | 8.00 0.00 (stats.py:789)
[2023-06-30 04:01:37,349 - INFO - fouram]: --------|----------------------------------------------------------------------------|-------|-------------|-------|-------|-------|-------|--------|----------- (stats.py:789)
[2023-06-30 04:01:37,349 - INFO - fouram]: Aggregated 13348 369(2.76%) | 11948 1221 19459 12000 | 8.00 0.00 (stats.py:789)
[2023-06-30 04:01:37,349 - INFO - fouram]: (stats.py:790)
[2023-06-30 04:01:37,349 - INFO - fouram]: Response time percentiles (approximated) (stats.py:819)
[2023-06-30 04:01:37,349 - INFO - fouram]: Type Name 50% 66% 75% 80% 90% 95% 98% 99% 99.9% 99.99% 100% # reqs (stats.py:819)
[2023-06-30 04:01:37,349 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2023-06-30 04:01:37,349 - INFO - fouram]: grpc search 12000 12000 12000 12000 13000 13000 13000 15000 18000 19000 19000 13348 (stats.py:819)
[2023-06-30 04:01:37,350 - INFO - fouram]: --------|--------------------------------------------------------------------------------|--------|------|------|------|------|------|------|------|------|------|------|------ (stats.py:819)
[2023-06-30 04:01:37,350 - INFO - fouram]: Aggregated 12000 12000 12000 12000 13000 13000 13000 15000 18000 19000 19000 13348 (stats.py:819)
[2023-06-30 04:01:37,350 - INFO - fouram]: (stats.py:820)
@weiliu1031
The "not fully loaded" issue should now be resolved.

There are two issues here:
1. The graceful-stop policy on the master branch is broken, so shard leaders become unavailable during the rolling upgrade. This should be fixed by #25226.
2. During a rolling upgrade, if there is no standby coordinator, the querycoord/datacoord/rootcoord are unavailable for a short period, which may affect search/query requests; this is expected behavior.

/assign @wangting0128
Please verify this.
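Since brief qc/dc/rc unavailability during a rolling upgrade is expected, the transient `service not ready` search failures above can be absorbed with client-side retry and backoff. A minimal sketch — the `retry_transient` helper and the commented usage are illustrative, not part of the fouram harness; wrapping with pymilvus's `MilvusException` is an assumption:

```python
import random
import time


def retry_transient(fn, attempts=5, base_delay=1.0, transient=(Exception,)):
    """Call fn(); on a transient error (e.g. 'service not ready' while a
    coordinator restarts), retry with exponential backoff plus jitter so the
    retries outlast the short unavailability window."""
    for i in range(attempts):
        try:
            return fn()
        except transient:
            if i == attempts - 1:
                raise  # exhausted: surface the last error
            # back off 1s, 2s, 4s, ... with jitter to avoid thundering herd
            time.sleep(base_delay * (2 ** i) + random.uniform(0, 0.5))


# Hypothetical usage, assuming a pymilvus collection handle:
# from pymilvus import MilvusException
# result = retry_transient(lambda: collection.search(vectors, "embedding", params, limit=10),
#                          transient=(MilvusException,))
```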
deployment mode: operator
argo task: fouramf-5wf5t-glzhh, rolling upgrade argo task: fouramf-vtbtm
image: master-20230630-bc403dbd -> master-20230706-2ae6def3
server:
fouramf-5wf5t-glzhh-op-67-5375-milvus-datacoord-569f474fb-94s8f Running 0 3h2m 10.104.4.172 4am-node11
fouramf-5wf5t-glzhh-op-67-5375-milvus-datanode-84cdd75b54-svlcl Running 0 2h59m 10.104.22.83 4am-node26
fouramf-5wf5t-glzhh-op-67-5375-milvus-indexcoord-567f6d6556qvtk Running 0 3h1m 10.104.4.173 4am-node11
fouramf-5wf5t-glzhh-op-67-5375-milvus-indexnode-db876fd7-mnfcb Running 0 2h59m 10.104.4.176 4am-node11
fouramf-5wf5t-glzhh-op-67-5375-milvus-proxy-9c79d6c87-4bxpp Running 0 2h58m 10.104.21.118 4am-node24
fouramf-5wf5t-glzhh-op-67-5375-milvus-querycoord-d5994d986vqtmf Running 0 3h 10.104.4.174 4am-node11
fouramf-5wf5t-glzhh-op-67-5375-milvus-querynode-5c59ffcfb97vmhs Running 0 2h59m 10.104.6.168 4am-node13
fouramf-5wf5t-glzhh-op-67-5375-milvus-querynode-5c59ffcfb9zz74l Running 0 2h59m 10.104.4.175 4am-node11
fouramf-5wf5t-glzhh-op-67-5375-milvus-rootcoord-6ffcc5f44-vs6cf Running 0 3h3m 10.104.22.82 4am-node26
fouramf-5wf5t-glzhh-op-67-5375-etcd-0 Running 0 6h50m 10.104.17.183 4am-node23
fouramf-5wf5t-glzhh-op-67-5375-etcd-1 Running 0 6h50m 10.104.21.29 4am-node24
fouramf-5wf5t-glzhh-op-67-5375-etcd-2 Running 0 6h50m 10.104.6.128 4am-node13
fouramf-5wf5t-glzhh-op-67-5375-kafka-0 Running 2 6h50m 10.104.17.188 4am-node23
fouramf-5wf5t-glzhh-op-67-5375-kafka-1 Running 2 6h50m 10.104.21.34 4am-node24
fouramf-5wf5t-glzhh-op-67-5375-kafka-2 Running 2 6h50m 10.104.6.136 4am-node13
fouramf-5wf5t-glzhh-op-67-5375-kafka-zookeeper-0 Running 0 6h50m 10.104.17.187 4am-node23
fouramf-5wf5t-glzhh-op-67-5375-kafka-zookeeper-1 Running 0 6h50m 10.104.21.35 4am-node24
fouramf-5wf5t-glzhh-op-67-5375-kafka-zookeeper-2 Running 0 6h50m 10.104.6.137 4am-node13
fouramf-5wf5t-glzhh-op-67-5375-minio-0 Running 0 6h50m 10.104.17.184 4am-node23
fouramf-5wf5t-glzhh-op-67-5375-minio-1 Running 0 6h50m 10.104.21.31 4am-node24
fouramf-5wf5t-glzhh-op-67-5375-minio-2 Running 0 6h50m 10.104.6.129 4am-node13
fouramf-5wf5t-glzhh-op-67-5375-minio-3 Running 0 6h50m 10.104.22.19 4am-node26
client test result:
{'server': {'deploy_tool': 'operator',
'deploy_mode': 'cluster',
'config_name': 'cluster_2c2m',
'config': {'spec': {'components': {'queryNode': {'resources': {'limits': {'cpu': '4.0',
'memory': '64Gi'},
'requests': {'cpu': '3.0',
'memory': '33Gi'}},
'replicas': 2},
'indexNode': {'resources': {'limits': {'cpu': '8.0',
'memory': '16Gi'},
'requests': {'cpu': '5.0',
'memory': '9Gi'}},
'replicas': 1},
'dataNode': {'resources': {'limits': {'cpu': '2.0',
'memory': '4Gi'},
'requests': {'cpu': '2.0',
'memory': '3Gi'}},
'replicas': 1},
'image': 'harbor.milvus.io/milvus/milvus:master-20230630-bc403dbd'},
'mode': 'cluster',
'dependencies': {'etcd': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'global': {'storageClass': 'local-path'},
'metrics': {'enabled': True,
'podMonitor': {'enabled': True}}}}},
'pulsar': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'bookkeeper': {'volumes': {'journal': {'storageClassName': 'local-path'},
'ledgers': {'storageClassName': 'local-path'}}},
'zookeeper': {'volumes': {'data': {'storageClassName': 'local-path'}}}}}},
'kafka': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'persistence': {'storageClass': 'local-path'}}}},
'storage': {'inCluster': {'deletionPolicy': 'Delete',
'pvcDeletion': True,
'values': {'persistence': {'storageClass': 'local-path'},
'metrics': {'podMonitor': {'enabled': True}}}}},
'msgStreamType': 'kafka'},
'config': {'log': {'level': 'debug'}}},
'apiVersion': 'milvus.io/v1beta1',
'kind': 'Milvus',
'metadata': {'name': 'fouram-op-79-6657'}},
'host': 'fouramf-5wf5t-glzhh-op-67-5375-milvus.qa-milvus',
'port': '19530',
'uri': ''},
'client': {'test_case_type': 'ConcurrentClientBase',
'test_case_name': 'test_concurrent_locust_100m_hnsw_ddl_dql_filter_kafka_cluster',
'test_case_params': {'dataset_params': {'metric_type': 'L2',
'dim': 128,
'dataset_name': 'sift',
'dataset_size': 100000000,
'ni_per': 50000},
'collection_params': {'other_fields': ['float_1'],
'shards_num': 2},
'load_params': {},
'query_params': {},
'search_params': {},
'resource_groups_params': {'reset': False},
'database_user_params': {'reset_rbac': False,
'reset_db': False},
'index_params': {'index_type': 'HNSW',
'index_param': {'M': 8,
'efConstruction': 200}},
'concurrent_params': {'concurrent_number': 20,
'during_time': '4h',
'interval': 20,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'search',
'weight': 20,
'params': {'nq': 10,
'top_k': 10,
'search_param': {'ef': 16},
'expr': {'float_1': {'GT': -1.0,
'LT': 50000000.0}},
'guarantee_timestamp': None,
'output_fields': None,
'ignore_growing': False,
'timeout': 60,
'random_data': True}},
{'type': 'query',
'weight': 10,
'params': {'ids': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9],
'expr': None,
'output_fields': None,
'ignore_growing': False,
'timeout': 60}},
{'type': 'load',
'weight': 1,
'params': {'replica_number': 1,
'timeout': 30}},
{'type': 'scene_test',
'weight': 2,
'params': {'dim': 128,
'data_size': 3000,
'nb': 3000,
'index_type': 'IVF_SQ8',
'index_param': {'nlist': 2048},
'metric_type': 'L2'}}]},
'run_id': 2023070697145337,
'datetime': '2023-07-06 02:15:14.781465',
'client_version': '2.2'},
'result': {'test_result': {'index': {'RT': 5563.4137},
'insert': {'total_time': 3152.0947,
'VPS': 31724.9352,
'batch_time': 1.576,
'batch': 50000},
'flush': {'RT': 2.5207},
'load': {'RT': 127.7297},
'Locust': {'Aggregated': {'Requests': 69711,
'Fails': 1344,
'RPS': 4.84,
'fail_s': 0.02,
'RT_max': 125165.16,
'RT_avg': 4119.79,
'TP50': 23,
'TP99': 66000.0},
'load': {'Requests': 2148,
'Fails': 32,
'RPS': 0.15,
'fail_s': 0.01,
'RT_max': 30308.24,
'RT_avg': 121.58,
'TP50': 6,
'TP99': 400.0},
'query': {'Requests': 20883,
'Fails': 377,
'RPS': 1.45,
'fail_s': 0.02,
'RT_max': 20433.55,
'RT_avg': 55.49,
'TP50': 6,
'TP99': 99},
'scene_test': {'Requests': 4354,
'Fails': 46,
'RPS': 0.3,
'fail_s': 0.01,
'RT_max': 125165.16,
'RT_avg': 64807.71,
'TP50': 65000.0,
'TP99': 72000.0},
'search': {'Requests': 42326,
'Fails': 889,
'RPS': 2.94,
'fail_s': 0.02,
'RT_max': 15885.63,
'RT_avg': 85.1,
'TP50': 30,
'TP99': 170.0}}}}}
search failures, client error log:
[2023-07-06 06:02:10,377 - INFO - fouram]: Aggregated 31 34 37 40 50 65000 65000 65000 76000 111000 118000 16842 (stats.py:819)
[2023-07-06 06:02:10,377 - INFO - fouram]: (stats.py:820)
[2023-07-06 06:02:19,864 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=1, message=code: NotReadyServe, reason: stage=Abnormal: service not ready)>, <Time:{'RPC start': '2023-07-06 06:02:19.799164', 'RPC error': '2023-07-06 06:02:19.864387'}> (decorators.py:108)
[2023-07-06 06:02:19,866 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=code: NotReadyServe, reason: stage=Abnormal: service not ready)> (api_request.py:53)
[2023-07-06 06:02:19,866 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=1, message=code: NotReadyServe, reason: stage=Abnormal: service not ready)> (func_check.py:52)
[2023-07-06 06:02:19,867 - ERROR - fouram]: [func_time_catch] : (api_request.py:120)
[2023-07-06 06:02:30,383 - INFO - fouram]: (stats.py:820)
[2023-07-06 06:02:31,879 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to Search, QueryNode ID=5, reason=err: failed to connect 10.104.21.42:21123, reason: context deadline exceeded
, /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:352 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:101 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:219 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).SearchSegments
/go/src/github.com/milvus-io/milvus/internal/querynodev2/cluster/worker.go:123 github.com/milvus-io/milvus/internal/querynodev2/cluster.(*remoteWorker).SearchSegments
/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator.go:249 github.com/milvus-io/milvus/internal/querynodev2/delegator.(*shardDelegator).Search.func2
/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator.go:434 github.com/milvus-io/milvus/internal/querynodev2/delegator.executeSubTasks[...].func1
/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit: channel=fouramf-5wf5t-glzhh-op-67-5375-rootcoord-dml_1_442658949782831325v1: fail to access shard delegator: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-07-06 06:02:19.868698', 'RPC error': '2023-07-06 06:02:31.879642'}> (decorators.py:108)
[2023-07-06 06:02:31,880 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to Search, QueryNode ID=5, reason=err: failed to connect 10.104.21.42:21123, reason: context deadline exceeded
, /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:352 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:101 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:219 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).SearchSegments
/go/src/github.com/milvus-io/milvus/internal/querynodev2/cluster/worker.go:123 github.com/milvus-io/milvus/internal/querynodev2/cluster.(*remoteWorker).SearchSegments
/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator.go:249 github.com/milvus-io/milvus/internal/querynodev2/delegator.(*shardDelegator).Search.func2
/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator.go:434 github.com/milvus-io/milvus/internal/querynodev2/delegator.executeSubTasks[...].func1
/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit: channel=fouramf-5wf5t-glzhh-op-67-5375-rootcoord-dml_1_442658949782831325v1: fail to access shard delegator: fail to search on all shard leaders)> (api_request.py:53)
[2023-07-06 06:02:31,880 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: fail to Search, QueryNode ID=5, reason=err: failed to connect 10.104.21.42:21123, reason: context deadline exceeded
, /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:352 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:101 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]
/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:219 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).SearchSegments
/go/src/github.com/milvus-io/milvus/internal/querynodev2/cluster/worker.go:123 github.com/milvus-io/milvus/internal/querynodev2/cluster.(*remoteWorker).SearchSegments
/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator.go:249 github.com/milvus-io/milvus/internal/querynodev2/delegator.(*shardDelegator).Search.func2
/go/src/github.com/milvus-io/milvus/internal/querynodev2/delegator/delegator.go:434 github.com/milvus-io/milvus/internal/querynodev2/delegator.executeSubTasks[...].func1
/usr/local/go/src/runtime/asm_amd64.s:1571 runtime.goexit: channel=fouramf-5wf5t-glzhh-op-67-5375-rootcoord-dml_1_442658949782831325v1: fail to access shard delegator: fail to search on all shard leaders)> (func_check.py:46)
[2023-07-06 06:02:37,344 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)>, <Time:{'RPC start': '2023-07-06 06:02:31.882265', 'RPC error': '2023-07-06 06:02:37.344561'}> (decorators.py:108)
[2023-07-06 06:02:37,345 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (api_request.py:53)
[2023-07-06 06:02:37,345 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to search on all shard leaders)> (func_check.py:46)
[2023-07-06 06:02:37,349 - ERROR - fouram]: RPC error: [query], <MilvusException: (code=1, message=attempt #0: fail to get shard leaders from QueryCoord: stage=Initializing: service not ready: unrecoverable error: fail to query on all shard leaders)>, <Time:{'RPC start': '2023-07-06 06:02:37.345428', 'RPC error': '2023-07-06 06:02:37.349563'}> (decorators.py:108)
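The errors above (`NotReadyServe`, `service not ready`) are transient while coordinators restart during the rolling upgrade. A minimal client-side retry sketch, a hypothetical helper that is not part of the fouram framework, showing how such transient failures could be absorbed with exponential backoff:

```python
import time

def call_with_retry(fn, attempts=5, base_delay=0.5,
                    is_transient=lambda e: "not ready" in str(e)):
    """Retry fn() on transient 'service not ready' errors with
    exponential backoff; re-raise immediately on anything else
    or once the attempt budget is exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            time.sleep(base_delay * 2 ** attempt)
```

In a real client this would wrap the search/query/insert calls, e.g. `call_with_retry(lambda: collection.search(...))`. Note that retrying on the client side would mask the availability gap this issue is about, so it is only appropriate where the test's failure accounting allows it.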
/assign @aoiasd
/assign @elstic please verify this
/assign @elstic
The issue still exists.
deployment mode: operator
argo task: fouramf-2hrlt
rolling upgrade argo task: fouramf-6427s
image: master-20230721-32827f53 -> master-20230726-2b9ec565
search concurrency: 20
server:
fouramf-2hrlt-op-52-9084-milvus-datacoord-66895b767-d4ktr Running 0 39.96429s 10.104.24.150 4am-node29
fouramf-2hrlt-op-52-9084-milvus-datanode-85b68997d7-lfvrf Running 0 39.964343s 10.104.23.75 4am-node27
fouramf-2hrlt-op-52-9084-milvus-indexcoord-6567988f55-8mg9f Running 0 39.964362s 10.104.24.151 4am-node29
fouramf-2hrlt-op-52-9084-milvus-indexnode-bffff4744-pwlng Running 0 39.964378s 10.104.24.153 4am-node29
fouramf-2hrlt-op-52-9084-milvus-proxy-59b9f655bb-glwg9 Running 0 39.964393s 10.104.23.74 4am-node27
fouramf-2hrlt-op-52-9084-milvus-querycoord-5f69d6b59-jv4pd Running 0 39.964408s 10.104.23.73 4am-node27
fouramf-2hrlt-op-52-9084-milvus-querynode-5957f78754-ddxrj Running 0 39.964422s 10.104.23.76 4am-node27
fouramf-2hrlt-op-52-9084-milvus-querynode-5957f78754-jcv7v Running 0 39.964436s 10.104.20.186 4am-node22
fouramf-2hrlt-op-52-9084-milvus-rootcoord-6fb5bd55cf-twgdr Running 0 39.96445s 10.104.23.77 4am-node27
fouramf-2hrlt-op-52-9084-etcd-0 Running 0 3m 10.104.24.139 4am-node29
fouramf-2hrlt-op-52-9084-etcd-1 Running 0 3m 10.104.23.65 4am-node27
fouramf-2hrlt-op-52-9084-etcd-2 Running 0 3m 10.104.16.192 4am-node21
fouramf-2hrlt-op-52-9084-kafka-0 Running 2 3m 10.104.24.144 4am-node29
fouramf-2hrlt-op-52-9084-kafka-1 Running 1 3m 10.104.23.70 4am-node27
fouramf-2hrlt-op-52-9084-kafka-2 Running 1 3m 10.104.20.180 4am-node22
fouramf-2hrlt-op-52-9084-kafka-zookeeper-0 Running 0 3m 10.104.23.69 4am-node27
fouramf-2hrlt-op-52-9084-kafka-zookeeper-1 Running 0 3m 10.104.24.146 4am-node29
fouramf-2hrlt-op-52-9084-kafka-zookeeper-2 Running 0 3m 10.104.20.181 4am-node22
fouramf-2hrlt-op-52-9084-minio-0 Running 0 3m 10.104.23.60 4am-node27
fouramf-2hrlt-op-52-9084-minio-1 Running 0 3m 10.104.24.138 4am-node29
fouramf-2hrlt-op-52-9084-minio-2 Running 0 3m 10.104.20.176 4am-node22
fouramf-2hrlt-op-52-9084-minio-3 Running 0 3m 10.104.16.191 4am-node21
Search failed from 2023-07-26 03:41:05 to 03:41:46: nearly a minute of search failures, with 571 failed requests in total.
client error log:
Deployed with the operator, setting enableActiveStandby to true for each coordinator at deployment time:
rootCoord:
  enableActiveStandby: true
dataCoord:
  enableActiveStandby: true
queryCoord:
  enableActiveStandby: true
indexCoord:
  enableActiveStandby: true
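For reference, with the milvus-operator these flags would sit under the CR's config section; a minimal sketch (field paths assumed from the operator's convention of mirroring milvus.yaml, verify against your operator version):

```yaml
apiVersion: milvus.io/v1beta1
kind: Milvus
metadata:
  name: lb-op-rolling-upgrade
spec:
  config:
    rootCoord:
      enableActiveStandby: true
    dataCoord:
      enableActiveStandby: true
    queryCoord:
      enableActiveStandby: true
    indexCoord:
      enableActiveStandby: true
```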
argo task: fouramf-b5p7h
rolling upgrade argo task: fouramf-tpztc
image: master-20230721-32827f53 -> master-20230727-b986e3af
search concurrency: 20
With active-standby enabled, search took 24 seconds to recover to normal operation.
server:
fouramf-b5p7h-op-36-1412-etcd-0 1/1 Running 0 3h17m 10.104.16.212 4am-node21 <none> <none>
fouramf-b5p7h-op-36-1412-etcd-1 1/1 Running 0 3h17m 10.104.15.65 4am-node20 <none> <none>
fouramf-b5p7h-op-36-1412-etcd-2 1/1 Running 0 3h17m 10.104.18.126 4am-node25 <none> <none>
fouramf-b5p7h-op-36-1412-kafka-0 1/1 Running 2 (3h16m ago) 3h17m 10.104.16.216 4am-node21 <none> <none>
fouramf-b5p7h-op-36-1412-kafka-1 1/1 Running 2 (3h16m ago) 3h17m 10.104.18.131 4am-node25 <none> <none>
fouramf-b5p7h-op-36-1412-kafka-2 1/1 Running 1 (3h16m ago) 3h17m 10.104.15.75 4am-node20 <none> <none>
fouramf-b5p7h-op-36-1412-kafka-zookeeper-0 1/1 Running 0 3h17m 10.104.16.217 4am-node21 <none> <none>
fouramf-b5p7h-op-36-1412-kafka-zookeeper-1 1/1 Running 0 3h17m 10.104.18.132 4am-node25 <none> <none>
fouramf-b5p7h-op-36-1412-kafka-zookeeper-2 1/1 Running 0 3h17m 10.104.15.79 4am-node20 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-datacoord-74cbdfd686-wdwv8 1/1 Running 0 146m 10.104.6.153 4am-node13 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-datanode-666c49f9d-tdcsk 1/1 Running 0 143m 10.104.18.192 4am-node25 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-indexcoord-768b8fc474-fhrq7 1/1 Running 0 145m 10.104.6.156 4am-node13 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-indexnode-674bc96f7-xgkvl 1/1 Running 0 143m 10.104.6.158 4am-node13 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-proxy-694fcfc67c-wkwpz 1/1 Running 1 (42m ago) 142m 10.104.19.163 4am-node28 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-querycoord-667cf8cb58-pz7w2 1/1 Running 0 144m 10.104.6.157 4am-node13 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-querynode-5fbf54d7ff-7x8xs 1/1 Running 0 143m 10.104.23.65 4am-node27 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-querynode-5fbf54d7ff-qcspb 1/1 Running 0 143m 10.104.4.228 4am-node11 <none> <none>
fouramf-b5p7h-op-36-1412-milvus-rootcoord-6d9cc89f4b-mkfs8 1/1 Running 0 147m 10.104.6.151 4am-node13 <none> <none>
fouramf-b5p7h-op-36-1412-minio-0 1/1 Running 0 3h17m 10.104.16.213 4am-node21 <none> <none>
fouramf-b5p7h-op-36-1412-minio-1 1/1 Running 0 3h17m 10.104.18.128 4am-node25 <none> <none>
fouramf-b5p7h-op-36-1412-minio-2 1/1 Running 0 3h17m 10.104.15.68 4am-node20 <none> <none>
fouramf-b5p7h-op-36-1412-minio-3 1/1 Running 0 3h17m 10.104.5.63 4am-node12 <none> <none>
client error log:
DataCoord was unavailable during the rolling upgrade: https://github.com/milvus-io/milvus/issues/25648#issuecomment-1664936274
Closing for now; this will be tracked in #25648.
Is there an existing issue for this?
Environment
Current Behavior
deploy server argo task: fouramf-hdbkc
client argo task: fouramf-94rlz
rolling upgrade argo task: fouramf-rkpqg
test configs:
--milvus_tag=master-20230619-a6310050 -k test_concurrent_locust_custom_parameters -s --deploy_retain --case_skip_clean_collection --deploy_mode=cluster --release_name=lb-op-upgrade
rolling upgrade image from master-20230619-a6310050 to master-20230620-af1d84e5
server:
client test result:
Expected Behavior
Search should not fail during a rolling upgrade.
Steps To Reproduce
Milvus Log
No response
Anything else?
fouramf-server-op-replicas-lb-3qn:
fouramf-client-sift-replica3-search: