milvus-io / milvus

[Bug]: Search latency becomes much larger than before after reinstallation or upgrading when Milvus mq is Kafka #20212

Closed: zhuwenxing closed this issue 1 year ago

zhuwenxing commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: 2.1.4 -> master-20221031-2bfecf5b
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2022-10-31T10:18:15.308Z] ###########
[2022-10-31T10:18:15.308Z] collection name: task_1_IVF_FLAT
[2022-10-31T10:18:15.308Z] load collection
[2022-10-31T10:18:15.308Z] load time: 6.6114
[2022-10-31T10:18:15.308Z] {'metric_type': 'L2', 'params': {'nprobe': 10}}
[2022-10-31T10:18:15.308Z]
[2022-10-31T10:18:15.308Z] Search...
[2022-10-31T10:18:15.308Z] (distance: 30.569828033447266, id: 563) -11.0
[2022-10-31T10:18:15.308Z] (distance: 31.539844512939453, id: 973) -16.0
[2022-10-31T10:18:15.308Z] (distance: 32.50592803955078, id: 1153) -18.0
[2022-10-31T10:18:15.308Z] (distance: 32.571937561035156, id: 2459) -13.0
[2022-10-31T10:18:15.308Z] (distance: 32.915000915527344, id: 2280) -17.0
[2022-10-31T10:18:15.308Z] [563, 973, 1153, 2459, 2280]
[2022-10-31T10:18:15.308Z] search latency: 104.9578s
[2022-10-31T10:18:15.308Z] Traceback (most recent call last):
[2022-10-31T10:18:15.308Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2022-10-31T10:18:15.308Z]     return func(self, *args, **kwargs)
[2022-10-31T10:18:15.308Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 889, in query
[2022-10-31T10:18:15.308Z]     response = future.result()
[2022-10-31T10:18:15.308Z]   File "/usr/local/lib/python3.8/dist-packages/grpc/_channel.py", line 744, in result
[2022-10-31T10:18:15.308Z]     raise self
[2022-10-31T10:18:15.308Z] grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
[2022-10-31T10:18:15.308Z]  status = StatusCode.DEADLINE_EXCEEDED
[2022-10-31T10:18:15.308Z]  details = "Deadline Exceeded"
[2022-10-31T10:18:15.308Z]  debug_error_string = "{"created":"@1667211487.663885543","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
[2022-10-31T10:18:15.308Z] >
[2022-10-31T10:18:15.309Z]
[2022-10-31T10:18:15.309Z] The above exception was the direct cause of the following exception:
[2022-10-31T10:18:15.309Z]
[2022-10-31T10:18:15.309Z] Traceback (most recent call last):
[2022-10-31T10:18:15.309Z]   File "scripts/action_after_upgrade.py", line 107, in <module>
[2022-10-31T10:18:15.309Z]     task_1(data_size, host)
[2022-10-31T10:18:15.309Z]   File "scripts/action_after_upgrade.py", line 20, in task_1
[2022-10-31T10:18:15.309Z]     load_and_search(prefix)
[2022-10-31T10:18:15.309Z]   File "/home/jenkins/agent/workspace/tests/python_client/deploy/scripts/utils.py", line 212, in load_and_search
[2022-10-31T10:18:15.309Z]     res = c.query(expr, output_fields, timeout=20)
[2022-10-31T10:18:15.309Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 807, in query
[2022-10-31T10:18:15.309Z]     res = conn.query(self._name, expr, output_fields, partition_names, timeout=timeout, schema=schema, **kwargs)
[2022-10-31T10:18:15.309Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2022-10-31T10:18:15.309Z]     raise e
[2022-10-31T10:18:15.309Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2022-10-31T10:18:15.309Z]     return func(*args, **kwargs)
[2022-10-31T10:18:15.309Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2022-10-31T10:18:15.309Z]     ret = func(self, *args, **kwargs)
[2022-10-31T10:18:15.309Z]   File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 64, in handler
[2022-10-31T10:18:15.309Z]     raise MilvusException(message=f"rpc deadline exceeded: {timeout_msg}") from e
[2022-10-31T10:18:15.309Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 20s)>

script returned exit code 1
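To compare runs without reading each Jenkins log by hand, the timing lines can be pulled out with a small script. This is only a sketch: it assumes the log keeps the `collection name:`, `load time:` and `search latency:` line formats shown above, and `extract_timings` is a hypothetical helper, not part of the test suite.

```python
import re
from typing import Dict

# Jenkins prefixes every line with "[<timestamp>] "; strip it before matching.
_PREFIX_RE = re.compile(r"^\[[^\]]*\]\s*")
# Matches "load time: 6.6114" and "search latency: 104.9578s" (trailing "s" optional).
_METRIC_RE = re.compile(r"^(load time|search latency):\s*([0-9]+(?:\.[0-9]+)?)s?$")


def extract_timings(log_text: str) -> Dict[str, Dict[str, float]]:
    """Group 'load time' and 'search latency' (seconds) by collection name.

    Assumes each collection's block starts with a 'collection name:' line,
    as in the log above. Metrics seen before any such line are ignored.
    """
    results: Dict[str, Dict[str, float]] = {}
    current = None
    for raw in log_text.splitlines():
        line = _PREFIX_RE.sub("", raw.strip())
        if line.startswith("collection name:"):
            current = line.split(":", 1)[1].strip()
            results[current] = {}
            continue
        match = _METRIC_RE.match(line)
        if match and current is not None:
            results[current][match.group(1)] = float(match.group(2))
    return results


sample = """\
[2022-10-31T10:18:15.308Z] collection name: task_1_IVF_FLAT
[2022-10-31T10:18:15.308Z] load time: 6.6114
[2022-10-31T10:18:15.308Z] [563, 973, 1153, 2459, 2280]
[2022-10-31T10:18:15.308Z] search latency: 104.9578s
"""
print(extract_timings(sample))
# {'task_1_IVF_FLAT': {'load time': 6.6114, 'search latency': 104.9578}}
```

Running it over the logs from both the pre-upgrade and post-upgrade stages gives per-collection numbers that can be compared directly.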

Expected Behavior

Search and query latency after reinstallation or upgrade should be the same as before, or better.
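One way to turn this expectation into an automated check is a simple threshold comparison. The sketch below is illustrative only: `is_latency_regression` and the 10% tolerance are assumptions, and the 1.0 s baseline is a made-up placeholder because the pre-upgrade latency is not recorded in this report.

```python
def is_latency_regression(baseline_s: float, current_s: float, tolerance: float = 0.10) -> bool:
    """Return True if current_s is more than `tolerance` (a fraction) slower than baseline_s.

    The tolerance absorbs normal run-to-run noise; 0.10 is an arbitrary default.
    """
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return current_s > baseline_s * (1.0 + tolerance)


# Hypothetical baseline of 1.0 s; 104.9578 s is the search latency from the log in this report.
print(is_latency_regression(baseline_s=1.0, current_s=104.9578))  # True
print(is_latency_regression(baseline_s=1.0, current_s=1.05))      # False
```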

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka/detail/deploy_test_kafka/401/pipeline

log: artifacts-cluster-upgrade-401-server-logs.tar.gz artifacts-cluster-upgrade-401-pytest-logs.tar.gz

Anything else?

No response

zhuwenxing commented 2 years ago

This issue reproduces only in cluster mode. It works well in standalone mode.

yanliang567 commented 2 years ago

/assign @jaime0815
/unassign

zhuwenxing commented 2 years ago

Query performance also degrades significantly.

jaime0815 commented 2 years ago

Updating tSafe is too slow.

jaime0815 commented 2 years ago

tSafe updating is abnormal after upgrading: the minimum tSafe value is lower than before.
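For context on why a lagging tSafe shows up as search latency: as far as I understand it, a query node only serves a search once its tSafe has caught up with the request's guarantee timestamp, so a low minimum tSafe means requests wait. The toy model below is not Milvus code. The function names, the channel names, and the assumption that the serviceable timestamp is the minimum tSafe across channels are all illustrative.

```python
from typing import Dict


def serviceable_ts(channel_tsafe: Dict[str, int]) -> int:
    """Timestamp up to which data is assumed safe to search.

    Assumption: the slowest channel bounds the whole collection,
    so this is the minimum tSafe across channels.
    """
    if not channel_tsafe:
        raise ValueError("at least one channel is required")
    return min(channel_tsafe.values())


def tsafe_lag(guarantee_ts: int, channel_tsafe: Dict[str, int]) -> int:
    """How far tSafe still has to advance before a request can be served.

    0 means the request can run immediately; a positive value means the
    request has to wait for tSafe to move forward by that amount.
    """
    return max(0, guarantee_ts - serviceable_ts(channel_tsafe))


# Hypothetical per-channel tSafe values (arbitrary timestamp units).
channels = {"dml-channel-0": 1000, "dml-channel-1": 400}
print(serviceable_ts(channels))   # 400
print(tsafe_lag(950, channels))   # 550
print(tsafe_lag(300, channels))   # 0
```

In this model a single slow channel is enough to hold back every search with a recent guarantee timestamp, which would be consistent with the slow tSafe updates observed here.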

zhuwenxing commented 2 years ago

Reinstall with master-20221102-51acba7d

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka/detail/deploy_test_kafka/420/pipeline

log: artifacts-kafka-cluster-reinstall-420-pytest-logs.tar.gz artifacts-kafka-cluster-reinstall-420-server-first-deployment-logs.tar.gz artifacts-kafka-cluster-reinstall-420-server-second-deployment-logs.tar.gz

zhuwenxing commented 2 years ago

Upgrade v2.1.4 -> master-20221109-cc9dc0f0

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka/detail/deploy_test_kafka/459/pipeline

However, the latency is normal for one of the collections.

jaime0815 commented 1 year ago

Resolved by https://github.com/milvus-io/milvus/pull/20542 and https://github.com/milvus-io/milvus/pull/20597

jaime0815 commented 1 year ago

/assign @zhuwenxing