milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

As the number of collections grows, CPU usage of checkhealth requests also rises #35563

Open jaime0815 opened 3 months ago

jaime0815 commented 3 months ago

Is there an existing issue for this?

Environment

- Milvus version: master
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The cluster has 450 collections.

image

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 3 months ago

@jaime0815 can we do a single health check across all the datanodes, instead of checking everything here at datacoord? I believe this will save a lot of CPU.

jaime0815 commented 3 months ago

> @jaime0815 can we do a single health check across all the datanodes, instead of checking everything here at datacoord? I believe this will save a lot of CPU.

Yes. We will also eliminate the dependency on GetMetrics requests and reduce the overhead from unnecessary log.With() invocations.

jaime0815 commented 3 months ago

The rate limiter mechanism consumes significant CPU as the number of collections increases, largely because of the high CPU cost of GetMetrics requests. This will need optimization in the future.

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.