[Open] Infinite666 opened this issue 1 month ago
@Infinite666 could you please attach the milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
/assign @Infinite666
1. Please upgrade to 2.3.22. For the current log, the slowness is in describing indexes, which is most likely related to having too many collections (a client-side timing sketch follows below).
2. Please provide all server-side logs for investigation.
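If it helps to confirm that diagnosis, here is a rough client-side sketch that times describeIndex across a few collections with the Java SDK; the endpoint and collection names are placeholders, and consistently multi-second round trips would line up with the "slow in describing index" observation.

import io.milvus.client.MilvusServiceClient;
import io.milvus.grpc.DescribeIndexResponse;
import io.milvus.param.ConnectParam;
import io.milvus.param.R;
import io.milvus.param.index.DescribeIndexParam;

// Sketch only: time describeIndex for a few collections from the client side to see
// whether the multi-second latency in the server log is reproducible. The endpoint
// and collection names below are placeholders.
public class DescribeIndexTiming {
    public static void main(String[] args) {
        MilvusServiceClient client = new MilvusServiceClient(
                ConnectParam.newBuilder().withHost("milvus-host").withPort(19530).build());
        try {
            String[] collections = {"collection_a", "collection_b"};
            for (String name : collections) {
                long start = System.currentTimeMillis();
                R<DescribeIndexResponse> resp = client.describeIndex(
                        DescribeIndexParam.newBuilder().withCollectionName(name).build());
                long elapsedMs = System.currentTimeMillis() - start;
                System.out.printf("describeIndex(%s): status=%d, took %d ms%n",
                        name, resp.getStatus(), elapsedMs);
            }
        } finally {
            client.close();
        }
    }
}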
Sorry, I can't upload the logs for now due to our company's policy, but I will try to find a way to share them later.
We have 10 databases and 90 collections with about 7.5 million entities in total, and the Milvus volume on disk is about 54 GB. Are that many collections and that much data too much for a v2.3.5 standalone Milvus? We also hit another error yesterday; here is the error log raised by our service when searching with the Milvus Java SDK (but I lost the Milvus log because I removed the old container):
[2024-10-13 17:19:43.110] [http-nio-12756-exec-222] [(ManagedChannelOrphanWrapper.java:159)] - *~*~*~ Channel ManagedChannelImpl{logId=146626, target=xxx:19530} was not shutdown properly!!! ~*~*~*
Make sure to call shutdown()/shutdownNow() and wait until awaitTermination() returns true.
java.lang.RuntimeException: ManagedChannel allocation site
Two of our collections have more than 2 million entities each, and it seems like we have run into lots of problems since creating these two big collections. Is the data size too much?
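On the ManagedChannel warning itself: gRPC logs "Channel ... was not shutdown properly" when a channel is garbage-collected without shutdown() having been called, which typically happens when a new MilvusServiceClient is built per request and never closed. A minimal sketch of keeping one long-lived client and closing it explicitly (the holder class, host, and port are placeholders, not taken from this issue):

import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;

// Sketch only: keep a single long-lived client instead of creating one per request,
// and close it when the application shuts down so the underlying gRPC ManagedChannel
// is released. Host and port are placeholders for the real Milvus endpoint.
public class MilvusClientHolder {
    private static final MilvusServiceClient CLIENT = new MilvusServiceClient(
            ConnectParam.newBuilder()
                    .withHost("milvus-host")
                    .withPort(19530)
                    .build());

    public static MilvusServiceClient get() {
        return CLIENT;
    }

    public static void shutdown() {
        // close() shuts down the underlying channel, which is what the
        // "Make sure to call shutdown()/shutdownNow()" hint in the warning refers to.
        CLIENT.close();
    }
}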
2 million entities is not huge, as long as you are not flushing and deleting frequently.
We need the server-side logs for further investigation. For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs; if Milvus runs as a single Docker container, use docker logs.
Here is the server log. Sorry, due to our company's policy, I can only upload the logs as multipart zips in my own repo: https://github.com/Infinite666/temp. You can download them from there, thanks a lot.
I think your etcd service is too slow. Please make sure it is running on SSD volumes.
milvus-etcd | {"level":"warn","ts":"2024-10-14T02:35:48.879Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"3.257843554s","expected-duration":"100ms","prefix":"","request":"header:<ID:7587882063776415505 > put:<key:\"by-dev/kv/gid/timestamp\" value_size:8 >","response":"size:6"}
I just checked our machine, and the volume directory is indeed on an HDD, so is this the root cause of the problem? Why did the service become available again after some time? Does this situation only happen for a short period right after a restart? We also ran into this kind of error before (at that time the Milvus service had already been running for a couple of days); is that also because of the HDD?
Yes, please try switching to SSD volumes. If Milvus loses its heartbeat connection with etcd, it fails.
Thanks for the reply, we will try SSD volumes and see if that gets rid of the problem. Besides, I know an HDD may slow etcd down, but I wonder why that would lead to losing the heartbeat between etcd and Milvus?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
Is there an existing issue for this?
Environment
Current Behavior
The first time I started Milvus with docker-compose, the milvus container exited, so I tried to restart it around 2024/10/14 02:43:08.377. After restarting Milvus and confirming the Docker status was healthy, I used Attu to connect to Milvus and it was very slow to show the collections. We tried to search with the SDK and got "SearchRequest RPC failed!"; the connection was refused by Milvus: "Caused by: java.net.ConnectException: Connection refused". After about 5-10 minutes the service seemed to recover. I checked the Milvus logs after 02:43:08 and saw many operations fail because of "context canceled", such as:
So I have two questions:
Expected Behavior
When all the Milvus containers are in a healthy status, the service should be working properly.
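As a stopgap for that window right after a restart, the client can probe Milvus with a cheap RPC and only start sending traffic once it answers. A rough sketch with the Java SDK (the endpoint, attempt count, and sleep interval are illustrative assumptions, not from the issue):

import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;
import io.milvus.param.R;

// Sketch only: the container can report "healthy" while the proxy still refuses
// connections, so probe Milvus with a cheap RPC and hold back traffic until it answers.
public class WaitForMilvus {

    static boolean milvusAnswers() {
        MilvusServiceClient client = null;
        try {
            client = new MilvusServiceClient(
                    ConnectParam.newBuilder().withHost("milvus-host").withPort(19530).build());
            R<?> resp = client.getVersion();           // any lightweight call works as a probe
            return resp.getStatus() == R.Status.Success.getCode();
        } catch (RuntimeException e) {                 // e.g. connection refused during startup
            return false;
        } finally {
            if (client != null) {
                client.close();                        // release the probe's gRPC channel
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int attempt = 1; attempt <= 30; attempt++) {
            if (milvusAnswers()) {
                System.out.println("Milvus is reachable after " + attempt + " attempt(s)");
                return;
            }
            Thread.sleep(10_000);                      // wait 10s between probes
        }
        System.err.println("Milvus did not become reachable in time");
    }
}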
Steps To Reproduce
No response
Milvus Log
Sorry, due to our company's policy, I can only upload the logs in multipart zips in my own repo: https://github.com/Infinite666/temp
Anything else?
No response