Closed Richard-lrg closed 1 year ago
@Cactus-L quick questions:
- During the slow-response periods, what requests are running against Milvus? Any insert or delete requests?
- Is your Milvus running on dedicated hosts? That helps us understand whether there was any resource contention at that moment.
- Do you happen to have any screenshots of the Milvus metrics in Grafana? They would help us see what was happening in the proxy, querynode, and runtime.
- Could you please refer to this doc to export the complete Milvus logs for investigation?
/assign @Cactus-L
If insert/delete requests were sent to Milvus during those periods, the slowdown is expected, because newly inserted data is searched by brute force. Please feel free to let us know if there are any updates.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Is there an existing issue for this?
Environment
Current Behavior
We switched the whole service over to the Milvus cluster version last Friday, but today we found that requests to Milvus periodically get slow responses:
DEADLINE_EXCEEDED: deadline exceeded after 19.999962569s
So I checked the logs and found many entries like these on milvus-proxy:
[2023/08/01 13:28:32.332 +00:00] [WARN] [proxy/task_search.go:439] ["first search failed, updating shardleader caches and retry search"] [traceID=546290382f5b1ad7] [msgId=443258370085617665] [error="All attempts results:\nattempt #1:context canceled\n"]
[2023/08/01 13:28:32.332 +00:00] [INFO] [proxy/meta_cache.go:836] ["clearing shard cache for collection"] [collectionName=xxx]
[2023/08/01 13:28:32.332 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="All attempts results:\nattempt #1:context canceled\n"]
[2023/08/01 13:28:32.332 +00:00] [WARN] [proxy/task_scheduler.go:473] ["Failed to execute task: "] [error="fail to search on all shard leaders, err=All attempts results:\nattempt #1:All attempts results:\nattempt #1:context canceled\n\nattempt #2:context canceled\n"] [traceID=546290382f5b1ad7]
Logs such as "expire all shard leader cache" are also very frequent. Why is this happening? Are the periodic slow responses caused by the cache being cleared and then reloaded?
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
[2023/08/01 13:28:32.332 +00:00] [WARN] [proxy/task_search.go:439] ["first search failed, updating shardleader caches and retry search"] [traceID=546290382f5b1ad7] [msgId=443258370085617665] [error="All attempts results:\nattempt #1:context canceled\n"]
[2023/08/01 13:28:32.332 +00:00] [INFO] [proxy/meta_cache.go:836] ["clearing shard cache for collection"] [collectionName=xxx]
[2023/08/01 13:28:32.332 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="All attempts results:\nattempt #1:context canceled\n"]
[2023/08/01 13:28:32.332 +00:00] [WARN] [proxy/task_scheduler.go:473] ["Failed to execute task: "] [error="fail to search on all shard leaders, err=All attempts results:\nattempt #1:All attempts results:\nattempt #1:context canceled\n\nattempt #2:context canceled\n"] [traceID=546290382f5b1ad7]
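To check whether the failures really are periodic, one option is to tally the "context canceled" warnings per minute from an exported proxy log. The following is only a rough sketch: it assumes the log format shown above, the function name `count_cancellations_per_minute` is made up for illustration, and it is not part of Milvus or its tooling.

```python
import re
from collections import Counter

# Start of a log entry, e.g. "[2023/08/01 13:28:32.332 +00:00] [WARN]".
# Group 1 is the timestamp truncated to the minute, group 2 is the level.
ENTRY = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}\.\d{3} [+-]\d{2}:\d{2}\] \[(\w+)\]"
)
# Zero-width split point in front of every timestamp, so that several
# entries pasted onto one physical line are still handled separately.
ENTRY_START = re.compile(r"(?=\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} )")


def count_cancellations_per_minute(log_text):
    """Count WARN entries that mention 'context canceled', bucketed by minute."""
    counts = Counter()
    for entry in ENTRY_START.split(log_text):
        match = ENTRY.match(entry)
        if match and match.group(2) == "WARN" and "context canceled" in entry:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_cancel.py proxy.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        per_minute = count_cancellations_per_minute(handle.read())
    for minute, count in sorted(per_minute.items()):
        print(minute, count)
```

If the counts spike at regular intervals, comparing those minutes with the times of insert/delete requests would help answer the questions above.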
Anything else?
No response