milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.49k stars 2.83k forks

[Bug]: Concurrent queries cause memory overflow #19865

Closed zhaohobby closed 1 year ago

zhaohobby commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.1.4
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): 2.1.4
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: 8 cores / 16 GB
- GPU: none
- Others:

Current Behavior

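Concurrent queries against 30 million user vectors cause the querynode Docker container to run out of memory and restart. After that the data cannot be reloaded (loading succeeded twice; every other attempt failed).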

Expected Behavior

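Memory should not grow by multiples after concurrent queries against a single collection, to the point of an out-of-memory restart. Even after a restart, the data should reload successfully.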

Steps To Reproduce

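1. Import the data into Milvus.
2. Load the collection to be queried.
3. Start multiple threads to simulate concurrent requests, using an existing program that sends search requests to Milvus.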
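Step 3 can be approximated with a bounded thread pool. The following is a minimal sketch, not the reporter's actual program: `run_concurrent_searches` and `fake_search` are hypothetical names, and `fake_search` stands in for a real Milvus call so the snippet runs without a server. The search parameters in the comment are copied from the `search_params` field in the log.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence, Tuple


def run_concurrent_searches(
    search_fn: Callable[[Sequence[float]], object],
    queries: List[Sequence[float]],
    max_workers: int = 8,
) -> Tuple[int, int]:
    """Run search_fn over all queries on a bounded thread pool.

    Returns (succeeded, failed). Failures are counted rather than raised,
    so one cancelled or timed-out search does not abort the whole run.
    """
    succeeded = failed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(search_fn, q) for q in queries]
        for fut in as_completed(futures):
            try:
                fut.result()
                succeeded += 1
            except Exception:
                failed += 1
    return succeeded, failed


# Hypothetical stand-in for a real search. With pymilvus it would wrap
# something like:
#   collection.search(data=[vec], anns_field="vector",
#                     param={"metric_type": "IP", "params": {"nprobe": 25000}},
#                     limit=100, output_fields=["id"])
def fake_search(vec: Sequence[float]) -> int:
    if not vec:
        raise ValueError("empty query vector")
    return len(vec)
```

Lowering `max_workers` caps how many searches are in flight at once, which makes it easier to see whether querynode memory scales with concurrency.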

Milvus Log

[2022/10/18 07:34:33.457 +00:00] [WARN] [proxy/task_search.go:457] ["QueryNode search result error"] [msgID=436748608790594288] [nodeID=84] [reason="Search 83 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:33.457 +00:00] [WARN] [proxy/task_policies.go:57] ["fail to Query with shard leader"] [nodeID=84] [error="fail to Search, QueryNode ID=84, reason=Search 83 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:33.457 +00:00] [WARN] [proxy/task_policies.go:65] ["no shard leaders available"] [leaders="[]"] [error="fail to Search, QueryNode ID=84, reason=Search 83 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:33.457 +00:00] [WARN] [proxy/task_search.go:467] ["fail to search to all shard leaders"] [msgID=436748608790594288] ["shard leaders"="[{}]"]
[2022/10/18 07:34:33.457 +00:00] [DEBUG] [timerecord/time_recorder.go:78] ["proxy execute search 436748608790594288: done (8617ms)"]
[2022/10/18 07:34:33.457 +00:00] [ERROR] [proxy/task_scheduler.go:468] ["Failed to execute task: fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 83 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=6c30b763c2363b94] [stack="github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:468"]
[2022/10/18 07:34:33.457 +00:00] [WARN] [proxy/impl.go:2492] ["Search failed to WaitToFinish"] [error="fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 83 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=6c30b763c2363b94] [role=proxy] [msgID=436748608790594288] [db=] [collection=lookalike02] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=12959] [OutputFields="[id]"] [search_params="[{\"key\":\"anns_field\",\"value\":\"vector\"},{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"metric_type\",\"value\":\"IP\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"params\",\"value\":\"{\\"nprobe\\":25000}\"}]"] [travel_timestamp=0] [guarantee_timestamp=2]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:457] ["QueryNode search result error"] [msgID=436748608790594291] [nodeID=84] [reason="Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:57] ["fail to Query with shard leader"] [nodeID=84] [error="fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:457] ["QueryNode search result error"] [msgID=436748608790594290] [nodeID=84] [reason="Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:65] ["no shard leaders available"] [leaders="[]"] [error="fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:57] ["fail to Query with shard leader"] [nodeID=84] [error="fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:65] ["no shard leaders available"] [leaders="[]"] [error="fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:467] ["fail to search to all shard leaders"] [msgID=436748608790594291] ["shard leaders"="[{}]"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:467] ["fail to search to all shard leaders"] [msgID=436748608790594290] ["shard leaders"="[{}]"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:457] ["QueryNode search result error"] [msgID=436748608790594292] [nodeID=84] [reason="Search 84 failed, reason search context timeout err %!w()"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:57] ["fail to Query with shard leader"] [nodeID=84] [error="fail to Search, QueryNode ID=84, reason=Search 84 failed, reason search context timeout err %!w()"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:65] ["no shard leaders available"] [leaders="[]"] [error="fail to Search, QueryNode ID=84, reason=Search 84 failed, reason search context timeout err %!w()"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:467] ["fail to search to all shard leaders"] [msgID=436748608790594292] ["shard leaders"="[{}]"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:457] ["QueryNode search result error"] [msgID=436748608790594289] [nodeID=84] [reason="Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [DEBUG] [timerecord/time_recorder.go:78] ["proxy execute search 436748608790594290: done (16376ms)"]
[2022/10/18 07:34:41.217 +00:00] [DEBUG] [timerecord/time_recorder.go:78] ["proxy execute search 436748608790594292: done (9547ms)"]
[2022/10/18 07:34:41.217 +00:00] [ERROR] [proxy/task_scheduler.go:468] ["Failed to execute task: fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=5aa9843b85515fb2] [stack="github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:468"]
[2022/10/18 07:34:41.217 +00:00] [DEBUG] [timerecord/time_recorder.go:78] ["proxy execute search 436748608790594291: done (16375ms)"]
[2022/10/18 07:34:41.217 +00:00] [ERROR] [proxy/task_scheduler.go:468] ["Failed to execute task: fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 84 failed, reason search context timeout err %!w()"] [traceID=23971ed8fb70cbe] [stack="github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:468"]
[2022/10/18 07:34:41.217 +00:00] [ERROR] [proxy/task_scheduler.go:468] ["Failed to execute task: fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=6155fab71746010d] [stack="github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:468"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:57] ["fail to Query with shard leader"] [nodeID=84] [error="fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/impl.go:2492] ["Search failed to WaitToFinish"] [error="fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=5aa9843b85515fb2] [role=proxy] [msgID=436748608790594290] [db=] [collection=lookalike02] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=12959] [OutputFields="[id]"] [search_params="[{\"key\":\"anns_field\",\"value\":\"vector\"},{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"metric_type\",\"value\":\"IP\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"params\",\"value\":\"{\\"nprobe\\":25000}\"}]"] [travel_timestamp=0] [guarantee_timestamp=2]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_policies.go:65] ["no shard leaders available"] [leaders="[]"] [error="fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/impl.go:2492] ["Search failed to WaitToFinish"] [error="fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=6155fab71746010d] [role=proxy] [msgID=436748608790594291] [db=] [collection=lookalike02] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=12959] [OutputFields="[id]"] [search_params="[{\"key\":\"anns_field\",\"value\":\"vector\"},{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"metric_type\",\"value\":\"IP\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"params\",\"value\":\"{\\"nprobe\\":25000}\"}]"] [travel_timestamp=0] [guarantee_timestamp=2]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/task_search.go:467] ["fail to search to all shard leaders"] [msgID=436748608790594289] ["shard leaders"="[{}]"]
[2022/10/18 07:34:41.217 +00:00] [WARN] [proxy/impl.go:2492] ["Search failed to WaitToFinish"] [error="fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 84 failed, reason search context timeout err %!w()"] [traceID=23971ed8fb70cbe] [role=proxy] [msgID=436748608790594292] [db=] [collection=lookalike02] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=12959] [OutputFields="[id]"] [search_params="[{\"key\":\"anns_field\",\"value\":\"vector\"},{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"metric_type\",\"value\":\"IP\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"params\",\"value\":\"{\\"nprobe\\":25000}\"}]"] [travel_timestamp=0] [guarantee_timestamp=2]
[2022/10/18 07:34:41.217 +00:00] [DEBUG] [timerecord/time_recorder.go:78] ["proxy execute search 436748608790594289: done (16377ms)"]
[2022/10/18 07:34:41.217 +00:00] [ERROR] [proxy/task_scheduler.go:468] ["Failed to execute task: fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=3e4176def60a8c0e] [stack="github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n\t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:468"]
[2022/10/18 07:34:41.218 +00:00] [WARN] [proxy/impl.go:2492] ["Search failed to WaitToFinish"] [error="fail to search on all shard leaders, err=fail to Search, QueryNode ID=84, reason=Search 82 failed, reason err err: rpc error: code = Canceled desc = context canceled\n, /go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:232 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase).Call\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Search\n/go/src/github.com/milvus-io/milvus/internal/querynode/shard_cluster.go:824 github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).Search.func2\n/usr/local/go/src/runtime/asm_amd64.s:1371 runtime.goexit\n"] [traceID=3e4176def60a8c0e] [role=proxy] [msgID=436748608790594289] [db=] [collection=lookalike02] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=12959] [OutputFields="[id]"] [search_params="[{\"key\":\"anns_field\",\"value\":\"vector\"},{\"key\":\"topk\",\"value\":\"100\"},
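Because the pasted log repeats the same stack trace many times, a per-source tally makes the pattern easier to see. The sketch below is a hypothetical triage helper, not part of Milvus: it assumes only the bracketed layout visible above (`[timestamp] [LEVEL] [file:line] ...`) and rejoins entries that were wrapped across lines.

```python
import re
from collections import Counter
from typing import List

# An entry starts with a timestamp such as [2022/10/18 07:34:33.457 +00:00].
ENTRY_START = re.compile(
    r"\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}\]"
)
# Timestamp, level, and source location at the head of each entry.
HEADER = re.compile(r"\[([^\]]+)\] \[([A-Z]+)\] \[([^\]]+)\]")


def split_entries(text: str) -> List[str]:
    """Split a pasted log into entries, ignoring where lines were wrapped."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    starts = [m.start() for m in ENTRY_START.finditer(flat)]
    ends = starts[1:] + [len(flat)]
    return [flat[a:b].strip() for a, b in zip(starts, ends)]


def count_by_source(text: str) -> Counter:
    """Count entries per (level, source file:line)."""
    counts: Counter = Counter()
    for entry in split_entries(text):
        m = HEADER.match(entry)
        if m:
            counts[(m.group(2), m.group(3))] += 1
    return counts
```

Calling `count_by_source(log_text).most_common()` on the log text lists which proxy call sites emit the most warnings and errors.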

Anything else?

After the querynode restarted, the data was not restored.

JackLCL commented 1 year ago

@zhaohobby Can you use this tool to calculate whether the memory is sufficient?
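As a rough first check before using a sizing tool, the raw float32 vector footprint can be computed by hand. This is only a back-of-the-envelope sketch: the function names are hypothetical, the vector dimension is not stated anywhere in this issue, and the `headroom` fraction is an arbitrary assumption to leave room for the index, search buffers, and the rest of the process.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def fits_in_memory(
    num_vectors: int,
    dim: int,
    node_memory_gib: float,
    headroom: float = 0.5,  # assumed fraction of node memory usable for raw vectors
) -> bool:
    """True if the raw vectors fit within `headroom` of the node's memory."""
    budget = node_memory_gib * 1024**3 * headroom
    return raw_vector_bytes(num_vectors, dim) <= budget


# Placeholder inputs: the dimension (128) is NOT taken from the issue.
example_bytes = raw_vector_bytes(30_000_000, 128)  # 15_360_000_000 bytes
```

In a cluster deployment the segments are spread across querynodes, so the per-node share depends on how many querynodes are running, which the issue does not state.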

yanliang567 commented 1 year ago

@zhaohobby Did you have any insert or delete operations during searching?

xiaofan-luan commented 1 year ago

vector

Did we have a test with output fields specified?

zhaohobby commented 1 year ago

@zhaohobby did you have any insert or delete operations during searching

No.

zhaohobby commented 1 year ago

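milvus.zip
After concurrent requests to Milvus, memory on the qn (querynode) node grows so much that even a single request then runs out of memory.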

yanliang567 commented 1 year ago

/assign @jiaoew1991
/unassign @zhaohobby

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.