Open sabrahul opened 1 week ago
@sabrahul it sounds like a timeout issue. Could you please refer this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
Also please attach the etcd backup for investigation? Check this: https://github.com/milvus-io/birdwatcher for details about how to backup etcd with birdwatcher
/assign @sabrahul /unassign
Is there an existing issue for this?
Summary
Intermittent Query Failure in Milvus
Hi, I am new to Milvus and vector databases, and I've encountered a recurring issue that I haven't been able to resolve despite multiple attempts. I'm hoping someone can provide guidance on the correct approach or potential workarounds.
I've been troubleshooting this for the past two days. The Milvus container starts up correctly, along with ETCD, MinIO, and Attu. However, when I try to run a query, I frequently encounter the error shown in the logs above. Surprisingly, the same query sometimes works as expected, but other times it fails with no clear indication of what’s causing the inconsistency.
I’ve used the default configurations in my milvus.yaml file and have tried modifying various settings, but none of the changes have resolved the issue. I would appreciate any insights on what configurations I should focus on or if there are any known workarounds for this problem.
Thank you in advance for your support.
Environment
Current Behavior
When I am trying to query the db or collection, I am getting context deadline exceeded errors continuously. But sometimes it did works that I don't know how.
Expected Behavior
When I query it should return results properly.
Steps To Reproduce
Milvus Log
2024-09-16 15:39:24 [2024/09/16 10:09:24.729 +00:00] [WARN] [proxy/impl.go:3478] ["Query failed to WaitToFinish"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [role=proxy] [db=default]
[collection=local_009_bot_knowledge_base_embeddings] [partitions="[]"] [ConsistencyLevel=Bounded] [useDefaultConsistency=false] [error="proxy TaskCondition context Done: context deadline exceeded"] [errorVerbose="proxy TaskCondition context Done: context deadline exceeded\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(TaskCondition).WaitToFinish\n | \t/workspace/source/internal/proxy/condition.go:45\n | github.com/milvus-io/milvus/internal/proxy.(Proxy).query\n | \t/workspace/source/internal/proxy/impl.go:3477\n | github.com/milvus-io/milvus/internal/proxy.(Proxy).Query\n | \t/workspace/source/internal/proxy/impl.go:3568\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).Query\n | \t/workspace/source/internal/distributed/proxy/service.go:954\n | github.com/milvus-io/milvus-proto/go-api/v2/milvuspb._MilvusService_Query_Handler.func1\n | \t/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api/v2@v2.4.11/milvuspb/milvus_grpc.pb.go:2063\n | github.com/milvus-io/milvus/internal/proxy/connection.KeepActiveInterceptor\n | \t/workspace/source/internal/proxy/connection/util.go:60\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy.TraceLogInterceptor\n | \t/workspace/source/internal/proxy/trace_log_interceptor.go:36\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy/accesslog.UnaryUpdateAccessInfoInterceptor\n | \t/workspace/source/internal/proxy/accesslog/util.go:47\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.RateLimitInterceptor.func5\n | \t/workspace/source/internal/proxy/rate_limit_interceptor.go:56\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/pkg/util/logutil.UnaryTraceLoggerInterceptor\n | \t/workspace/source/pkg/util/logutil/grpc_interceptor.go:23\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.UnaryServerInterceptor.func4\n | \t/workspace/source/internal/proxy/privilege_interceptor.go:112\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy.HookInterceptor\n | \t/workspace/source/internal/proxy/hook_interceptor.go:49\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.UnaryServerHookInterceptor.func3\n | \t/workspace/source/internal/proxy/hook_interceptor.go:19\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.DatabaseInterceptor.func2\n | \t/workspace/source/internal/proxy/database_interceptor.go:15\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.UnaryServerInterceptor.func1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/auth/auth.go:47\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy/accesslog.UnaryAccessLogInterceptor\n | \t/workspace/source/internal/proxy/accesslog/util.go:38\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34\n | github.com/milvus-io/milvus-proto/go-api/v2/milvuspb._MilvusService_Query_Handler\n | \t/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api/v2@v2.4.11/milvuspb/milvus_grpc.pb.go:2065\n | google.golang.org/grpc.(Server).processUnaryRPC\n | \t/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343\n | google.golang.org/grpc.(Server).handleStream\n | \t/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737\n | google.golang.org/grpc.(Server).serveStreams.func1.1\n | \t/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) proxy TaskCondition context Done\nWraps: (3) context deadline exceeded\nError types: (1) withstack.withStack (2) *errutil.withPrefix (3) context.deadlineExceededError"] 2024-09-16 15:39:24 [2024/09/16 10:09:24.730 +00:00] [ACCESS] Query [status: Failed] [code: 10001] [sdk: nodejs-2.4.8] [msg: proxy TaskCondition context Done: context deadline exceeded] [traceID: 7ba650f7017f2e8ff9e4b9a8ab2f6496] [timeCost: 14.995998549s] [database: default] [collection: local_009_bot_knowledge_base_embeddings] [partitions: []] [expr: pk > -9223372036854775807]
....
2024-09-16 15:39:51 [2024/09/16 10:09:51.364 +00:00] [WARN] [segments/segment.go:645] ["Retrieve failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [partitionID=452586433906215404] [segmentID=452586433907415483] [msgID=452586433906215486] [segmentType=Growing]
2024-09-16 15:39:51 [2024/09/16 10:09:51.366 +00:00] [WARN] [querynodev2/services.go:865] ["failed to query channel"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [msgID=452586433906215486] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [scope=Streaming] [error="Future was cancelled"] [errorVerbose="Future was cancelled\n(1) forced error mark\n | \"context deadline exceeded\"\n | context/context.deadlineExceededError::\nWraps: (2) Future was cancelled\nError types: (1) *markers.withMark (2) merr.milvusError"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.368 +00:00] [WARN] [delegator/delegator.go:662] ["failed to execute sub task"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [replicaID=452586434226290689] [taskType=Query] [nodeID=1] [error="worker(1) query failed: Future was cancelled"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.369 +00:00] [WARN] [delegator/delegator.go:682] ["Delegator execute subTask failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [replicaID=452586434226290689] [taskType=Query] [error="worker(1) query failed: Future was cancelled"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.369 +00:00] [WARN] [delegator/delegator.go:527] ["Delegator query failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [replicaID=452586434226290689] [error="worker(1) query failed: Future was cancelled"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.369 +00:00] [WARN] [querynodev2/handlers.go:221] ["failed to query on delegator"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [msgID=452586433906215486] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [scope=All] [error="worker(1) query failed: Future was cancelled"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.370 +00:00] [WARN] [proxy/task_query.go:565] ["QueryNode query result error"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collection=452586433906215403] [partitionIDs="[]"] [nodeID=1] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [errorCode=UnexpectedError] [reason="worker(1) query failed: Future was cancelled"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.371 +00:00] [WARN] [proxy/lb_policy.go:182] ["search/query channel failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [collectionName=local_009_bot_knowledge_base_embeddings] [channelName=by-dev-rootcoord-dml_0_452586433906215403v0] [nodeID=1] [error="fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) fail to Query on QueryNode 1\nWraps: (3) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.373 +00:00] [WARN] [retry/retry.go:46] ["retry func failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [retried=0] [error="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (2) failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (4) fail to Query on QueryNode 1\nWraps: (5) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) errutil.withPrefix (5) merr.milvusError"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.377 +00:00] [WARN] [proxy/task_query.go:470] ["fail to execute query"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collection=452586433906215403] [partitionIDs="[]"] [requestType=query] [error="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (2) failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (4) fail to Query on QueryNode 1\nWraps: (5) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) errutil.withPrefix (5) merr.milvusError"]
2024-09-16 15:39:51 [2024/09/16 10:09:51.379 +00:00] [WARN] [proxy/task_scheduler.go:477] ["Failed to execute task: "] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [error="failed to query: failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="failed to query: failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).Execute\n | \t/workspace/source/internal/proxy/task_query.go:471\n | github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n | \t/workspace/source/internal/proxy/task_scheduler.go:474\n | github.com/milvus-io/milvus/internal/proxy.(taskScheduler).queryLoop.func1\n | \t/workspace/source/internal/proxy/task_scheduler.go:553\n | github.com/milvus-io/milvus/pkg/util/conc.(Pool[...]).Submit.func1\n | \t/workspace/source/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67\nWraps: (2) failed to query\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (6) fail to Query on QueryNode 1\nWraps: (7) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) errutil.withPrefix (5) withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]