milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.57k stars 2.83k forks source link

[Bug]: Intermittent Query Failures with context deadline exceeded errors in Milvus Despite Default Configuration #36311

Open sabrahul opened 1 week ago

sabrahul commented 1 week ago

Is there an existing issue for this?

Summary

Intermittent Query Failure in Milvus

Hi, I am new to Milvus and vector databases, and I've encountered a recurring issue that I haven't been able to resolve despite multiple attempts. I'm hoping someone can provide guidance on the correct approach or potential workarounds.

I've been troubleshooting this for the past two days. The Milvus container starts up correctly, along with ETCD, MinIO, and Attu. However, when I try to run a query, I frequently encounter the error shown in the logs above. Surprisingly, the same query sometimes works as expected, but other times it fails with no clear indication of what’s causing the inconsistency.

I’ve used the default configurations in my milvus.yaml file and have tried modifying various settings, but none of the changes have resolved the issue. I would appreciate any insights on what configurations I should focus on or if there are any known workarounds for this problem.

Thank you in advance for your support.

Environment

- Milvus version: 2.4.11
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.6
- OS(Ubuntu or CentOS):
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

When I am trying to query the db or collection, I am getting context deadline exceeded errors continuously. But sometimes it did works that I don't know how.

Expected Behavior

When I query it should return results properly.

Steps To Reproduce

1. Set up standalone milvus with this configuration

  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    platform: linux/amd64
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - milvus_etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3
    <<: *network_config

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    platform: linux/amd64
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - milvus_minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    <<: *network_config

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.11
    platform: linux/amd64
    command: ["milvus", "run", "standalone"]
    security_opt:
    - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - milvus_data:/var/lib/milvus
      - /milvus.yaml:/milvus/configs/milvus.yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    <<: *network_config

  attu:
    container_name: attu
    image: zilliz/attu:v2.4.8
    platform: linux/amd64
    environment:
      MILVUS_URL: milvus-standalone:19530
    ports:
      - "8093:3000"
    depends_on:
      - "standalone"
    <<: *network_config

networks:
  docker-local:
    name: docker-local
    driver: bridge

2. Go to attu local container at 8093 port.
3. Go to your collection and try to query.

Milvus Log

2024-09-16 15:39:24 [2024/09/16 10:09:24.729 +00:00] [WARN] [proxy/impl.go:3478] ["Query failed to WaitToFinish"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [role=proxy] [db=default]

[collection=local_009_bot_knowledge_base_embeddings] [partitions="[]"] [ConsistencyLevel=Bounded] [useDefaultConsistency=false] [error="proxy TaskCondition context Done: context deadline exceeded"] [errorVerbose="proxy TaskCondition context Done: context deadline exceeded\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(TaskCondition).WaitToFinish\n | \t/workspace/source/internal/proxy/condition.go:45\n | github.com/milvus-io/milvus/internal/proxy.(Proxy).query\n | \t/workspace/source/internal/proxy/impl.go:3477\n | github.com/milvus-io/milvus/internal/proxy.(Proxy).Query\n | \t/workspace/source/internal/proxy/impl.go:3568\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).Query\n | \t/workspace/source/internal/distributed/proxy/service.go:954\n | github.com/milvus-io/milvus-proto/go-api/v2/milvuspb._MilvusService_Query_Handler.func1\n | \t/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api/v2@v2.4.11/milvuspb/milvus_grpc.pb.go:2063\n | github.com/milvus-io/milvus/internal/proxy/connection.KeepActiveInterceptor\n | \t/workspace/source/internal/proxy/connection/util.go:60\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy.TraceLogInterceptor\n | \t/workspace/source/internal/proxy/trace_log_interceptor.go:36\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy/accesslog.UnaryUpdateAccessInfoInterceptor\n | \t/workspace/source/internal/proxy/accesslog/util.go:47\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.RateLimitInterceptor.func5\n | \t/workspace/source/internal/proxy/rate_limit_interceptor.go:56\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/pkg/util/logutil.UnaryTraceLoggerInterceptor\n | \t/workspace/source/pkg/util/logutil/grpc_interceptor.go:23\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.UnaryServerInterceptor.func4\n | \t/workspace/source/internal/proxy/privilege_interceptor.go:112\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy.HookInterceptor\n | \t/workspace/source/internal/proxy/hook_interceptor.go:49\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.UnaryServerHookInterceptor.func3\n | \t/workspace/source/internal/proxy/hook_interceptor.go:19\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.DatabaseInterceptor.func2\n | \t/workspace/source/internal/proxy/database_interceptor.go:15\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.UnaryServerInterceptor.func1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/auth/auth.go:47\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/proxy/accesslog.UnaryAccessLogInterceptor\n | \t/workspace/source/internal/proxy/accesslog/util.go:38\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6.1.1\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25\n | github.com/milvus-io/milvus/internal/distributed/proxy.(Server).startExternalGrpc.ChainUnaryServer.func6\n | \t/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34\n | github.com/milvus-io/milvus-proto/go-api/v2/milvuspb._MilvusService_Query_Handler\n | \t/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api/v2@v2.4.11/milvuspb/milvus_grpc.pb.go:2065\n | google.golang.org/grpc.(Server).processUnaryRPC\n | \t/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1343\n | google.golang.org/grpc.(Server).handleStream\n | \t/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:1737\n | google.golang.org/grpc.(Server).serveStreams.func1.1\n | \t/go/pkg/mod/google.golang.org/grpc@v1.59.0/server.go:986\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) proxy TaskCondition context Done\nWraps: (3) context deadline exceeded\nError types: (1) withstack.withStack (2) *errutil.withPrefix (3) context.deadlineExceededError"] 2024-09-16 15:39:24 [2024/09/16 10:09:24.730 +00:00] [ACCESS] Query [status: Failed] [code: 10001] [sdk: nodejs-2.4.8] [msg: proxy TaskCondition context Done: context deadline exceeded] [traceID: 7ba650f7017f2e8ff9e4b9a8ab2f6496] [timeCost: 14.995998549s] [database: default] [collection: local_009_bot_knowledge_base_embeddings] [partitions: []] [expr: pk > -9223372036854775807]

....

2024-09-16 15:39:51 [2024/09/16 10:09:51.364 +00:00] [WARN] [segments/segment.go:645] ["Retrieve failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [partitionID=452586433906215404] [segmentID=452586433907415483] [msgID=452586433906215486] [segmentType=Growing]

2024-09-16 15:39:51 [2024/09/16 10:09:51.366 +00:00] [WARN] [querynodev2/services.go:865] ["failed to query channel"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [msgID=452586433906215486] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [scope=Streaming] [error="Future was cancelled"] [errorVerbose="Future was cancelled\n(1) forced error mark\n | \"context deadline exceeded\"\n | context/context.deadlineExceededError::\nWraps: (2) Future was cancelled\nError types: (1) *markers.withMark (2) merr.milvusError"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.368 +00:00] [WARN] [delegator/delegator.go:662] ["failed to execute sub task"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [replicaID=452586434226290689] [taskType=Query] [nodeID=1] [error="worker(1) query failed: Future was cancelled"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.369 +00:00] [WARN] [delegator/delegator.go:682] ["Delegator execute subTask failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [replicaID=452586434226290689] [taskType=Query] [error="worker(1) query failed: Future was cancelled"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.369 +00:00] [WARN] [delegator/delegator.go:527] ["Delegator query failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [replicaID=452586434226290689] [error="worker(1) query failed: Future was cancelled"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.369 +00:00] [WARN] [querynodev2/handlers.go:221] ["failed to query on delegator"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [msgID=452586433906215486] [collectionID=452586433906215403] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [scope=All] [error="worker(1) query failed: Future was cancelled"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.370 +00:00] [WARN] [proxy/task_query.go:565] ["QueryNode query result error"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collection=452586433906215403] [partitionIDs="[]"] [nodeID=1] [channel=by-dev-rootcoord-dml_0_452586433906215403v0] [errorCode=UnexpectedError] [reason="worker(1) query failed: Future was cancelled"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.371 +00:00] [WARN] [proxy/lb_policy.go:182] ["search/query channel failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collectionID=452586433906215403] [collectionName=local_009_bot_knowledge_base_embeddings] [channelName=by-dev-rootcoord-dml_0_452586433906215403v0] [nodeID=1] [error="fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (2) fail to Query on QueryNode 1\nWraps: (3) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) *errutil.withPrefix (3) merr.milvusError"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.373 +00:00] [WARN] [retry/retry.go:46] ["retry func failed"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [retried=0] [error="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (2) failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (4) fail to Query on QueryNode 1\nWraps: (5) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) errutil.withPrefix (5) merr.milvusError"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.377 +00:00] [WARN] [proxy/task_query.go:470] ["fail to execute query"] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [collection=452586433906215403] [partitionIDs="[]"] [requestType=query] [error="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (2) failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (4) fail to Query on QueryNode 1\nWraps: (5) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) errutil.withPrefix (5) merr.milvusError"]

2024-09-16 15:39:51 [2024/09/16 10:09:51.379 +00:00] [WARN] [proxy/task_scheduler.go:477] ["Failed to execute task: "] [traceID=7ba650f7017f2e8ff9e4b9a8ab2f6496] [error="failed to query: failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled"] [errorVerbose="failed to query: failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0: fail to Query on QueryNode 1: worker(1) query failed: Future was cancelled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).Execute\n | \t/workspace/source/internal/proxy/task_query.go:471\n | github.com/milvus-io/milvus/internal/proxy.(taskScheduler).processTask\n | \t/workspace/source/internal/proxy/task_scheduler.go:474\n | github.com/milvus-io/milvus/internal/proxy.(taskScheduler).queryLoop.func1\n | \t/workspace/source/internal/proxy/task_scheduler.go:553\n | github.com/milvus-io/milvus/pkg/util/conc.(Pool[...]).Submit.func1\n | \t/workspace/source/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/v2@v2.7.2/worker.go:67\nWraps: (2) failed to query\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 1 for channel by-dev-rootcoord-dml_0_452586433906215403v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n | \t/workspace/source/internal/proxy/task_query.go:566\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n | \t/workspace/source/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/workspace/source/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n | \t/workspace/source/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2\n | \t/workspace/source/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1650\nWraps: (6) fail to Query on QueryNode 1\nWraps: (7) worker(1) query failed: Future was cancelled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) errutil.withPrefix (5) withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]

yanliang567 commented 6 days ago

@sabrahul it sounds like a timeout issue. Could you please refer this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

Also please attach the etcd backup for investigation? Check this: https://github.com/milvus-io/birdwatcher for details about how to backup etcd with birdwatcher

/assign @sabrahul /unassign