milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Error: 4 DEADLINE_EXCEEDED: Deadline exceeded #29560

Closed parth-patel2023 closed 9 months ago

parth-patel2023 commented 9 months ago

Is there an existing issue for this?

Environment

- Milvus version: v2.3.0
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.1
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 16GB RAM/256GB Storage
- GPU: 
- Others:

Current Behavior

The existing collection fails to load and throws an error.

Screenshot from 2023-12-28 12-21-11

Expected Behavior

All collections should be loaded.

Milvus Log

[2023/12/28 08:02:36.869 +00:00] [WARN] [grpcclient/client.go:365] ["call received grpc error"] [clientRole=querycoord] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2023/12/28 08:02:36.869 +00:00] [WARN] [grpcclient/client.go:365] ["call received grpc error"] [traceID=c414dc5b95a9a72829127d0e00de62cb] [clientRole=querynode-81] [error="rpc error: code = Canceled desc = context canceled"]
[2023/12/28 08:02:36.869 +00:00] [ERROR] [retry/retry.go:42] ["retry func failed"] [traceID=c414dc5b95a9a72829127d0e00de62cb] ["retry time"=0] [error="rpc error: code = Canceled desc = context canceled"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:42\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:405\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:483\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499\ngithub.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:90\ngithub.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:265\ngithub.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:229\ngithub.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).send\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:278\ngithub.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:228\ngithub.com/milvus-io/milvus/internal/querycoordv2.(Server).tryGetNodesMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/handlers.go:289"]
[2023/12/28 08:02:36.870 +00:00] [ERROR] [retry/retry.go:42] ["retry func failed"] ["retry time"=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:42\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:405\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:483\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:109\ngithub.com/milvus-io/milvus/internal/distributed/querycoord/client.(Client).GetMetrics\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:281\ngithub.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).syncMetrics.func1\n\t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:191\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75"]
[2023/12/28 08:02:36.869 +00:00] [WARN] [grpcclient/client.go:486] ["ClientBase Call grpc call get error"] [traceID=c414dc5b95a9a72829127d0e00de62cb] [role=querynode-81] [address=172.18.0.4:21123] [error="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:90 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:265 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:229 github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics.func1\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:278 github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).send\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:228 github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/handlers.go:289 github.com/milvus-io/milvus/internal/querycoordv2.(Server).tryGetNodesMetrics.func1\n/usr/local/go/src/runtime/asm_amd64.s:1598 runtime.goexit: attempt #0: rpc error: code = Canceled desc = context canceled: context done during sleep after run#0: context canceled"] [errorVerbose="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace: attempt #0: rpc error: code = Canceled desc = context canceled: context done during sleep after run#0: context canceled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485\n | [...repeated from below...]\nWraps: (2) stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n | /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n | /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n | /go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:90 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n | /go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:265 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).GetMetrics\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:229 github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics.func1\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:278 github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).send\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:228 github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/handlers.go:289 github.com/milvus-io/milvus/internal/querycoordv2.(Server).tryGetNodesMetrics.func1\n | /usr/local/go/src/runtime/asm_amd64.s:1598 runtime.goexit\nWraps: (3) attempt #0: rpc error: code = Canceled desc = context canceled\nWraps: (4) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:55\n | github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).call\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:405\n | github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:483\n | github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499\n | github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n | \t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:90\n | github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).GetMetrics\n | \t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:265\n | github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:229\n | github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).send\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:278\n | github.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetMetrics\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:228\n | github.com/milvus-io/milvus/internal/querycoordv2.(Server).tryGetNodesMetrics.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/handlers.go:289\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (5) context done during sleep after run#0\nWraps: (6) context canceled\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) merr.multiErrors (4) withstack.withStack (5) errutil.withPrefix (6) errors.errorString"]
[2023/12/28 08:02:36.870 +00:00] [WARN] [querycoordv2/handlers.go:291] ["failed to get metric from QueryNode"] [nodeID=81]
[2023/12/28 08:02:36.870 +00:00] [WARN] [grpcclient/client.go:486] ["ClientBase Call grpc call get error"] [role=querycoord] [address=172.18.0.4:19531] [error="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:109 github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\n/go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:281 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(Client).GetMetrics\n/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:191 github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).syncMetrics.func1\n/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 golang.org/x/sync/errgroup.(Group).Go.func1\n/usr/local/go/src/runtime/asm_amd64.s:1598 runtime.goexit: attempt #0: rpc error: code = DeadlineExceeded desc = context deadline exceeded: context done during sleep after run#0: context deadline exceeded"] [errorVerbose="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace: attempt #0: rpc error: code = DeadlineExceeded desc = context deadline exceeded: context done during sleep after run#0: context deadline exceeded\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485\n | [...repeated from below...]\nWraps: (2) stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n | /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:485 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n | /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:499 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n | /go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:109 github.com/milvus-io/milvus/internal/distributed/querycoord/client.wrapGrpcCall[...]\n | /go/src/github.com/milvus-io/milvus/internal/distributed/querycoord/client/client.go:281 github.com/milvus-io/milvus/internal/distributed/querycoord/client.(Client).GetMetrics\n | /go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:191 github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).syncMetrics.func1\n | /go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 golang.org/x/sync/errgroup^
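
Each entry in a paste like the one above begins with a bracketed timestamp, followed by the level, the source location, a quoted message, and key=value fields. As a reading aid, here is a minimal Python sketch that splits a run-together paste into separate entries and tallies the gRPC status codes by level. The regular expressions and the names `split_entries` and `count_grpc_codes` are assumptions made for this illustration; they are not part of Milvus, and the sample entries are shortened copies of the ones in this report.

```python
import re
from collections import Counter

# Assumed layout: every entry starts with "[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ] [LEVEL] ".
ENTRY_START = re.compile(
    r"\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}\] \[[A-Z]+\] "
)
HEADER = re.compile(r"\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\]")
GRPC_CODE = re.compile(r"rpc error: code = (\w+)")


def split_entries(text):
    """Split a pasted log into entries, one per timestamp header.

    Any text before the first header is dropped.
    """
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def count_grpc_codes(entries):
    """Count entries by (level, first gRPC status code found in the entry)."""
    counts = Counter()
    for entry in entries:
        header = HEADER.match(entry)
        code = GRPC_CODE.search(entry)
        if header and code:
            counts[(header["level"], code.group(1))] += 1
    return counts


# Shortened sample: three entries pasted on a single line.
sample = (
    '[2023/12/28 08:02:36.869 +00:00] [WARN] [grpcclient/client.go:365] '
    '["call received grpc error"] [clientRole=querycoord] '
    '[error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"] '
    '[2023/12/28 08:02:36.869 +00:00] [WARN] [grpcclient/client.go:365] '
    '["call received grpc error"] [clientRole=querynode-81] '
    '[error="rpc error: code = Canceled desc = context canceled"] '
    '[2023/12/28 08:02:36.870 +00:00] [WARN] [querycoordv2/handlers.go:291] '
    '["failed to get metric from QueryNode"] [nodeID=81]'
)

entries = split_entries(sample)
for (level, code), n in sorted(count_grpc_codes(entries).items()):
    print(level, code, n)
```

Only the first status code in each entry is counted, so an entry that wraps several errors contributes once.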

Anything else?

No response

yhmo commented 9 months ago

The log is too short. Search for "[WARN]" and "[ERROR]" in the log, then paste the matching lines here.
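
One way to collect those lines is a small filter script. The following is a sketch only: the function names `filter_levels` and `filter_log_file` are made up for this example, and it assumes each log entry sits on its own line with the level written as `[WARN]` or `[ERROR]`.

```python
import re
from pathlib import Path

# Matches the bracketed level markers named in the request above.
LEVEL = re.compile(r"\[(WARN|ERROR)\]")


def filter_levels(lines):
    """Return the lines that contain a [WARN] or [ERROR] marker."""
    return [line.rstrip("\n") for line in lines if LEVEL.search(line)]


def filter_log_file(path):
    """Read a log file and return only its WARN and ERROR lines."""
    # errors="replace" keeps the scan going if the file has stray bytes.
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return filter_levels(handle)


# In-memory demonstration with shortened, illustrative lines.
demo = [
    '[2023/12/28 08:02:36.100 +00:00] [INFO] [example.go:1] ["illustrative info line"]\n',
    '[2023/12/28 08:02:36.870 +00:00] [WARN] [querycoordv2/handlers.go:291] ["failed to get metric from QueryNode"] [nodeID=81]\n',
]
for line in filter_levels(demo):
    print(line)
```

For a real log, something like `filter_log_file("milvus.log")` would apply; the file name here is a placeholder, since the actual log location depends on the deployment.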

parth-patel2023 commented 9 months ago

Updated

yanliang567 commented 9 months ago

@parth-patel2023 Could you please attach an etcd backup for investigation? See https://github.com/milvus-io/birdwatcher for details on how to back up etcd with birdwatcher.
/assign @parth-patel2023
/unassign

parth-patel2023 commented 9 months ago

Okay, I will update you.

xiaofan-luan commented 9 months ago

This looks like the load issue that was fixed in the latest release, 2.3.4. Could you upgrade and check whether that resolves it?
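
To check whether a deployment predates that release, a version comparison along these lines may help. This is a minimal sketch: `parse_version` and `needs_upgrade` are hypothetical helpers written for this example, and they assume a plain `major.minor.patch` string with an optional leading `v`, as in the "v2.3.0" reported above.

```python
import re

FIXED_IN = (2, 3, 4)  # the release suggested in this thread


def parse_version(text):
    """Turn 'v2.3.0' or '2.3.0' into a tuple of integers, e.g. (2, 3, 0)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(current, fixed_in=FIXED_IN):
    """Return True when `current` is older than the release carrying the fix."""
    # Tuples compare element by element, so (2, 3, 10) sorts after (2, 3, 4).
    return parse_version(current) < fixed_in


print(needs_upgrade("v2.3.0"))
print(needs_upgrade("2.3.4"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "2.3.10" sorting before "2.3.4". With pymilvus, the server's version string should be obtainable through `utility.get_server_version()` once a connection is open.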

parth-patel2023 commented 9 months ago

Yes, @xiaofan-luan, it was the load issue. It's working fine now.