milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

rpc error: code = Canceled desc = context canceled #36111

Open fengchen8556203 opened 2 weeks ago

fengchen8556203 commented 2 weeks ago

Is there an existing issue for this?

Environment

- Milvus version: 2.3.11
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

(Screenshot attachment failed to upload: 2ad3d7f5195930d075de5d74d23c14c.png)

Expected Behavior

At the time, fetching data from the collection failed. After backing up the data with milvus-backup, dropping the old collection, and re-importing it, everything worked again.
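
As a side note, here is a minimal pymilvus sketch, assuming default connection settings and using the collection name that appears in the proxy log below, for checking that a re-imported collection is registered and loadable again (the backup, drop, and re-import themselves are done with the milvus-backup tool and are not shown here):

```python
from pymilvus import connections, utility, Collection

# Assumed connection parameters; adjust to your deployment.
connections.connect(alias="default", host="127.0.0.1", port="19530")

name = "chatlibrary_corpus"  # collection name taken from the proxy log below
if utility.has_collection(name):
    coll = Collection(name)
    coll.load()
    # If load() and num_entities succeed, the re-imported collection is
    # registered and loadable again.
    print("entities:", coll.num_entities)
else:
    print(f"collection {name} does not exist")
```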

Steps To Reproduce

No response

Milvus Log

[WARN] [proxy/lb_policy.go:182] ["search/query channel failed"] [traceID=f3d6df32843ed074da04aa997021ce75] [collectionID=450390808328228034] [collectionName=chatlibrary_corpus] [channelName=by-dev-rootcoord-dml_12_450390808328228034v2] [nodeID=2292] [error="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:555 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:569 github.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:88 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:213 github.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).Query\n/go/src/github.com/milvus-io/milvus/internal/proxy/task_query.go:504 github.com/milvus-io/milvus/internal/proxy.(queryTask).queryShard\n/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180 github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry.func1\n/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44 github.com/milvus-io/milvus/pkg/util/retry.Do\n/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154 github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).ExecuteWithRetry\n/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:213 github.com/milvus-io/milvus/internal/proxy.(LBPolicyImpl).Execute.func2: attempt #0: rpc error: code = Canceled desc = context canceled: context canceled"] [errorVerbose="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace: attempt #0: rpc error: code = Canceled desc = context canceled: c
[ERROR] [retry/retry.go:46] ["retry func failed"] ["retry time"=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:467\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).Call\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:553\ngithub.com/milvus-io/milvus/internal/util/grpcclient.(ClientBase[...]).ReCall\n\t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:569\ngithub.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:88\ngithub.com/milvus-io/milvus/internal/distributed/querynode/client.(Client).GetComponentStates\n\t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:102\ngithub.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetComponentStates.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:252\ngithub.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).send\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:271\ngithub.com/milvus-io/milvus/internal/querycoordv2/session.(QueryCluster).GetComponentStates\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:251\ngithub.com/milvus-io/milvus/internal/querycoordv2.(Server).checkNodeHealth.func1\n\t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/services.go:998\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/go/pkg/mod/golang.org/x/

Anything else?

No response

xiaofan-luan commented 1 week ago

@fengchen8556203 could you upload the logs for further investigation?

yanliang567 commented 1 week ago

/assign @fengchen8556203 /unassign

fengchen8556203 commented 1 week ago

Here are all the logs from my Milvus cluster, stored on Quark Netdisk: https://pan.quark.cn/s/438a9b79cd23#/list/share

yanliang567 commented 1 week ago

I cannot download the logs from there. Could you please share them via Google Drive or Baidu Netdisk?

fengchen8556203 commented 15 hours ago

I narrowed the logs down to a two-hour window. Around 14:20 I tried to fetch data from the collection chatlibrary_corpus_bak_new_lod and it failed; also around 14:20, querying law_search failed with: failed to search/query delegator 2289 for channel by-dev-rootcoord-dml_0_448394611967952126v2: fail to Search, QueryNode ID=2289, reason=Timestamp lag too large. Logs attached: milvus-log.tar.gz
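
For context, "Timestamp lag too large" means the delegator's serviceable timestamp has fallen far behind the request's guarantee timestamp, usually because time ticks are not being consumed from the message queue. While the root cause is investigated, a weaker consistency level avoids waiting on the delegator; a hedged pymilvus sketch, with connection settings and the primary-key field name as assumptions:

```python
from pymilvus import connections, Collection

connections.connect(host="127.0.0.1", port="19530")  # assumed address

coll = Collection("law_search")  # collection from the error message above
coll.load()

# With "Eventually" the request does not wait for the delegator to catch up
# on time ticks, which is what the "Timestamp lag too large" check guards.
rows = coll.query(
    expr="id >= 0",              # "id" is an assumed primary-key field
    limit=3,
    output_fields=["id"],
    consistency_level="Eventually",
)
print(rows)
```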

xiaofan-luan commented 14 hours ago

For some reason, the QueryNode cannot consume data from Kafka/Pulsar:

[2024/09/23 06:44:36.624 +00:00] [WARN] [pipeline/pipeline.go:51] ["some node(s) haven't received input"] [list="[nodeCtxTtChecker-DeleteNode-by-dev-rootcoord-dml_1_450390808353746312v1,nodeCtxTtChecker-FilterNode-by-dev-rootcoord-dml_1_450390808353746312v1,nodeCtxTtChecker-InsertNode-by-dev-rootcoord-dml_1_450390808353746312v1]"] ["duration "=2m0s]

It seems that in your environment Kafka/Pulsar could not be consumed from for a while. You may need to check the state of your Pulsar/Kafka cluster (especially its disk usage and CPU usage) to see whether there is any issue with your message queue; a rough consumption check is sketched below.
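
If it helps, here is that consumption check using the Pulsar Python client; the broker URL, tenant/namespace, and physical topic name are assumptions and need to be adjusted to your deployment:

```python
import pulsar  # pip install pulsar-client

# Broker URL, tenant/namespace, and physical topic name are assumptions here;
# adjust them to your Pulsar and Milvus configuration.
client = pulsar.Client("pulsar://pulsar-broker:6650")
consumer = client.subscribe(
    "persistent://public/default/by-dev-rootcoord-dml_1",
    subscription_name="milvus-debug-check",
    initial_position=pulsar.InitialPosition.Earliest,
)
try:
    # If this times out while the topic has data, the broker side is the
    # likely bottleneck (disk, CPU, or backlog policies).
    msg = consumer.receive(timeout_millis=5000)
    print("consumed one message:", msg.message_id())
except Exception as exc:
    print("could not consume from the topic:", exc)
finally:
    consumer.unsubscribe()  # drop the temporary debug subscription
    client.close()
```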

xiaofan-luan commented 14 hours ago

Also, we only see error logs in milvusdata-querynode-75848b9568-wwm2z.log, so you could try rebooting that node and see whether it helps.
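
A minimal sketch of that restart with the Kubernetes Python client (equivalent to deleting the pod with kubectl so its Deployment recreates it); the namespace is an assumption:

```python
from kubernetes import client, config  # pip install kubernetes

config.load_kube_config()  # run from a machine with access to the cluster
v1 = client.CoreV1Api()

# Deleting the pod lets its Deployment recreate it, which is the usual way to
# "reboot" a single Milvus component on Kubernetes. The namespace is assumed.
v1.delete_namespaced_pod(
    name="milvusdata-querynode-75848b9568-wwm2z",
    namespace="default",
)
```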