milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: #36630

Open · linqingxu opened 2 months ago

linqingxu commented 2 months ago

Is there an existing issue for this?

Environment

- Milvus version: v2.3.3
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS): Ubuntu
- CPU/Memory:
- GPU:
- Others:

Current Behavior

The collection in Milvus stays stuck at 0% loading progress, and data can neither be inserted nor deleted.
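
For reference, the stuck progress can be observed from the client side. A minimal pymilvus sketch (the host, port, and collection name `my_collection` are placeholders, not taken from this report):

```python
# Observation sketch, assuming pymilvus 2.3.x and a standalone Milvus
# on localhost; "my_collection" is a placeholder name.
from pymilvus import connections, utility

connections.connect(host="localhost", port="19530")

# loading_progress returns a dict like {"loading_progress": "0%"};
# in this bug the value reportedly never moves past 0%.
print(utility.loading_progress("my_collection"))
```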

Expected Behavior

The data in the collection loads normally.

Steps To Reproduce

This occurred after the Milvus data disk became full.

Milvus Log

["failed to promote task"] [taskID=1727684681753] [error="failed to get shard delegator: channel=by-dev-rootcoord-dml_15_452467457634638537v0: channel not found"] [errorVerbose="failed to get shard delegator: channel=by-dev-rootcoord-dml_15_452467457634638537v0: channel not found\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrChannelNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:490\n | [...repeated from below...]\nWraps: (2) failed to get shard delegator\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.wrapWithField\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:760\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrChannelNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:488\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).checkSegmentTaskStale\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:802\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).checkStale\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:741\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).check\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:664\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).promote\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:427\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).tryPromoteAll.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:390\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskQueue).Range\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:121\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).tryPromoteAll\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:389\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).schedule\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:529\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(taskScheduler).Dispatch\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/scheduler.go:445\n | github.com/milvus-io/milvus/internal/querycoordv2/dist.(distHandler).handleDistResp\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/dist/dist_handler.go:110\n | github.com/milvus-io/milvus/internal/querycoordv2/dist.(distHandler).start\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/dist/dist_handler.go:86\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (4) channel=by-dev-rootcoord-dml_15_452467457634638537v0\nWraps: (5) channel not found\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) withstack.withStack (4) *errutil.withPrefix (5) merr.milvusError"]

Anything else?

None

xiaofan-luan commented 2 months ago

@linqingxu we need more logs to investigate this issue. We also recommend upgrading to 2.3.2x for better stability.

yanliang567 commented 1 month ago

@linqingxu Please refer to this doc to export the full Milvus logs for investigation. For Milvus installed with docker-compose, you can use `docker-compose logs > milvus.log` to export the logs.

/assign @linqingxu
/unassign
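
If scripting the export, a minimal Python equivalent of that shell command (a convenience sketch only, assuming `docker-compose` is on PATH and the working directory is the compose deployment):

```python
# Capture the full docker-compose log stream into milvus.log,
# mirroring `docker-compose logs > milvus.log`.
import subprocess

with open("milvus.log", "wb") as f:
    subprocess.run(["docker-compose", "logs"], stdout=f, check=True)
```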

stale[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.