Closed tanvlt closed 3 weeks ago
It seems you have too many channels. Some of the load tasks timed out.
This is probably caused by this bug: https://github.com/milvus-io/milvus/issues/35008 Could you upgrade to 2.4.9 and retry?
Hi @xiaofan-luan, I have upgraded to 2.4.9 and tried to start it again, but that did not help; it still cannot start up. There are a lot of logs like the ones below:
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451222158446365459] [replica=451337904531177490] [channel=by-dev-rootcoord-dml_8_451222158446365459v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451021127533543049] [replica=451222161215193148] [channel=by-dev-rootcoord-dml_0_451021127533543049v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451639263225372070] [replica=451639267229434001] [channel=by-dev-rootcoord-dml_12_451639263225372070v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451708688926640217] [replica=451708692028194863] [channel=by-dev-rootcoord-dml_4_451708688926640217v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451021127535038140] [replica=451310628132618244] [channel=by-dev-rootcoord-dml_12_451021127535038140v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451708688918054141] [replica=451708692028194847] [channel=by-dev-rootcoord-dml_0_451708688918054141v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451639263220524381] [replica=451639267229433866] [channel=by-dev-rootcoord-dml_7_451639263220524381v0] [from=-1] [to=54]
[2024/08/23 03:57:16.150 +00:00] [INFO] [balance/utils.go:115] ["create channel task"] [collection=451639263225110322] [replica=451639267229433976] [channel=by-dev-rootcoord-dml_13_451639263225110322v0] [from=-1] [to=54]
[2024/08/23 03:57:16.186 +00:00] [WARN] [rootcoord/quota_center.go:315] ["quotaCenter collect metrics failed"] [error="collection not found[collection=451437057771314021]"]
[2024/08/23 03:57:16.229 +00:00] [INFO] [task/scheduler.go:643] ["processed tasks"] [nodeID=54] [toProcessNum=289] [committedNum=0] [toRemoveNum=0]
[2024/08/23 03:57:16.229 +00:00] [INFO] [task/scheduler.go:649] ["process tasks related to node done"] [nodeID=54] [processingTaskNum=289] [waitingTaskNum=0] [segmentTaskNum=0] [channelTaskNum=289]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
[2024/08/23 03:57:16.262 +00:00] [INFO] [msgstream/mq_msgstream.go:939] ["skip msg"] [source=27] [type=TimeTick] [size=17] [position=<nil>]
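With a dump this long, it can help to tally which messages dominate before reading line by line. The following is a minimal sketch, assuming every line follows the bracketed layout seen above (timestamp, level, source location, quoted message, then `key=value` fields). `parse_line` and `summarize` are hypothetical helper names for illustration, not part of Milvus.

```python
import re
from collections import Counter

# One "[...]" group; a quoted value may itself contain brackets,
# e.g. [error="collection not found[collection=...]"].
FIELD = re.compile(r'\[((?:"[^"]*"|[^\[\]])*)\]')


def parse_line(line):
    """Split one bracketed log line into a dict, or return None if it does not fit."""
    parts = FIELD.findall(line)
    if len(parts) < 4:
        return None
    timestamp, level, source, message = parts[:4]
    fields = {}
    for part in parts[4:]:
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value.strip('"')
    return {
        "time": timestamp,
        "level": level,
        "source": source,
        "message": message.strip('"'),
        "fields": fields,
    }


def summarize(lines):
    """Count how often each (level, message) pair occurs."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["level"], record["message"])] += 1
    return counts
```

Calling `summarize` on the lines of a log file and printing `counts.most_common()` gives a quick picture of whether the node is mostly creating channel tasks, skipping messages, or reporting warnings.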
I think this is just because there are many channels. Waiting for all the channels to load might take some time.
Check the logs for "Stop timer for ToWatch operation succeeded" to see whether any new channels have succeeded.
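One way to follow that suggestion is to count the quoted message per minute and see whether the number is still growing. This is a rough sketch, assuming the lines containing that message start with the same `[YYYY/MM/DD HH:MM:SS.mmm +00:00]` timestamp as the logs above. `succeeded_per_minute` and `count_from_file` are hypothetical names, and the log path is whatever file you exported.

```python
from collections import Counter

# Message text taken from the suggestion above.
MARKER = "Stop timer for ToWatch operation succeeded"


def succeeded_per_minute(lines):
    """Count marker lines, bucketed by the minute in the leading timestamp."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        # "[2024/08/23 03:57:16.150 +00:00] ..." -> "2024/08/23 03:57"
        minute = line[1:17] if line.startswith("[") else "unknown"
        counts[minute] += 1
    return counts


def count_from_file(path):
    """Read a log file (path supplied by the caller) and return per-minute counts."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return succeeded_per_minute(handle)
```

If the counts for recent minutes are all zero while the scheduler still reports pending channel tasks, the channels are probably stuck rather than just slow.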
/assign @congqixia
/unassign
Hi @xiaofan-luan, unfortunately I have waited for a long time, but it still has not started completely.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
Is there an existing issue for this?
Environment
Current Behavior
After restarting, suddenly the collections cannot be loaded.
Expected Behavior
/var/lib/milvus/rdb_data/LOCK or /var/lib/milvus/rdb_data_meta_kv/LOCK (they are empty) but did not help
Steps To Reproduce
Milvus Log
milvus-s3-pilot-etcd-0.log pod_log.zip
Anything else?
No response