milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.27k stars 2.81k forks source link

[Bug]: [Bug]:failed to search/query delegator 617 for channel by-dev-rootcoord-dml_4_448959102009636230v0: fail to Search, QueryNode ID=617, reason=Timestamp lag too large #34826

Closed MonikaVijayakumar25 closed 4 days ago

MonikaVijayakumar25 commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version:2.3.x
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): k8s   
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Initially, the collection loaded successfully, but the search or query to that collection wasn't functioning. Subsequently, we restarted both the standalone service and Minio. Now, the collection fails to load altogether after the restart. However, newly created collections are operating without any issues.

How to fix this issue without any data loss.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

With enitites (while load the collection )

2024-07-19 10:44:32 [2024/07/19 05:14:32.867 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.867 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.777 +00:00] [INFO] [task/scheduler.go:269] ["task added"] [task="[id=1721311699109] [type=Grow] [source=segment_checker] [reason=lacks of segment] [collectionID=448959102008730178] [replicaID=451231532116082717] [priority=Normal] [actionsCount=1] [actions={[type=Grow][node=617][streaming=false]}] [segmentID=448959102008730202]"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.776 +00:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=448959102008730178] [segmentID=448959102008730202] [replica=451231532116082717] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [from=-1] [to=617] 2024-07-19 10:44:32 [2024/07/19 05:14:32.776 +00:00] [INFO] [observers/collection_observer.go:214] ["partition load progress"] [collectionID=448959102008730178] [partitionID=448959102008730179] [subChannelCount=1] [loadSegmentCount=0] 2024-07-19 10:44:32 [2024/07/19 05:14:32.765 +00:00] [INFO] [task/scheduler.go:572] ["process tasks related to node done"] [nodeID=617] [processingTaskNum=0] [waitingTaskNum=0] [segmentTaskNum=0] [channelTaskNum=0] 2024-07-19 10:44:32 [2024/07/19 05:14:32.765 +00:00] [INFO] [task/scheduler.go:566] ["processed tasks"] [nodeID=617] [toProcessNum=0] [committedNum=0] [toRemoveNum=1] 2024-07-19 10:44:32 [2024/07/19 05:14:32.765 +00:00] [INFO] [task/scheduler.go:768] ["task removed"] [taskID=1721311699108] [collectionID=448959102008730178] [replicaID=451231532116082717] [status=failed] [segmentID=448959102008730202] 2024-07-19 10:44:32 [2024/07/19 05:14:32.765 +00:00] [WARN] [meta/failed_load_cache.go:97] ["FailedLoadCache put failed record"] [collectionID=448959102008730178] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.765 +00:00] [WARN] [task/scheduler.go:727] ["task scheduler recordSegmentTaskError"] [taskID=1721311699108] [collectionID=448959102008730178] [replicaID=451231532116082717] [segmentID=448959102008730202] [status=failed] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.668 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.668 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.468 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.468 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.272 +00:00] [INFO] [task/executor.go:127] ["execute action done, remove it"] [taskID=1721311699108] [step=0] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.272 +00:00] [WARN] [task/executor.go:246] ["failed to load segment"] [taskID=1721311699108] [collectionID=448959102008730178] [replicaID=451231532116082717] [segmentID=448959102008730202] [node=617] [source=segment_checker] [shardLeader=617] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.271 +00:00] [WARN] [querynodev2/services.go:469] ["delegator failed to load segments"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [partitionID=448959102008730179] [shard=by-dev-rootcoord-dml_3_448959102008730178v0] [segmentID=448959102008730202] [currentNodeID=617] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.271 +00:00] [WARN] [delegator/delegator_data.go:363] ["failed to load bloom filter set for segment"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [replicaID=451231532116082717] [workID=617] [segments="[448959102008730202]"] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.271 +00:00] [WARN] [segments/segment_loader.go:530] ["failed to load remote segment"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.271 +00:00] [ERROR] [funcutil/parallel.go:88] [loadRemoteFunc] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] [idx=0] [stack="github.com/milvus-io/milvus/pkg/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/pkg/util/funcutil/parallel.go:88"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.271 +00:00] [WARN] [segments/segment_loader.go:515] ["load remote segment bloom filter failed"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] [partitionID=448959102008730179] [segmentID=448959102008730202] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.271 +00:00] [ERROR] [retry/retry.go:46] ["retry func failed"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] ["retry time"=0] [error="NoSuchKey(key=file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1)"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/internal/storage.(RemoteChunkManager).Read\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:166\ngithub.com/milvus-io/milvus/internal/storage.(RemoteChunkManager).MultiRead\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:222\ngithub.com/milvus-io/milvus/internal/querynodev2/segments.(segmentLoader).loadBloomFilter\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment_loader.go:758\ngithub.com/milvus-io/milvus/internal/querynodev2/segments.(segmentLoader).LoadBloomFilterSet.func2\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment_loader.go:513\ngithub.com/milvus-io/milvus/pkg/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/pkg/util/funcutil/parallel.go:86"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.270 +00:00] [INFO] [segments/segment_loader.go:511] ["loading bloom filter for remote..."] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.270 +00:00] [INFO] [segments/segment_loader.go:501] ["start loading remote..."] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] [segmentNum=1] 2024-07-19 10:44:32 [2024/07/19 05:14:32.270 +00:00] [INFO] [querynodev2/services.go:435] ["received load segments request"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [partitionID=448959102008730179] [shard=by-dev-rootcoord-dml_3_448959102008730178v0] [segmentID=448959102008730202] [currentNodeID=617] [version=1721366072269446991] [needTransfer=true] [loadScope=Full] 2024-07-19 10:44:32 [2024/07/19 05:14:32.269 +00:00] [INFO] [task/executor.go:238] ["load segments..."] [taskID=1721311699108] [collectionID=448959102008730178] [replicaID=451231532116082717] [segmentID=448959102008730202] [node=617] [source=segment_checker] [shardLeader=617] 2024-07-19 10:44:32 [2024/07/19 05:14:32.269 +00:00] [INFO] [datacoord/index_service.go:820] ["List index success"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] 2024-07-19 10:44:32 [2024/07/19 05:14:32.268 +00:00] [WARN] [utils/types.go:85] ["delta position is quite stale"] [collectionID=448959102008730178] [segmentID=448959102008730202] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [posTs=451026844942073857] [posTime=2024/07/09 13:14:32.124 +00:00] [tsLag=232h0m0.144863963s] 2024-07-19 10:44:32 [2024/07/19 05:14:32.268 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.267 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.266 +00:00] [INFO] [task/executor.go:112] ["execute the action of task"] [taskID=1721311699108] [collectionID=448959102008730178] [replicaID=451231532116082717] [step=0] [source=segment_checker] 2024-07-19 10:44:32 [2024/07/19 05:14:32.266 +00:00] [INFO] [task/scheduler.go:572] ["process tasks related to node done"] [nodeID=617] [processingTaskNum=1] [waitingTaskNum=0] [segmentTaskNum=1] [channelTaskNum=0] 2024-07-19 10:44:32 [2024/07/19 05:14:32.266 +00:00] [INFO] [task/scheduler.go:566] ["processed tasks"] [nodeID=617] [toProcessNum=1] [committedNum=1] [toRemoveNum=0] 2024-07-19 10:44:32 [2024/07/19 05:14:32.067 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:32 [2024/07/19 05:14:32.067 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.868 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.868 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/services.go:799] ["datacoord append channelInfo in GetRecoveryInfo"] [traceID=8788b70cc1e6f66b62b3f99e363a68b9] [collectionID=451231530756381339] [partitionIDs="[]"] [channel=by-dev-rootcoord-dml_1_451231530756381339v0] ["# of unflushed segments"=0] ["# of flushed segments"=0] ["# of dropped segments"=0] ["# of indexed segments"=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/handler.go:315] ["channel seek position set from collection start position"] [channel=by-dev-rootcoord-dml_1_451231530756381339v0] [posTs=451233479385153539] [posTime=2024/07/18 16:11:59.969 +00:00] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/handler.go:281] ["segment position not found, setting channel seek position to channel start position"] [channel=by-dev-rootcoord-dml_1_451231530756381339v0] [posTs=451233479385153539] [posTime=2024/07/18 16:11:59.969 +00:00] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/handler.go:116] [GetQueryVChanPositions] [collectionID=451231530756381339] [channel=by-dev-rootcoord-dml_1_451231530756381339v0] [numOfSegments=0] ["indexed segment"=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/services.go:784] ["get recovery info request received"] [traceID=8788b70cc1e6f66b62b3f99e363a68b9] [collectionID=451231530756381339] [partitionIDs="[]"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/services.go:799] ["datacoord append channelInfo in GetRecoveryInfo"] [traceID=9846897946523daf32b6fb6b61c18b66] [collectionID=448959102009636230] [partitionIDs="[]"] [channel=by-dev-rootcoord-dml_4_448959102009636230v0] ["# of unflushed segments"=0] ["# of flushed segments"=0] ["# of dropped segments"=0] ["# of indexed segments"=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_4_448959102009636230v0] [posTs=451026849975238657] [posTime=2024/07/09 13:14:51.324 +00:00] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/handler.go:116] [GetQueryVChanPositions] [collectionID=448959102009636230] [channel=by-dev-rootcoord-dml_4_448959102009636230v0] [numOfSegments=0] ["indexed segment"=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [datacoord/services.go:784] ["get recovery info request received"] [traceID=9846897946523daf32b6fb6b61c18b66] [collectionID=448959102009636230] [partitionIDs="[]"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [observers/target_observer.go:481] ["observer trigger update current target"] [collectionID=451231530756381339] 2024-07-19 10:44:31 [2024/07/19 05:14:31.780 +00:00] [INFO] [observers/target_observer.go:481] ["observer trigger update current target"] [collectionID=448959102009636230] 2024-07-19 10:44:31 [2024/07/19 05:14:31.779 +00:00] [INFO] [delegator/distribution.go:294] ["Update readable segment version"] [oldVersion=1721366051778727286] [newVersion=1721366061778824400] [growingSegmentNum=0] [sealedSegmentNum=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.779 +00:00] [INFO] [querynodev2/services.go:1385] ["sync action"] [traceID=eedec809e34b231c7354ba6c6e9f5e07] [collectionID=448959102009636230] [channel=by-dev-rootcoord-dml_4_448959102009636230v0] [currentNodeID=617] [Action=UpdateVersion] [TargetVersion=1721366061778824400] 2024-07-19 10:44:31 [2024/07/19 05:14:31.779 +00:00] [INFO] [delegator/distribution.go:294] ["Update readable segment version"] [oldVersion=1721366051778424267] [newVersion=1721366061778613853] [growingSegmentNum=0] [sealedSegmentNum=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.779 +00:00] [INFO] [querynodev2/services.go:1385] ["sync action"] [traceID=12d8bb26af1e6f9ce641148f2702f18e] [collectionID=451231530756381339] [channel=by-dev-rootcoord-dml_1_451231530756381339v0] [currentNodeID=617] [Action=UpdateVersion] [TargetVersion=1721366061778613853] 2024-07-19 10:44:31 [2024/07/19 05:14:31.779 +00:00] [INFO] [datacoord/index_service.go:820] ["List index success"] [traceID=c788afa121dc70c0693f7973a3f213ed] [collectionID=451231530756381339] 2024-07-19 10:44:31 [2024/07/19 05:14:31.779 +00:00] [INFO] [datacoord/index_service.go:820] ["List index success"] [traceID=9cd6a02b08d03d9297f9e5c7cbe1c6d7] [collectionID=448959102009636230] 2024-07-19 10:44:31 [2024/07/19 05:14:31.777 +00:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=448959102008730178] [segmentID=448959102008730202] [replica=451231532116082717] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [from=-1] [to=617] 2024-07-19 10:44:31 [2024/07/19 05:14:31.777 +00:00] [INFO] [task/scheduler.go:269] ["task added"] [task="[id=1721311699108] [type=Grow] [source=segment_checker] [reason=lacks of segment] [collectionID=448959102008730178] [replicaID=451231532116082717] [priority=Normal] [actionsCount=1] [actions={[type=Grow][node=617][streaming=false]}] [segmentID=448959102008730202]"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.777 +00:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=448959102008730178] [segmentID=448959102008730202] [replica=451231532116082717] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [from=-1] [to=617] 2024-07-19 10:44:31 [2024/07/19 05:14:31.777 +00:00] [INFO] [observers/collection_observer.go:214] ["partition load progress"] [collectionID=448959102008730178] [partitionID=448959102008730179] [subChannelCount=1] [loadSegmentCount=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.766 +00:00] [INFO] [task/scheduler.go:572] ["process tasks related to node done"] [nodeID=617] [processingTaskNum=0] [waitingTaskNum=0] [segmentTaskNum=0] [channelTaskNum=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.766 +00:00] [INFO] [task/scheduler.go:566] ["processed tasks"] [nodeID=617] [toProcessNum=0] [committedNum=0] [toRemoveNum=1] 2024-07-19 10:44:31 [2024/07/19 05:14:31.766 +00:00] [INFO] [task/scheduler.go:768] ["task removed"] [taskID=1721311699107] [collectionID=448959102008730178] [replicaID=451231532116082717] [status=failed] [segmentID=448959102008730202] 2024-07-19 10:44:31 [2024/07/19 05:14:31.766 +00:00] [WARN] [meta/failed_load_cache.go:97] ["FailedLoadCache put failed record"] [collectionID=448959102008730178] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.766 +00:00] [WARN] [task/scheduler.go:727] ["task scheduler recordSegmentTaskError"] [taskID=1721311699107] [collectionID=448959102008730178] [replicaID=451231532116082717] [segmentID=448959102008730202] [status=failed] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.668 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.668 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.468 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.468 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [INFO] [task/executor.go:127] ["execute action done, remove it"] [taskID=1721311699107] [step=0] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [WARN] [task/executor.go:246] ["failed to load segment"] [taskID=1721311699107] [collectionID=448959102008730178] [replicaID=451231532116082717] [segmentID=448959102008730202] [node=617] [source=segment_checker] [shardLeader=617] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [WARN] [querynodev2/services.go:469] ["delegator failed to load segments"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [partitionID=448959102008730179] [shard=by-dev-rootcoord-dml_3_448959102008730178v0] [segmentID=448959102008730202] [currentNodeID=617] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [WARN] [delegator/delegator_data.go:363] ["failed to load bloom filter set for segment"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [replicaID=451231532116082717] [workID=617] [segments="[448959102008730202]"] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [WARN] [segments/segment_loader.go:530] ["failed to load remote segment"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [ERROR] [funcutil/parallel.go:88] [loadRemoteFunc] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] [idx=0] [stack="github.com/milvus-io/milvus/pkg/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/pkg/util/funcutil/parallel.go:88"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [WARN] [segments/segment_loader.go:515] ["load remote segment bloom filter failed"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] [partitionID=448959102008730179] [segmentID=448959102008730202] [error="failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1: attempt #0: NoSuchKey"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.271 +00:00] [ERROR] [retry/retry.go:46] ["retry func failed"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] ["retry time"=0] [error="NoSuchKey(key=file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1)"] [stack="github.com/milvus-io/milvus/pkg/util/retry.Do\n\t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:46\ngithub.com/milvus-io/milvus/internal/storage.(RemoteChunkManager).Read\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:166\ngithub.com/milvus-io/milvus/internal/storage.(RemoteChunkManager).MultiRead\n\t/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:222\ngithub.com/milvus-io/milvus/internal/querynodev2/segments.(segmentLoader).loadBloomFilter\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment_loader.go:758\ngithub.com/milvus-io/milvus/internal/querynodev2/segments.(segmentLoader).LoadBloomFilterSet.func2\n\t/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment_loader.go:513\ngithub.com/milvus-io/milvus/pkg/util/funcutil.ProcessFuncParallel.func3\n\t/go/src/github.com/milvus-io/milvus/pkg/util/funcutil/parallel.go:86"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.269 +00:00] [INFO] [segments/segment_loader.go:511] ["loading bloom filter for remote..."] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.269 +00:00] [INFO] [segments/segment_loader.go:501] ["start loading remote..."] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [segmentIDs="[448959102008730202]"] [segmentNum=1] 2024-07-19 10:44:31 [2024/07/19 05:14:31.269 +00:00] [INFO] [querynodev2/services.go:435] ["received load segments request"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] [partitionID=448959102008730179] [shard=by-dev-rootcoord-dml_3_448959102008730178v0] [segmentID=448959102008730202] [currentNodeID=617] [version=1721366071269020876] [needTransfer=true] [loadScope=Full] 2024-07-19 10:44:31 [2024/07/19 05:14:31.269 +00:00] [INFO] [task/executor.go:238] ["load segments..."] [taskID=1721311699107] [collectionID=448959102008730178] [replicaID=451231532116082717] [segmentID=448959102008730202] [node=617] [source=segment_checker] [shardLeader=617] 2024-07-19 10:44:31 [2024/07/19 05:14:31.268 +00:00] [INFO] [datacoord/index_service.go:820] ["List index success"] [traceID=d59ae8dea84495d20ee2f40df530f1d2] [collectionID=448959102008730178] 2024-07-19 10:44:31 [2024/07/19 05:14:31.268 +00:00] [WARN] [utils/types.go:85] ["delta position is quite stale"] [collectionID=448959102008730178] [segmentID=448959102008730202] [channel=by-dev-rootcoord-dml_3_448959102008730178v0] [posTs=451026844942073857] [posTime=2024/07/09 13:14:32.124 +00:00] [tsLag=231h59m59.144378683s] 2024-07-19 10:44:31 [2024/07/19 05:14:31.267 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.267 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.266 +00:00] [INFO] [task/scheduler.go:572] ["process tasks related to node done"] [nodeID=617] [processingTaskNum=1] [waitingTaskNum=0] [segmentTaskNum=1] [channelTaskNum=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.266 +00:00] [INFO] [task/executor.go:112] ["execute the action of task"] [taskID=1721311699107] [collectionID=448959102008730178] [replicaID=451231532116082717] [step=0] [source=segment_checker] 2024-07-19 10:44:31 [2024/07/19 05:14:31.266 +00:00] [INFO] [task/scheduler.go:566] ["processed tasks"] [nodeID=617] [toProcessNum=1] [committedNum=1] [toRemoveNum=0] 2024-07-19 10:44:31 [2024/07/19 05:14:31.068 +00:00] [WARN] [proxy/proxy.go:368] [sendChannelsTimeTickLoop.UpdateChannelTimeTick] [ErrorCode=UnexpectedError] [Reason="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:31 [2024/07/19 05:14:31.067 +00:00] [WARN] [rootcoord/root_coord.go:1598] ["failed to updateTimeTick"] [role=rootcoord] [error="skip ChannelTimeTickMsg from un-recognized session 617"] 2024-07-19 10:44:30 [2024/07/19 05:14:30.980 +00:00] [WARN] [rootcoord/proxy_client_manager.go:302] ["proxy client is empty, SetRates will not send to any client"]

Anything else?

No response

yanliang567 commented 1 month ago

@MonikaVijayakumar25 I guess the milvus has some downtime before the issue occurs, so Milvus fails to search for the tt lag is too huge. Please try to upgrade to latest milvus 2.3.19 or 2.4.6, which has improve the efficiency to catch up the tt lag. /assign @MonikaVijayakumar25 /unassign

xiaofan-luan commented 1 month ago

failed to read file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/1:

@yanliang567 looks like there are some file missing file/stats_log/448959102008730178/448959102008730179/448959102008730202/100/

from this collection. We don't know what happened but this is only stats log.

to fix that:

  1. use birdwatcher and find what log it actually mssing
  2. there should be a command to fix the log missing by remove the log (stats logs are fine to remove, check if there are insert and delta log missing)
  3. compact the collection and your old collection should be ok to work with.
stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.