milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: log warning "collection not found", but I don't have this collection. #27841

Closed: wcollin closed this issue 10 months ago

wcollin commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): v2.3.0
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 8C/16G
- GPU: None
- Others: None

Current Behavior

[2023/10/21 08:15:26.492 +00:00] [WARN] [rootcoord/quota_center.go:742] ["failed to get collection rate limit config"] [collectionID=444885560750714875] [error="collection=444885560750714875: collection not found"] [errorVerbose="collection=444885560750714875: collection not found\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.wrapWithField\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:531\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrCollectionNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:242\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).getLatestCollectionByIDInternal\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:503\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).getCollectionByIDInternal\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:511\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).GetCollectionByID\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:595\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).getCollectionLimitConfig\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:740\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).checkDiskQuota\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:769\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).calculateWriteRates\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:431\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).calculateRates\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:680\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).run\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:149\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) collection=444885560750714875\nWraps: (3) collection not found\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) merr.milvusError"]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 1 year ago
  1. What operation triggered this log?
  2. Could you share the code, if it's a client-side error?
  3. We need the full log to track down the exact reason.

yanliang567 commented 1 year ago

/assign @wcollin /unassign

zspishere commented 11 months ago

I also got the same error, shown below. It occurred just after the rootcoord was launched, with no human trigger, and it seems to be emitted every 10s.

milvus_rootcoord.1.435k793iun5o@test | [2023/10/30 16:36:57.971 +08:00] [WARN] [rootcoord/quota_center.go:742] ["failed to get collection rate limit config"] [collectionID=445204369917889111] [error="collection=445204369917889111: collection not found"] [errorVerbose="collection=445204369917889111: collection not found\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.wrapWithField\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:760\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrCollectionNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:372\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).getLatestCollectionByIDInternal\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:503\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).getCollectionByIDInternal\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:511\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).GetCollectionByID\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:595\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).getCollectionLimitConfig\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:740\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).checkDiskQuota\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:769\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).calculateWriteRates\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:431\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).calculateRates\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:680\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).run\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:149\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) collection=445204369917889111\nWraps: (3) collection not found\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) merr.milvusError"]
milvus_rootcoord.1.435k793iun5o@test | [2023/10/30 16:36:58.502 +08:00] [INFO] [rootcoord/root_coord.go:1142] ["received request to describe collection"] [traceID=cb17e2b73d1e28c37432f4da3838e7a0] [collectionName=] [dbName=] [id=445291254308930633] [ts=18446744073709551615] [allowUnavailable=false]
milvus_rootcoord.1.435k793iun5o@test | [2023/10/30 16:36:58.502 +08:00] [INFO] [rootcoord/root_coord.go:1171] ["done to describe collection"] [traceID=cb17e2b73d1e28c37432f4da3838e7a0] [collectionName=] [dbName=] [id=445291254308930633] [ts=18446744073709551615] [allowUnavailable=false] [collection_id=445291254308930633]

milvus_rootcoord.1.435k793iun5o@test | [2023/10/30 16:37:08.502 +08:00] [INFO] [rootcoord/root_coord.go:1142] ["received request to describe collection"] [traceID=72813ecb1ba36602f006fd2a79330a5e] [collectionName=] [dbName=] [id=445291254308930633] [ts=18446744073709551615] [allowUnavailable=false]
milvus_rootcoord.1.435k793iun5o@test | [2023/10/30 16:37:08.502 +08:00] [INFO] [rootcoord/root_coord.go:1171] ["done to describe collection"] [traceID=72813ecb1ba36602f006fd2a79330a5e] [collectionName=] [dbName=] [id=445291254308930633] [ts=18446744073709551615] [allowUnavailable=false] [collection_id=445291254308930633]
milvus_rootcoord.1.435k793iun5o@test | [2023/10/30 16:37:09.973 +08:00] [WARN] [rootcoord/quota_center.go:742] ["failed to get collection rate limit config"] [collectionID=445204369917889111] [error="collection=445204369917889111: collection not found"] [errorVerbose="collection=445204369917889111: collection not found\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/pkg/util/merr.wrapWithField\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:760\n | github.com/milvus-io/milvus/pkg/util/merr.WrapErrCollectionNotFound\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/merr/utils.go:372\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).getLatestCollectionByIDInternal\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:503\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).getCollectionByIDInternal\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:511\n | github.com/milvus-io/milvus/internal/rootcoord.(MetaTable).GetCollectionByID\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/meta_table.go:595\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).getCollectionLimitConfig\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:740\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).checkDiskQuota\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:769\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).calculateWriteRates\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:431\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).calculateRates\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:680\n | github.com/milvus-io/milvus/internal/rootcoord.(QuotaCenter).run\n | \t/go/src/github.com/milvus-io/milvus/internal/rootcoord/quota_center.go:149\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) collection=445204369917889111\nWraps: (3) collection not found\nError types: (1) withstack.withStack (2) errutil.withPrefix (3) merr.milvusError"]

yanliang567 commented 11 months ago

Could you please refer to this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs. @zspishere

zspishere commented 11 months ago

@yanliang567 I run Milvus in a Swarm cluster, so the log collection script may not work for me.

zspishere commented 11 months ago

However, I checked the other services and ONLY found the errors below, in the datacoord container. I think it may be related to THIS issue. @yanliang567

[2023/10/30 16:48:54.787 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.796 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.796 +08:00] [WARN] [datacoord/channel_manager.go:650] ["datanode watch channel failed or timeout, will release"] [nodeID=602] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.796 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.796 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToWatch] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.796 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.796 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToRelease] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:54.804 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.812 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.812 +08:00] [INFO] [datacoord/channel_manager.go:669] ["datanode release channel successfully, will reassign"] [nodeID=602] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.812 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToRelease] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.813 +08:00] [INFO] [datacoord/policy.go:474] ["AverageReassignPolicy working"] [avaNodesCount=3] [toAssignChannelNum=1] [avaNodesChannelSum=1]
[2023/10/30 16:48:54.813 +08:00] [INFO] [datacoord/channel_manager.go:821] ["channel manager reassigning channels"] ["old node ID"=602] [updates="[\"{type=Delete,nodeID=602,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\",\"{type=Add,nodeID=600,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\"]"]
[2023/10/30 16:48:54.813 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.813 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.813 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToWatch] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:54.840 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.854 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.854 +08:00] [WARN] [datacoord/channel_manager.go:650] ["datanode watch channel failed or timeout, will release"] [nodeID=600] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.854 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.854 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.854 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToWatch] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.854 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToRelease] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:54.862 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.871 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.871 +08:00] [INFO] [datacoord/channel_manager.go:669] ["datanode release channel successfully, will reassign"] [nodeID=600] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.871 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToRelease] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.903 +08:00] [INFO] [datacoord/policy.go:474] ["AverageReassignPolicy working"] [avaNodesCount=3] [toAssignChannelNum=1] [avaNodesChannelSum=1]
[2023/10/30 16:48:54.904 +08:00] [INFO] [datacoord/channel_manager.go:821] ["channel manager reassigning channels"] ["old node ID"=600] [updates="[\"{type=Delete,nodeID=600,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\",\"{type=Add,nodeID=602,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\"]"]
[2023/10/30 16:48:54.904 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.904 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.904 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToWatch] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:54.921 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.929 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.929 +08:00] [WARN] [datacoord/channel_manager.go:650] ["datanode watch channel failed or timeout, will release"] [nodeID=602] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.929 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.929 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.929 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToWatch] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.929 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToRelease] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:54.937 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.966 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.966 +08:00] [INFO] [datacoord/channel_manager.go:669] ["datanode release channel successfully, will reassign"] [nodeID=602] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.966 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToRelease] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.966 +08:00] [INFO] [datacoord/policy.go:474] ["AverageReassignPolicy working"] [avaNodesCount=3] [toAssignChannelNum=1] [avaNodesChannelSum=1]
[2023/10/30 16:48:54.966 +08:00] [INFO] [datacoord/channel_manager.go:821] ["channel manager reassigning channels"] ["old node ID"=602] [updates="[\"{type=Delete,nodeID=602,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\",\"{type=Add,nodeID=600,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\"]"]
[2023/10/30 16:48:54.967 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.967 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.967 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToWatch] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:54.979 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:54.996 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:54.996 +08:00] [WARN] [datacoord/channel_manager.go:650] ["datanode watch channel failed or timeout, will release"] [nodeID=600] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:54.996 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:54.996 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToWatch] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:54.996 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:54.996 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToRelease] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:55.004 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:55.029 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:55.029 +08:00] [INFO] [datacoord/channel_manager.go:669] ["datanode release channel successfully, will reassign"] [nodeID=600] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:55.029 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToRelease] [nodeID=600] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:55.029 +08:00] [INFO] [datacoord/policy.go:474] ["AverageReassignPolicy working"] [avaNodesCount=3] [toAssignChannelNum=1] [avaNodesChannelSum=1]
[2023/10/30 16:48:55.029 +08:00] [INFO] [datacoord/channel_manager.go:821] ["channel manager reassigning channels"] ["old node ID"=600] [updates="[\"{type=Delete,nodeID=600,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\",\"{type=Add,nodeID=602,channels=\\"[by-dev-rootcoord-dml_0_445204369917889111v0]\\"}\"]"]
[2023/10/30 16:48:55.029 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:55.030 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:55.030 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToWatch] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]
[2023/10/30 16:48:55.046 +08:00] [INFO] [datacoord/channel_manager.go:744] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [progress=0]
[2023/10/30 16:48:55.082 +08:00] [INFO] [datacoord/channel_checker.go:167] ["stop timer for channel"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [timerCount=0]
[2023/10/30 16:48:55.082 +08:00] [WARN] [datacoord/channel_manager.go:650] ["datanode watch channel failed or timeout, will release"] [nodeID=602] [channel=by-dev-rootcoord-dml_0_445204369917889111v0]
[2023/10/30 16:48:55.082 +08:00] [INFO] [datacoord/handler.go:62] [GetDataVChanPositions] [collectionID=445204369917889111] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [numOfSegments=1]
[2023/10/30 16:48:55.082 +08:00] [INFO] [datacoord/handler.go:299] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_0_445204369917889111v0] [posTs=445250214596444161] [posTime=2023/10/28 20:06:16.140 +08:00]
[2023/10/30 16:48:55.082 +08:00] [INFO] [datacoord/channel_checker.go:134] ["stop timer before timeout"] ["watch state"=ToWatch] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["timeout interval"=5m0s] [runningTimerCount=0]
[2023/10/30 16:48:55.083 +08:00] [INFO] [datacoord/channel_checker.go:113] ["timer started"] ["watch state"=ToRelease] [nodeID=602] [channelName=by-dev-rootcoord-dml_0_445204369917889111v0] ["check interval"=5m0s]

yanliang567 commented 11 months ago

Mmm, Swarm is not on the official support list, and we are not able to address the issue without the full Milvus logs. :(

zspishere commented 11 months ago

@wcollin found the issue in standalone mode, so I think the error does not depend on the deployment mode. Perhaps @wcollin can offer the full Milvus logs.

shaiktas commented 11 months ago

Hi, I too have the same issue, in cluster mode, from the root coordinator: [WARN] [rootcoord/quota_center.go:742] ["failed to get collection rate limit config"] [collectionID=445345169243756901] [error="collection=445345169243756901: collection not found"]. Not sure whether it is config related. I am attaching the root coordinator's full startup logs along with the error: milvus-rootcoordinatorlogs-failedtogetcollectionratelimitconfig.txt

yanliang567 commented 11 months ago

The warning msg is a bit confusing; @bigsheeper could you please double-confirm it? /assign @bigsheeper

xiaofan-luan commented 11 months ago

This seems to be an issue we need to take care of.

bigsheeper commented 11 months ago

The warning msg is a bit confusing; @bigsheeper could you please double-confirm it? /assign @bigsheeper

This phenomenon occurs solely when collections are dropped in DataNodes or QueryNodes. It is reasonable if this log is printed momentarily (within a span of 1 minute) after the collection has been dropped. Conversely, it is UNreasonable if this log persists for an extended duration (exceeding 10 minutes) after the collection has been dropped; in that case, there is a possibility of the collection leaking in the DataNodes or QueryNodes. @shaiktas @zspishere
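
To tell the two cases apart, one way is to check whether the logged collection ID is still present in the meta. Below is a minimal sketch, assuming the Go SDK (milvus-sdk-go/v2) and a reachable deployment; the address is a placeholder and the ID is simply copied from the WARN lines above:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/milvus-io/milvus-sdk-go/v2/client"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Placeholder address; point this at your own deployment.
	c, err := client.NewGrpcClient(ctx, "localhost:19530")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// The collection ID taken from the WARN log above.
	const loggedID int64 = 445204369917889111

	colls, err := c.ListCollections(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, coll := range colls {
		if coll.ID == loggedID {
			fmt.Printf("collection %d still exists as %q; the WARN is unexpected\n", coll.ID, coll.Name)
			return
		}
	}
	// If the ID is absent from the meta but the WARN keeps printing for
	// more than ~10 minutes, the collection may be leaking in a DataNode
	// or QueryNode, per the diagnosis above.
	fmt.Printf("collection %d is not in the meta (already dropped)\n", loggedID)
}
```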

bigsheeper commented 11 months ago

I shall enhance this log as it is not config related. You're right, it is confusing. @yanliang567

xiaofan-luan commented 11 months ago

@bigsheeper I think there is leakage somewhere.

Could anyone offer the log, if possible?

shaiktas commented 11 months ago

Hello @xiaofan-luan @bigsheeper, I am not sure which log you are looking for; I already attached a log in my earlier comment. Here is the same again.

Hi, I too have the same issue, in cluster mode, from the root coordinator: [WARN] [rootcoord/quota_center.go:742] ["failed to get collection rate limit config"] [collectionID=445345169243756901] [error="collection=445345169243756901: collection not found"]. Not sure whether it is config related. I am attaching the root coordinator's full startup logs along with the error: milvus-rootcoordinatorlogs-failedtogetcollectionratelimitconfig.txt

bigsheeper commented 11 months ago

Hi @shaiktas, I have a few questions:

  1. How long does this WARN log keep printing?
  2. What operations did you perform before starting up RootCoord?

It would be greatly appreciated if you could refer to this doc to export the complete Milvus logs for further investigation.

zzggfly commented 11 months ago

I have the same issue. Milvus 2.3.2, standalone, Go SDK. I use Upsert() to batch-insert data and the function returns nil, but no data is found when I use GetCollectionStatistics() or QueryByPks(). Collection list: Dingtalk_20231117170747

logs: Uploading mil3000.log…

xiaofan-luan commented 11 months ago

I have the same issue. Milvus 2.3.2, standalone, Go SDK. I use Upsert() to batch-insert data and the function returns nil, but no data is found when I use GetCollectionStatistics() or QueryByPks(). Collection list: Dingtalk_20231117170747

logs: Uploading mil3000.log…

I think this is a separate topic. With the default consistency level, data may be invisible for up to 3 seconds; this is what we call bounded staleness. Usually it takes about 200 milliseconds for the data to load on the querynode.

xiaofan-luan commented 11 months ago

Also, GetCollectionStatistics is async; QueryByPks with strong consistency should return you the right result.
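
For illustration, a minimal sketch of such a strong-consistency read with the Go SDK (milvus-sdk-go/v2); the address, the collection name my_collection, the primary-key field pk, and the key values are placeholder assumptions:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/milvus-io/milvus-sdk-go/v2/client"
	"github.com/milvus-io/milvus-sdk-go/v2/entity"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	c, err := client.NewGrpcClient(ctx, "localhost:19530") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// Hypothetical primary keys that were just upserted.
	pks := entity.NewColumnInt64("pk", []int64{1, 2, 3})

	// With the default bounded-staleness level, rows written in the last
	// few seconds may not be visible yet; ClStrong forces the query to
	// wait until all prior writes are applied before reading.
	rs, err := c.QueryByPks(ctx, "my_collection", nil, pks, []string{"pk"},
		client.WithSearchQueryConsistencyLevel(entity.ClStrong))
	if err != nil {
		log.Fatal(err)
	}
	for _, col := range rs {
		fmt.Printf("field %s: %d rows\n", col.Name(), col.Len())
	}
}
```

GetCollectionStatistics, as noted above, is async, so a row count that lags right after an upsert is expected rather than a sign of data loss.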

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.