Closed altonchuzhan closed 1 year ago
add a graph of memory:
add pprof heap.zip
@altonchuzhan
To figure out this issue, you should look at the code at dataSyncService at datanode
/assign @bigsheeper
from the datanode log, everything seems to be working as expected. I'm doubting it is caused by too many collections/partitions
from the datanode log, everything seems to be working as expected. I'm doubting it is caused by too many collections/partitions
one big collection with 78 partition with 1~10 million vector in each partition
about pulsar topic, you remind me of one action I have done to solve "time="2023-06-02T02:00:51Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" topic="persistent://public/default/by-dev-rootcoord-dml_12"
the command is bin/pulsar-admin topics unsubscribe "persistent://public/default/by-dev-rootcoord-dml_1" -s by-dev-dataNode-2131-by-dev-rootcoord-dml_1_440633149338133674v1
maybe I should have a try to restart rootcoord or datacoord
summary the log, after restart , the "Cannot create producer on topic with backlog quota exceeded" log back and accompany with many connection error.
rootcoord: time="2023-06-04T05:16:51Z" level=info msg="Broker notification of Closed producer: 1" local_addr="172.20.35.245:49102" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:16:51Z" level=warning msg="[Connection was closed]" cnx="172.20.35.245:49102 -> 192.168.220.30:6650" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:51Z" level=info msg="[Reconnecting to broker in 108.824695ms]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:51Z" level=info msg="[Reconnecting to broker in 109.140546ms]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:51Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:51Z" level=info msg="[Reconnecting to broker in 223.465899ms]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:51Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:51Z" level=info msg="[Reconnecting to broker in 208.253412ms]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:52Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:52Z" level=info msg="[Reconnecting to broker in 453.367203ms]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:52Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:52Z" level=info msg="[Reconnecting to broker in 440.545209ms]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:52Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:52Z" level=info msg="[Reconnecting to broker in 889.086429ms]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:52Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:52Z" level=info msg="[Reconnecting to broker in 883.781628ms]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:53Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:53Z" level=info msg="[Reconnecting to broker in 1.767344256s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:53Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:53Z" level=info msg="[Reconnecting to broker in 1.744770488s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" [2023/06/04 05:16:54.582 +00:00] [INFO] [rootcoord/root_coord.go:988] ["received request to describe collection"] [traceID=1532067647936bbf] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [2023/06/04 05:16:54.582 +00:00] [INFO] [rootcoord/root_coord.go:1021] ["done to describe collection"] [traceID=1532067647936bbf] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [collection_id=440633149273728439] time="2023-06-04T05:16:55Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:55Z" level=info msg="[Reconnecting to broker in 3.741955681s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:55Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:55Z" level=info msg="[Reconnecting to broker in 3.446566097s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:58Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:58Z" level=info msg="[Reconnecting to broker in 7.040909526s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:58Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:16:58Z" level=info msg="[Reconnecting to broker in 6.549790947s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:05Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:05Z" level=info msg="[Reconnecting to broker in 14.259479049s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:05Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:17:05Z" level=info msg="[Reconnecting to broker in 14.816015229s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:17:19Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:19Z" level=info msg="[Reconnecting to broker in 26.521887314s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:20Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:17:20Z" level=info msg="[Reconnecting to broker in 28.902466475s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" [2023/06/04 05:17:24.582 +00:00] [INFO] [rootcoord/root_coord.go:988] ["received request to describe collection"] [traceID=265fbdab0c3c1ba0] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [2023/06/04 05:17:24.582 +00:00] [INFO] [rootcoord/root_coord.go:1021] ["done to describe collection"] [traceID=265fbdab0c3c1ba0] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [collection_id=440633149273728439] [2023/06/04 05:17:37.664 +00:00] [INFO] [rootcoord/import_manager.go:177] ["(in cleanupLoop) trying to expire old tasks from memory and Etcd"] [2023/06/04 05:17:37.668 +00:00] [INFO] [rootcoord/import_manager.go:180] ["(in cleanupLoop) start removing bad import segments"] [2023/06/04 05:17:37.670 +00:00] [INFO] [rootcoord/import_manager.go:182] ["(in cleanupLoop) start cleaning hanging busy DataNode"] time="2023-06-04T05:17:46Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:46Z" level=info msg="[Reconnecting to broker in 54.198483753s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:49Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:17:49Z" level=info msg="[Reconnecting to broker in 52.753638008s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" [2023/06/04 05:17:54.581 +00:00] [INFO] [rootcoord/root_coord.go:988] ["received request to describe collection"] [traceID=679ba61455826d3] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [2023/06/04 05:17:54.581 +00:00] [INFO] [rootcoord/root_coord.go:1021] ["done to describe collection"] [traceID=679ba61455826d3] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [collection_id=440633149273728439] [2023/06/04 05:18:24.582 +00:00] [INFO] [rootcoord/root_coord.go:988] ["received request to describe collection"] [traceID=7827ec310da29039] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [2023/06/04 05:18:24.582 +00:00] [INFO] [rootcoord/root_coord.go:1021] ["done to describe collection"] [traceID=7827ec310da29039] ["collection name"=] [id=440633149273728439] [msgID=0] [ts=18446744073709551615] [allowUnavailable=true] [collection_id=440633149273728439] time="2023-06-04T05:18:40Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:18:40Z" level=info msg="[Reconnecting to broker in 1m9.280931565s]" producerID=1 producer_name=milvus-mining-pulsar-19-16277 topic="persistent://public/default/by-dev-rootcoord-dml_0" [2023/06/04 05:18:40.672 +00:00] [WARN] [rootcoord/quota_center.go:608] ["QuotaCenter: DataNode memory to low water level, limit writing rate"] [Node=datanode-2860] [UsedMem=95216676864] [TotalMem=107374182400] [memoryWaterLevel=0.8867744064331055] [memoryLowWaterLevel=0.85] time="2023-06-04T05:18:42Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:18:42Z" level=info msg="[Reconnecting to broker in 1m2.347003086s]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" [2023/06/04 05:18:43.672 +00:00] [WARN] [rootcoord/quota_center.go:608] ["QuotaCenter: DataNode memory to low water level, limit writing rate"] [Node=datanode-2860] [UsedMem=99093413888] [TotalMem=107374182400] [memoryWaterLevel=0.9228793334960937] [memoryLowWaterLevel=0.85]
datacoord:
[2023/06/04 05:17:53.824 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:17:53.825 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=440633149273728439] [partitionID=-1] [2023/06/04 05:17:54.580 +00:00] [INFO] [datacoord/services.go:784] ["get recovery info request received"] [collectionID=440633149273728439] [partitionIDs="[]"] [2023/06/04 05:17:54.583 +00:00] [INFO] [datacoord/handler.go:119] [GetQueryVChanPositions] [collectionID=440633149273728439] [channel=by-dev-rootcoord-dml_2_440633149273728439v0] [numOfSegments=21] [2023/06/04 05:17:54.583 +00:00] [INFO] [datacoord/handler.go:261] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_2_440633149273728439v0] [posTs=441907383838703617] [posTime=2023/06/02 21:54:48.789 +00:00] [2023/06/04 05:17:54.583 +00:00] [INFO] [datacoord/services.go:819] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=440633149273728439] [partitionIDs="[]"] [channelInfo="collectionID:440633149273728439 channelName:\"by-dev-rootcoord-dml_2_440633149273728439v0\" seek_position:<channel_name:\"by-dev-rootcoord-dml_2_440633149273728439v0\" msgID:\"\010\364\353\006\020\370\324\001\030\000 \000\" msgGroup:\"by-dev-dataNode-2860-by-dev-rootcoord-dml_2_440633149273728439v0\" timestamp:441907383838703617 > flushedSegmentIds:440633149274231606 flushedSegmentIds:440633149274232803 flushedSegmentIds:440633149274239744 flushedSegmentIds:440633149274224900 flushedSegmentIds:440633149274241602 flushedSegmentIds:440633149274240590 flushedSegmentIds:440633149274223740 flushedSegmentIds:440633149274229761 flushedSegmentIds:440633149274226364 flushedSegmentIds:440633149274239596 flushedSegmentIds:440633149274228660 flushedSegmentIds:440633149274229377 flushedSegmentIds:440633149274235170 flushedSegmentIds:440633149274238653 flushedSegmentIds:440633149274231458 flushedSegmentIds:440633149274222372 flushedSegmentIds:440633149274221592 flushedSegmentIds:440633149274221459 flushedSegmentIds:440633149274242302 flushedSegmentIds:440633149274230796 flushedSegmentIds:440633149274224334 "] [2023/06/04 05:17:54.584 +00:00] [INFO] [datacoord/handler.go:119] [GetQueryVChanPositions] [collectionID=440633149273728439] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [numOfSegments=21] [2023/06/04 05:17:54.584 +00:00] [INFO] [datacoord/handler.go:261] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [posTs=441907320871976961] [posTime=2023/06/02 21:50:48.590 +00:00] [2023/06/04 05:17:54.584 +00:00] [INFO] [datacoord/services.go:819] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=440633149273728439] [partitionIDs="[]"] [channelInfo="collectionID:440633149273728439 channelName:\"by-dev-rootcoord-dml_3_440633149273728439v1\" seek_position:<channel_name:\"by-dev-rootcoord-dml_3_440633149273728439v1\" msgID:\"\010\377\353\006\020\323\301\001\030\000 \000\" msgGroup:\"by-dev-dataNode-2860-by-dev-rootcoord-dml_3_440633149273728439v1\" timestamp:441907320871976961 > flushedSegmentIds:440633149274231596 flushedSegmentIds:440633149274230795 flushedSegmentIds:440633149274238663 flushedSegmentIds:440633149274221460 flushedSegmentIds:440633149274238650 flushedSegmentIds:440633149274233402 flushedSegmentIds:440633149274226880 flushedSegmentIds:440633149274222371 flushedSegmentIds:440633149274240589 flushedSegmentIds:440633149274242132 flushedSegmentIds:440633149274239743 flushedSegmentIds:440633149274224340 flushedSegmentIds:440633149274229378 flushedSegmentIds:440633149274231605 flushedSegmentIds:440633149274223711 flushedSegmentIds:440633149274240599 flushedSegmentIds:440633149274230198 flushedSegmentIds:440633149274234882 flushedSegmentIds:440633149274224901 flushedSegmentIds:440633149274228821 flushedSegmentIds:440633149274221591 "] [2023/06/04 05:17:59.894 +00:00] [INFO] [datacoord/channel_checker.go:152] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_0_440633149338133674v0] [timerCount=0] [2023/06/04 05:17:59.894 +00:00] [WARN] [datacoord/channel_checker.go:121] ["timeout and stop timer: wait for channel ACK timeout"] ["watch state"=ToWatch] [nodeID=2860] ["channel name"=by-dev-rootcoord-dml_0_440633149338133674v0] ["timeout interval"=2m0s] [runningTimerCount=0] [2023/06/04 05:17:59.894 +00:00] [INFO] [datacoord/channel_manager.go:677] ["receive timeout acks from state watcher"] [state=3] [nodeID=2860] ["channel name"=by-dev-rootcoord-dml_0_440633149338133674v0] [2023/06/04 05:17:59.894 +00:00] [WARN] [datacoord/channel_manager.go:627] ["datanode watch channel failed or timeout, will release"] [nodeID=2860] [channel=by-dev-rootcoord-dml_0_440633149338133674v0] [2023/06/04 05:17:59.894 +00:00] [INFO] [datacoord/channel_checker.go:152] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_1_440633149338133674v1] [timerCount=1] [2023/06/04 05:17:59.894 +00:00] [WARN] [datacoord/channel_checker.go:121] ["timeout and stop timer: wait for channel ACK timeout"] ["watch state"=ToWatch] [nodeID=2860] ["channel name"=by-dev-rootcoord-dml_1_440633149338133674v1] ["timeout interval"=2m0s] [runningTimerCount=0] [2023/06/04 05:17:59.894 +00:00] [INFO] [datacoord/handler.go:61] [GetDataVChanPositions] [collectionID=440633149338133674] [channel=by-dev-rootcoord-dml_0_440633149338133674v0] [numOfSegments=84] [2023/06/04 05:17:59.895 +00:00] [INFO] [datacoord/handler.go:235] ["NEITHER segment position or channel start position are found, setting channel seek position to collection start position"] [channel=by-dev-rootcoord-dml_0_440633149338133674v0] [posTs=441873538680094725] [posTime=2023/06/01 10:02:59.755 +00:00] [2023/06/04 05:17:59.895 +00:00] [INFO] [datacoord/handler.go:279] ["channel seek position set from collection start position"] [channel=by-dev-rootcoord-dml_0_440633149338133674v0] [posTs=441873538680094725] [posTime=2023/06/01 10:02:59.755 +00:00] [2023/06/04 05:17:59.895 +00:00] [INFO] [datacoord/channel_checker.go:110] ["timer started"] ["watch state"=ToRelease] [nodeID=2860] ["channel name"=by-dev-rootcoord-dml_0_440633149338133674v0] ["check interval"=2m0s] [2023/06/04 05:17:59.900 +00:00] [INFO] [datacoord/channel_manager.go:677] ["receive timeout acks from state watcher"] [state=3] [nodeID=2860] ["channel name"=by-dev-rootcoord-dml_1_440633149338133674v1] [2023/06/04 05:17:59.900 +00:00] [WARN] [datacoord/channel_manager.go:627] ["datanode watch channel failed or timeout, will release"] [nodeID=2860] [channel=by-dev-rootcoord-dml_1_440633149338133674v1] [2023/06/04 05:17:59.900 +00:00] [INFO] [datacoord/handler.go:61] [GetDataVChanPositions] [collectionID=440633149338133674] [channel=by-dev-rootcoord-dml_1_440633149338133674v1] [numOfSegments=84] [2023/06/04 05:17:59.900 +00:00] [INFO] [datacoord/handler.go:235] ["NEITHER segment position or channel start position are found, setting channel seek position to collection start position"] [channel=by-dev-rootcoord-dml_1_440633149338133674v1] [posTs=441873538680094725] [posTime=2023/06/01 10:02:59.755 +00:00] [2023/06/04 05:17:59.900 +00:00] [INFO] [datacoord/handler.go:279] ["channel seek position set from collection start position"] [channel=by-dev-rootcoord-dml_1_440633149338133674v1] [posTs=441873538680094725] [posTime=2023/06/01 10:02:59.755 +00:00] [2023/06/04 05:17:59.900 +00:00] [INFO] [datacoord/channel_checker.go:110] ["timer started"] ["watch state"=ToRelease] [nodeID=2860] ["channel name"=by-dev-rootcoord-dml_1_440633149338133674v1] ["check interval"=2m0s] [2023/06/04 05:17:59.904 +00:00] [INFO] [datacoord/channel_manager.go:720] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_0_440633149338133674v0] [progress=0] [2023/06/04 05:17:59.904 +00:00] [INFO] [datacoord/channel_manager.go:720] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_1_440633149338133674v1] [progress=0] [2023/06/04 05:18:24.581 +00:00] [INFO] [datacoord/services.go:784] ["get recovery info request received"] [collectionID=440633149273728439] [partitionIDs="[]"] [2023/06/04 05:18:24.583 +00:00] [INFO] [datacoord/handler.go:119] [GetQueryVChanPositions] [collectionID=440633149273728439] [channel=by-dev-rootcoord-dml_2_440633149273728439v0] [numOfSegments=21] [2023/06/04 05:18:24.583 +00:00] [INFO] [datacoord/handler.go:261] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_2_440633149273728439v0] [posTs=441907383838703617] [posTime=2023/06/02 21:54:48.789 +00:00] [2023/06/04 05:18:24.583 +00:00] [INFO] [datacoord/services.go:819] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=440633149273728439] [partitionIDs="[]"] [channelInfo="collectionID:440633149273728439 channelName:\"by-dev-rootcoord-dml_2_440633149273728439v0\" seek_position:<channel_name:\"by-dev-rootcoord-dml_2_440633149273728439v0\" msgID:\"\010\364\353\006\020\370\324\001\030\000 \000\" msgGroup:\"by-dev-dataNode-2860-by-dev-rootcoord-dml_2_440633149273728439v0\" timestamp:441907383838703617 > flushedSegmentIds:440633149274224900 flushedSegmentIds:440633149274221592 flushedSegmentIds:440633149274232803 flushedSegmentIds:440633149274240590 flushedSegmentIds:440633149274228660 flushedSegmentIds:440633149274242302 flushedSegmentIds:440633149274241602 flushedSegmentIds:440633149274231606 flushedSegmentIds:440633149274221459 flushedSegmentIds:440633149274239596 flushedSegmentIds:440633149274229377 flushedSegmentIds:440633149274231458 flushedSegmentIds:440633149274223740 flushedSegmentIds:440633149274230796 flushedSegmentIds:440633149274229761 flushedSegmentIds:440633149274239744 flushedSegmentIds:440633149274224334 flushedSegmentIds:440633149274222372 flushedSegmentIds:440633149274235170 flushedSegmentIds:440633149274226364 flushedSegmentIds:440633149274238653 "] [2023/06/04 05:18:24.585 +00:00] [INFO] [datacoord/handler.go:119] [GetQueryVChanPositions] [collectionID=440633149273728439] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [numOfSegments=21] [2023/06/04 05:18:24.585 +00:00] [INFO] [datacoord/handler.go:261] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [posTs=441907320871976961] [posTime=2023/06/02 21:50:48.590 +00:00] [2023/06/04 05:18:24.585 +00:00] [INFO] [datacoord/services.go:819] ["datacoord append channelInfo in GetRecoveryInfo"] [collectionID=440633149273728439] [partitionIDs="[]"] [channelInfo="collectionID:440633149273728439 channelName:\"by-dev-rootcoord-dml_3_440633149273728439v1\" seek_position:<channel_name:\"by-dev-rootcoord-dml_3_440633149273728439v1\" msgID:\"\010\377\353\006\020\323\301\001\030\000 \000\" msgGroup:\"by-dev-dataNode-2860-by-dev-rootcoord-dml_3_440633149273728439v1\" timestamp:441907320871976961 > flushedSegmentIds:440633149274222371 flushedSegmentIds:440633149274239743 flushedSegmentIds:440633149274238663 flushedSegmentIds:440633149274228821 flushedSegmentIds:440633149274221591 flushedSegmentIds:440633149274224340 flushedSegmentIds:440633149274231596 flushedSegmentIds:440633149274231605 flushedSegmentIds:440633149274240599 flushedSegmentIds:440633149274224901 flushedSegmentIds:440633149274233402 flushedSegmentIds:440633149274242132 flushedSegmentIds:440633149274238650 flushedSegmentIds:440633149274240589 flushedSegmentIds:440633149274230198 flushedSegmentIds:440633149274234882 flushedSegmentIds:440633149274229378 flushedSegmentIds:440633149274230795 flushedSegmentIds:440633149274223711 flushedSegmentIds:440633149274221460 flushedSegmentIds:440633149274226880 "] [2023/06/04 05:18:28.188 +00:00] [INFO] [datacoord/channel_manager.go:273] ["ChannelManager is not silent, skip channel balance this round"] [2023/06/04 05:18:28.418 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.475 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.529 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.579 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.658 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.707 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.756 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.813 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.862 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.910 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:28.961 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:29.010 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:29.060 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:29.113 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1] [2023/06/04 05:18:29.171 +00:00] [INFO] [datacoord/services.go:885] ["received get flushed segments request"] [collectionID=439652203113153329] [partitionID=-1]
datanode: [2023/06/04 05:16:30.096 +00:00] [INFO] [datanode/data_node.go:350] ["DataNode received a PUT event with tickler update progress"] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [version=3] [2023/06/04 05:16:30.097 +00:00] [INFO] [datanode/data_node.go:350] ["DataNode received a PUT event with tickler update progress"] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [version=3] time="2023-06-04T05:16:31Z" level=info msg="Failing 1 messages" producerID=2 producer_name=milvus-mining-pulsar-19-16307 topic="persistent://public/default/by-dev-rootcoord-delta_2" time="2023-06-04T05:16:31Z" level=info msg="Failing 1 messages" producerID=1 producer_name=milvus-mining-pulsar-19-16308 topic="persistent://public/default/by-dev-rootcoord-delta_3" [2023/06/04 05:16:31.673 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="message send timeout: TimeoutError"] [2023/06/04 05:16:31.673 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="message send timeout: TimeoutError"] time="2023-06-04T05:16:31Z" level=info msg="Failing 1 messages" producerID=4 producer_name=milvus-mining-pulsar-19-16309 topic="persistent://public/default/by-dev-datacoord-timetick-channel" time="2023-06-04T05:16:31Z" level=info msg="Failing 1 messages" producerID=3 producer_name=milvus-mining-pulsar-19-16310 topic="persistent://public/default/by-dev-datacoord-timetick-channel" [2023/06/04 05:16:31.688 +00:00] [ERROR] [datanode/flow_graph_time_ticker.go:130] ["send hard time tick failed"] [error="message send timeout: TimeoutError"] [stack="github.com/milvus-io/milvus/internal/datanode.(mergedTimeTickerSender).work\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flow_graph_time_ticker.go:130"] [2023/06/04 05:16:31.688 +00:00] [ERROR] [datanode/flow_graph_time_ticker.go:130] ["send hard time tick failed"] [error="message send timeout: TimeoutError"] [stack="github.com/milvus-io/milvus/internal/datanode.(mergedTimeTickerSender).work\n\t/go/src/github.com/milvus-io/milvus/internal/datanode/flow_graph_time_ticker.go:130"] time="2023-06-04T05:16:53Z" level=error msg="[Failed to create consumer]" consumerID=5 error="request timed out" name=ebftb subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13" [2023/06/04 05:16:53.016 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="request timed out"] time="2023-06-04T05:16:53Z" level=error msg="[Failed to create consumer]" consumerID=5 error="request timed out" name=ebftb subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13" time="2023-06-04T05:16:53Z" level=error msg="[Failed to create consumer]" consumerID=6 error="request timed out" name=ijjgq subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12" time="2023-06-04T05:16:53Z" level=error msg="[Failed to create consumer]" consumerID=6 error="request timed out" name=ijjgq subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12" [2023/06/04 05:16:53.188 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="request timed out"] time="2023-06-04T05:17:01Z" level=info msg="Failing 1 messages" producerID=1 producer_name=milvus-mining-pulsar-19-16308 topic="persistent://public/default/by-dev-rootcoord-delta_3" time="2023-06-04T05:17:01Z" level=info msg="Failing 1 messages" producerID=2 producer_name=milvus-mining-pulsar-19-16307 topic="persistent://public/default/by-dev-rootcoord-delta_2" time="2023-06-04T05:17:23Z" level=error msg="[Failed to create consumer]" consumerID=7 error="request timed out" name=phemg subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13" time="2023-06-04T05:17:23Z" level=error msg="[Failed to create consumer]" consumerID=7 error="request timed out" name=phemg subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13" time="2023-06-04T05:17:23Z" level=error msg="[Failed to create consumer]" consumerID=8 error="request timed out" name=uexgh subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12" time="2023-06-04T05:17:23Z" level=error msg="[Failed to create consumer]" consumerID=8 error="request timed out" name=uexgh subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12" time="2023-06-04T05:17:29Z" level=warning msg="[Detected stale connection to broker]" local_addr="172.20.34.13:46652" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:29Z" level=warning msg="[Connection was closed]" cnx="172.20.34.13:46652 -> 192.168.220.30:6650" producerID=2 producer_name=milvus-mining-pulsar-19-16307 topic="persistent://public/default/by-dev-rootcoord-delta_2" time="2023-06-04T05:17:29Z" level=warning msg="[Connection was closed]" cnx="172.20.34.13:46652 -> 192.168.220.30:6650" producerID=1 producer_name=milvus-mining-pulsar-19-16308 topic="persistent://public/default/by-dev-rootcoord-delta_3" time="2023-06-04T05:17:29Z" level=warning msg="[Connection was closed]" cnx="172.20.34.13:46652 -> 192.168.220.30:6650" producerID=4 producer_name=milvus-mining-pulsar-19-16309 topic="persistent://public/default/by-dev-datacoord-timetick-channel" time="2023-06-04T05:17:29Z" level=warning msg="[Connection was closed]" cnx="172.20.34.13:46652 -> 192.168.220.30:6650" producerID=3 producer_name=milvus-mining-pulsar-19-16310 topic="persistent://public/default/by-dev-datacoord-timetick-channel" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 114.957814ms]" consumerID=4 name=sbdlf subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_3_440633149273728439v1 topic="persistent://public/default/by-dev-rootcoord-dml_3" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 117.93067ms]" consumerID=3 name=qnctf subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_2_440633149273728439v0 topic="persistent://public/default/by-dev-rootcoord-dml_2" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 118.14378ms]" producerID=1 producer_name=milvus-mining-pulsar-19-16308 topic="persistent://public/default/by-dev-rootcoord-delta_3" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 118.780906ms]" consumerID=1 name=wkdbk subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_1_440633149338133674v1 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 112.124947ms]" producerID=2 producer_name=milvus-mining-pulsar-19-16307 topic="persistent://public/default/by-dev-rootcoord-delta_2" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 109.349966ms]" consumerID=2 name=fqkun subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_0_440633149338133674v0 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 104.605582ms]" producerID=3 producer_name=milvus-mining-pulsar-19-16310 topic="persistent://public/default/by-dev-datacoord-timetick-channel" time="2023-06-04T05:17:29Z" level=info msg="[Reconnecting to broker in 119.557477ms]" producerID=4 producer_name=milvus-mining-pulsar-19-16309 topic="persistent://public/default/by-dev-datacoord-timetick-channel" time="2023-06-04T05:17:30Z" level=info msg="[Connecting to broker]" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:30Z" level=info msg="[TCP connection established]" local_addr="172.20.34.13:46852" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:30Z" level=info msg="[Connection is ready]" local_addr="172.20.34.13:46852" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:30Z" level=info msg="[Connected consumer]" consumerID=2 name=fqkun subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_0_440633149338133674v0 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:30Z" level=info msg="[Reconnected producer to broker]" cnx="172.20.34.13:46852 -> 192.168.220.30:6650" producerID=3 producer_name=milvus-mining-pulsar-19-16310 topic="persistent://public/default/by-dev-datacoord-timetick-channel" time="2023-06-04T05:17:30Z" level=info msg="[Reconnected consumer to broker]" consumerID=2 name=fqkun subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_0_440633149338133674v0 topic="persistent://public/default/by-dev-rootcoord-dml_0" time="2023-06-04T05:17:30Z" level=info msg="Resending 1 pending batches" producerID=2 producer_name=milvus-mining-pulsar-19-16307 topic="persistent://public/default/by-dev-rootcoord-delta_2" time="2023-06-04T05:17:30Z" level=info msg="[Reconnected producer to broker]" cnx="172.20.34.13:46852 -> 192.168.220.30:6650" producerID=2 producer_name=milvus-mining-pulsar-19-16307 topic="persistent://public/default/by-dev-rootcoord-delta_2" time="2023-06-04T05:17:30Z" level=warning msg="[Got unexpected message: ledgerId:112127 entryId:29094 partition:-1 ]" consumerID=4 local_addr="172.20.34.13:46652" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:30Z" level=warning msg="[Got unexpected message: ledgerId:112127 entryId:29095 partition:-1 ]" consumerID=4 local_addr="172.20.34.13:46652" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:30Z" level=warning msg="[Failed to write on connection]" error="write tcp 172.20.34.13:46652->192.168.220.30:6650: use of closed network connection" local_addr="172.20.34.13:46652" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650" time="2023-06-04T05:17:30Z" level=error msg="[Failed to create consumer]" consumerID=10 error="connection closed" name=usmrf subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12" time="2023-06-04T05:17:30Z" level=error msg="[Failed to create consumer]" consumerID=10 error="connection closed" name=usmrf subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12" time="2023-06-04T05:17:30Z" level=error msg="[Failed to create consumer]" consumerID=9 error="connection closed" name=gyapr subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13" time="2023-06-04T05:17:30Z" level=error msg="[Failed to create consumer]" consumerID=9 error="connection closed" name=gyapr subscription=by-dev-dataNode-2860-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13"
I am quite sure no one is writing/inserting any data, who is the producer, why it keep creating producer, and also why backlog quota exceeded.
the remaining data should be in the pulsar as committed log(which didn't covert to the sealed segment), why datanode need that much memory, the inserting data should be put in brand new partition.
Is there a "reset" button to clear the pulsar state, I can afford lossing the newly inserted data. the search service is blocking now, else I may have to restart the world.
Seems like this is a reason
time="2023-06-04T05:16:51Z" level=error msg="[Failed to create producer]" error="server error: ProducerBlockedQuotaExceededException: Cannot create producer on topic with backlog quota exceeded" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1" time="2023-06-04T05:16:51Z" level=info msg="[Reconnecting to broker in 223.465899ms]" producerID=2 producer_name=milvus-mining-pulsar-19-16278 topic="persistent://public/default/by-dev-rootcoord-dml_1"
@LoveEachDay @congqixia I remember I add some expire time for pulsar consumer right? How do we verify the expire time works?
Good news: I deleted all topic of pulsar in public/default, previous errors gone. and my test collection is ready for search.
Bad news:
the big collection is facing a new problem:
fail to search on all shard leaders, err=All attempts results: attempt #1:code: UnexpectedError, error: fail to Search, QueryNode ID=2450, reason=Search 2450 failed, reason Timestamp lag too large lag(36h8m1.814s) max(24h0m0s) err %!w(
Seems that milvus is still catching up to recover from previous logs.
To recover, change dataCoord.channel.watchTimeoutInterval to 600 ms should be good enough
the dataCoord.channel.watchTimeoutInterval is int (seconds), I use config like this:
then a lot of timeout error..
[2023/06/04 14:33:28.414 +00:00] [INFO] [datacoord/channel_manager.go:677] ["receive timeout acks from state watcher"] [state=3] [nodeID=2921] ["channel name"=by-dev-rootcoord-dml_13_439652203113153329v1]
[2023/06/04 14:33:28.414 +00:00] [WARN] [datacoord/channel_manager.go:627] ["datanode watch channel failed or timeout, will release"] [nodeID=2921] [channel=by-dev-rootcoord-dml_13_439652203113153329v1]
[2023/06/04 14:33:28.414 +00:00] [INFO] [datacoord/handler.go:61] [GetDataVChanPositions] [collectionID=439652203113153329] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [numOfSegments=663]
[2023/06/04 14:33:28.414 +00:00] [INFO] [datacoord/handler.go:261] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [posTs=441945553313464322] [posTime=2023/06/04 14:21:33.780 +00:00]
[2023/06/04 14:33:28.414 +00:00] [INFO] [datacoord/channel_checker.go:110] ["timer started"] ["watch state"=ToRelease] [nodeID=2921] ["channel name"=by-dev-rootcoord-dml_13_439652203113153329v1] ["check interval"=1s]
[2023/06/04 14:33:28.422 +00:00] [INFO] [datacoord/channel_manager.go:720] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [progress=0]
[2023/06/04 14:33:28.739 +00:00] [INFO] [datacoord/channel_checker.go:152] ["remove timer for channel"] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [timerCount=3]
[2023/06/04 14:33:28.739 +00:00] [WARN] [datacoord/channel_checker.go:121] ["timeout and stop timer: wait for channel ACK timeout"] ["watch state"=ToWatch] [nodeID=2921] ["channel name"=by-dev-rootcoord-dml_3_440633149273728439v1] ["timeout interval"=1s] [runningTimerCount=3]
[2023/06/04 14:33:28.739 +00:00] [INFO] [datacoord/channel_manager.go:677] ["receive timeout acks from state watcher"] [state=3] [nodeID=2921] ["channel name"=by-dev-rootcoord-dml_3_440633149273728439v1]
[2023/06/04 14:33:28.739 +00:00] [WARN] [datacoord/channel_manager.go:627] ["datanode watch channel failed or timeout, will release"] [nodeID=2921] [channel=by-dev-rootcoord-dml_3_440633149273728439v1]
[2023/06/04 14:33:28.739 +00:00] [INFO] [datacoord/handler.go:61] [GetDataVChanPositions] [collectionID=440633149273728439] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [numOfSegments=21]
[2023/06/04 14:33:28.739 +00:00] [INFO] [datacoord/handler.go:261] ["channel seek position set from channel checkpoint meta"] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [posTs=441945553313464322] [posTime=2023/06/04 14:21:33.780 +00:00]
[2023/06/04 14:33:28.739 +00:00] [INFO] [datacoord/channel_checker.go:110] ["timer started"] ["watch state"=ToRelease] [nodeID=2921] ["channel name"=by-dev-rootcoord-dml_3_440633149273728439v1] ["check interval"=1s]
[2023/06/04 14:33:28.779 +00:00] [INFO] [datacoord/channel_manager.go:720] ["tickle update, timer delay"] [channel=by-dev-rootcoord-dml_3_440633149273728439v1] [progress=0]
changed it back, then everything seems working as expect
Could you elaborate more about this parameter, I am confusing about the timeout logic.
changed it back, then everything seems working as expect
Could you elaborate more about this parameter, I am confusing about the timeout logic.
Good to know. During milvus crash, it take some time to read from kafka and load data from S3. if timeout reaches coordinator will reassign to a new node and restart. So the timeout better to be larger.
But the default 10s should be good for most cases.
https://github.com/milvus-io/milvus/pull/24656 with go payload wirter, datanode oom issue may be resolved.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
Is there an existing issue for this?
Environment
Current Behavior
the datanode is keeping OOM (> 100G).
every query hangs to timeout after this issue happened.
the problem starts after I do some data inserting. I have tried restart datanode, reload collection. and I double checked every other component is healthy and didnt find suspicious log.
Expected Behavior
world peace
is there a way to clear the message storage? I can afford the newly data loss.
Steps To Reproduce
Milvus Log
the end of oom killed datanode log:
.... [2023/06/02 19:03:46.978 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203155864783] ["stats bin logs"=1] [2023/06/02 19:03:46.985 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203124604278] [time=268.333275ms] [size=1256086] [2023/06/02 19:03:46.985 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=439652203217157725] [numRows=113576] [2023/06/02 19:03:46.985 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203169329753] [collectionID=439652203113153329] [partitionID=439652203144399841] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:46.985 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203169329753] ["stats bin logs"=1]
[2023/06/02 19:03:46.993 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203124403758] [time=245.403582ms] [size=1250617]
[2023/06/02 19:03:46.993 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=440633149184427333] [numRows=113853]
[2023/06/02 19:03:46.993 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203194246102] [collectionID=439652203113153329] [partitionID=439652203174754870] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:46.993 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203194246102] ["stats bin logs"=1]
[2023/06/02 19:03:47.022 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203168927997] [time=163.419482ms] [size=1251146]
[2023/06/02 19:03:47.022 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=440633149176606994] [numRows=113316]
[2023/06/02 19:03:47.022 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203217157725] [collectionID=439652203113153329] [partitionID=439652203174754914] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.022 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203217157725] ["stats bin logs"=1]
[2023/06/02 19:03:47.043 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203224592172] [time=216.219345ms] [size=1246845]
[2023/06/02 19:03:47.043 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203224592392] [numRows=113313]
[2023/06/02 19:03:47.043 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149184427333] [collectionID=439652203113153329] [partitionID=439652203221176155] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.043 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149184427333] ["stats bin logs"=1]
[2023/06/02 19:03:47.089 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203187815593] [time=173.266836ms] [size=1248554]
[2023/06/02 19:03:47.089 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=441893573302587254] [numRows=137446]
[2023/06/02 19:03:47.089 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149176606994] [collectionID=439652203113153329] [partitionID=439652203221176111] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.089 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149176606994] ["stats bin logs"=1]
[2023/06/02 19:03:47.098 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203147817657] [time=209.936665ms] [size=1251852]
[2023/06/02 19:03:47.098 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203174353192] [numRows=113277]
[2023/06/02 19:03:47.098 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203224592392] [collectionID=439652203113153329] [partitionID=439652203221176220] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.099 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203224592392] ["stats bin logs"=1]
[2023/06/02 19:03:47.117 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149197862091] [time=164.506157ms] [size=1253219]
[2023/06/02 19:03:47.117 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=439652203198065412] [numRows=113751]
[2023/06/02 19:03:47.117 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=441893573302587254] [collectionID=439652203113153329] [partitionID=439652203135753710] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.117 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=441893573302587254] ["stats bin logs"=1]
[2023/06/02 19:03:47.139 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203130030523] [time=206.226141ms] [size=1248345]
[2023/06/02 19:03:47.139 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=441893573302582326] [numRows=97733]
[2023/06/02 19:03:47.139 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203174353192] [collectionID=439652203113153329] [partitionID=439652203144399812] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.139 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203174353192] ["stats bin logs"=1]
[2023/06/02 19:03:47.164 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203139374573] [time=245.167139ms] [size=1255050]
[2023/06/02 19:03:47.164 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=440633149194452631] [numRows=113748]
[2023/06/02 19:03:47.164 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203198065412] [collectionID=439652203113153329] [partitionID=439652203174754894] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.164 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203198065412] ["stats bin logs"=1]
[2023/06/02 19:03:47.172 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203169329753] [time=186.825724ms] [size=1249911]
[2023/06/02 19:03:47.172 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203140982928] [numRows=113213]
[2023/06/02 19:03:47.172 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=441893573302582326] [collectionID=439652203113153329] [partitionID=439652203128020577] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.172 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=441893573302582326] ["stats bin logs"=1]
[2023/06/02 19:03:47.174 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203155864783] [time=196.229443ms] [size=1246713]
[2023/06/02 19:03:47.174 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=440633149205281463] [numRows=113334]
[2023/06/02 19:03:47.174 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149194452631] [collectionID=439652203113153329] [partitionID=439652203221176101] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.174 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149194452631] ["stats bin logs"=1]
[2023/06/02 19:03:47.186 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149184427333] [time=143.103271ms] [size=1255546]
[2023/06/02 19:03:47.186 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203192437809] [numRows=113255]
[2023/06/02 19:03:47.186 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203140982928] [collectionID=439652203113153329] [partitionID=439652203135753710] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.186 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203140982928] ["stats bin logs"=1]
[2023/06/02 19:03:47.192 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203194246102] [time=198.462651ms] [size=1247860]
[2023/06/02 19:03:47.192 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=439652203164306134] [numRows=113351]
[2023/06/02 19:03:47.192 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149205281463] [collectionID=439652203113153329] [partitionID=440633149199466349] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.192 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149205281463] ["stats bin logs"=1]
[2023/06/02 19:03:47.256 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203217157725] [time=233.759768ms] [size=1252491]
[2023/06/02 19:03:47.256 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203199270735] [numRows=113826]
[2023/06/02 19:03:47.256 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203192437809] [collectionID=439652203113153329] [partitionID=439652203174754870] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.256 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203192437809] ["stats bin logs"=1]
[2023/06/02 19:03:47.274 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149176606994] [time=185.152807ms] [size=1249624]
[2023/06/02 19:03:47.274 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=439652203113154286] [numRows=60]
[2023/06/02 19:03:47.274 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203164306134] [collectionID=439652203113153329] [partitionID=439652203144399774] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.274 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203164306134] ["stats bin logs"=1]
[2023/06/02 19:03:47.311 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149194452631] [time=136.927388ms] [size=1254388]
[2023/06/02 19:03:47.311 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=441890336846682663] [numRows=151186]
[2023/06/02 19:03:47.311 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203199270735] [collectionID=439652203113153329] [partitionID=439652203174754870] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.311 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203199270735] ["stats bin logs"=1]
[2023/06/02 19:03:47.322 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=441893573302582326] [time=149.664934ms] [size=1077778]
[2023/06/02 19:03:47.322 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=440633149189640635] [numRows=113358]
[2023/06/02 19:03:47.322 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203113154286] [collectionID=439652203113153329] [partitionID=439652203113154277] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.322 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203113154286] ["stats bin logs"=1]
[2023/06/02 19:03:47.324 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203224592392] [time=225.852233ms] [size=1249591]
[2023/06/02 19:03:47.324 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203214746646] [numRows=113431]
[2023/06/02 19:03:47.324 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=441890336846682663] [collectionID=439652203113153329] [partitionID=440633149199466391] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.324 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=441890336846682663] ["stats bin logs"=1]
[2023/06/02 19:03:47.333 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=441893573302587254] [time=215.73827ms] [size=1515724]
[2023/06/02 19:03:47.333 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [segmentID=439652203215349565] [numRows=113501]
[2023/06/02 19:03:47.333 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149189640635] [collectionID=439652203113153329] [partitionID=439652203221176175] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.333 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149189640635] ["stats bin logs"=1]
[2023/06/02 19:03:47.340 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203113154286] [time=18.239296ms] [size=662]
[2023/06/02 19:03:47.340 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=440633149170388339] [numRows=96193]
[2023/06/02 19:03:47.340 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203214746646] [collectionID=439652203113153329] [partitionID=439652203174754851] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.341 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203214746646] ["stats bin logs"=1]
[2023/06/02 19:03:47.349 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203174353192] [time=210.054169ms] [size=1249194]
[2023/06/02 19:03:47.349 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203215349565] [collectionID=439652203113153329] [partitionID=439652203174754807] [channel=by-dev-rootcoord-dml_13_439652203113153329v1] [startPosition=] [endPosition=] [recoverTs=441904113015586817] [importing=false]
[2023/06/02 19:03:47.349 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203215349565] ["stats bin logs"=1]
[2023/06/02 19:03:47.373 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203140982928] [time=186.429922ms] [size=1248488]
[2023/06/02 19:03:47.373 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203213138891] [numRows=113654]
[2023/06/02 19:03:47.373 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149170388339] [collectionID=439652203113153329] [partitionID=439652203221176135] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.373 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149170388339] ["stats bin logs"=1]
[2023/06/02 19:03:47.386 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149205281463] [time=193.955631ms] [size=1249823]
[2023/06/02 19:03:47.386 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203146209619] [numRows=113176]
[2023/06/02 19:03:47.386 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203213138891] [collectionID=439652203113153329] [partitionID=439652203174754793] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.386 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203213138891] ["stats bin logs"=1]
[2023/06/02 19:03:47.422 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203164306134] [time=147.750548ms] [size=1250010]
[2023/06/02 19:03:47.422 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203191030572] [numRows=113177]
[2023/06/02 19:03:47.422 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203146209619] [collectionID=439652203113153329] [partitionID=439652203144399841] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.422 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203146209619] ["stats bin logs"=1]
[2023/06/02 19:03:47.457 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203198065412] [time=292.417849ms] [size=1254421]
[2023/06/02 19:03:47.457 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=441893573302565962] [numRows=98215]
[2023/06/02 19:03:47.457 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203191030572] [collectionID=439652203113153329] [partitionID=439652203174754914] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.457 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203191030572] ["stats bin logs"=1]
[2023/06/02 19:03:47.459 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203192437809] [time=202.725354ms] [size=1248951]
[2023/06/02 19:03:47.459 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=439652203197462357] [numRows=112846]
[2023/06/02 19:03:47.459 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=441893573302565962] [collectionID=439652203113153329] [partitionID=439652203128020563] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.459 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=441893573302565962] ["stats bin logs"=1]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149170388339] [time=109.417318ms] [size=1060795]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/data_sync_service.go:249] ["recover sealed segments form checkpoints"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [segmentID=440633149179414343] [numRows=113524]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=439652203197462357] [collectionID=439652203113153329] [partitionID=439652203174754914] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=439652203197462357] ["stats bin logs"=1]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149189640635] [time=149.049023ms] [size=1250087]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/channel_meta.go:199] ["adding segment"] [type=Flushed] [segmentID=440633149179414343] [collectionID=439652203113153329] [partitionID=439652203221176200] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [startPosition=] [endPosition=] [recoverTs=441904101481512961] [importing=false]
[2023/06/02 19:03:47.482 +00:00] [INFO] [datanode/channel_meta.go:299] ["begin to init pk bloom filter"] [segmentID=440633149179414343] ["stats bin logs"=1]
[2023/06/02 19:03:47.509 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203214746646] [time=168.161764ms] [size=1250892]
[2023/06/02 19:03:47.515 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203199270735] [time=203.922071ms] [size=1255248]
[2023/06/02 19:03:47.522 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203215349565] [time=172.385263ms] [size=1251664]
[2023/06/02 19:03:47.531 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203213138891] [time=145.321288ms] [size=1253351]
[2023/06/02 19:03:47.586 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=441890336846682663] [time=261.747424ms] [size=1667246]
time="2023-06-02T19:03:47Z" level=info msg="[Connected consumer]" consumerID=5 name=hlgto subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13"
time="2023-06-02T19:03:47Z" level=info msg="[Created consumer]" consumerID=5 name=hlgto subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13"
[2023/06/02 19:03:47.609 +00:00] [INFO] [datanode/flow_graph_dmstream_input_node.go:56] ["datanode begin to seek"] ["seek msgID"="\u0008\ufffd\ufffd\u0006\u0010\ufffdH\u0018\u0000 \u0000"] [pchannel=by-dev-rootcoord-dml_13] [vchannel=by-dev-rootcoord-dml_13_439652203113153329v1] [position=2023/06/02 18:26:51.589 +00:00] [tsLag=36m56.02024883s] ["collection ID"=439652203113153329]
[2023/06/02 19:03:47.609 +00:00] [INFO] [msgstream/mq_msgstream.go:880] ["MsgStream begin to seek start msg: "] [channel=by-dev-rootcoord-dml_13] [MessageID="CJjmBhDaSBgAIAA="]
time="2023-06-02T19:03:47Z" level=info msg="Broker notification of Closed consumer: 5" local_addr="172.20.35.67:34516" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650"
time="2023-06-02T19:03:47Z" level=info msg="[Reconnecting to broker in 107.189728ms]" consumerID=5 name=hlgto subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13"
[2023/06/02 19:03:47.615 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203146209619] [time=192.886442ms] [size=1248080]
[2023/06/02 19:03:47.631 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=441893573302565962] [time=172.415622ms] [size=1083093]
[2023/06/02 19:03:47.660 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203197462357] [time=178.206527ms] [size=1244441]
[2023/06/02 19:03:47.697 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=440633149179414343] [time=214.33075ms] [size=1251918]
time="2023-06-02T19:03:47Z" level=info msg="[Connected consumer]" consumerID=5 name=hlgto subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13"
time="2023-06-02T19:03:47Z" level=info msg="[Reconnected consumer to broker]" consumerID=5 name=hlgto subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_13_439652203113153329v1 topic="persistent://public/default/by-dev-rootcoord-dml_13"
[2023/06/02 19:03:47.719 +00:00] [INFO] [msgstream/mq_msgstream.go:890] ["MsgStream seek finished"] [channel=by-dev-rootcoord-dml_13]
[2023/06/02 19:03:47.725 +00:00] [INFO] [datanode/flow_graph_dmstream_input_node.go:67] ["datanode seek successfully"] ["seek msgID"="\u0008\ufffd\ufffd\u0006\u0010\ufffdH\u0018\u0000 \u0000"] [pchannel=by-dev-rootcoord-dml_13] [vchannel=by-dev-rootcoord-dml_13_439652203113153329v1] [position=2023/06/02 18:26:51.589 +00:00] [tsLag=36m56.136456496s] ["collection ID"=439652203113153329] [elapse=116.209618ms]
[2023/06/02 19:03:47.725 +00:00] [INFO] [datanode/flow_graph_dmstream_input_node.go:79] ["datanode AsConsumer"] ["physical channel"=by-dev-rootcoord-dml_13] [subName=by-dev-dataNode-2557-by-dev-rootcoord-dml_13_439652203113153329v1] ["collection ID"=439652203113153329]
[2023/06/02 19:03:47.725 +00:00] [INFO] [datanode/flow_graph_dd_node.go:376] ["ddNode convert vChannel to pChannel"] [vChannelName=by-dev-rootcoord-dml_13_439652203113153329v1] [pChannelName=by-dev-rootcoord-dml_13]
time="2023-06-02T19:03:47Z" level=info msg="[Created producer]" cnx="172.20.35.67:34516 -> 192.168.220.30:6650" producerID=5 producer_name=milvus-mining-pulsar-19-13666 topic="persistent://public/default/by-dev-rootcoord-delta_13"
[2023/06/02 19:03:47.729 +00:00] [INFO] [datanode/flow_graph_dd_node.go:388] ["datanode AsProducer"] [DeltaChannelName=by-dev-rootcoord-delta_13]
[2023/06/02 19:03:47.729 +00:00] [INFO] [datanode/flow_graph_dd_node.go:413] ["ddNode add sealed and growing segments"] [collectionID=439652203113153329] ["No. sealed segments"=663] ["No. growing segments"=0]
time="2023-06-02T19:03:47Z" level=info msg="[Created producer]" cnx="172.20.35.67:34516 -> 192.168.220.30:6650" producerID=6 producer_name=milvus-mining-pulsar-19-13667 topic="persistent://public/default/by-dev-datacoord-timetick-channel"
[2023/06/02 19:03:47.731 +00:00] [INFO] [datanode/flow_graph_insert_buffer_node.go:683] ["datanode AsProducer"] [TimeTickChannelName=by-dev-datacoord-timetick-channel]
[2023/06/02 19:03:47.731 +00:00] [INFO] [datanode/data_sync_service.go:141] ["dataSyncService starting flow graph"] [collectionID=439652203113153329] [vChanName=by-dev-rootcoord-dml_13_439652203113153329v1]
[2023/06/02 19:03:47.731 +00:00] [INFO] [datanode/data_node.go:410] ["handle put event: new data sync service success"] [vChanName=by-dev-rootcoord-dml_13_439652203113153329v1]
[2023/06/02 19:03:47.739 +00:00] [INFO] [datanode/data_node.go:442] ["handle put event success"] [key=channelwatch/2557/by-dev-rootcoord-dml_13_439652203113153329v1] [state=WatchSuccess] [vChanName=by-dev-rootcoord-dml_13_439652203113153329v1]
[2023/06/02 19:03:47.739 +00:00] [INFO] [datanode/data_node.go:345] ["DataNode received a PUT event with an end State"] [state=WatchSuccess]
[2023/06/02 19:03:47.812 +00:00] [INFO] [datanode/channel_meta.go:358] ["Successfully load pk stats"] [segmentID=439652203191030572] [time=355.601121ms] [size=1248091]
time="2023-06-02T19:03:47Z" level=info msg="[Connected consumer]" consumerID=6 name=felne subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12"
time="2023-06-02T19:03:47Z" level=info msg="[Created consumer]" consumerID=6 name=felne subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12"
[2023/06/02 19:03:47.845 +00:00] [INFO] [datanode/flow_graph_dmstream_input_node.go:56] ["datanode begin to seek"] ["seek msgID"="\u0008\ufffd\ufffd\u0006\u0010\ufffdG\u0018\u0000 \u0000"] [pchannel=by-dev-rootcoord-dml_12] [vchannel=by-dev-rootcoord-dml_12_439652203113153329v0] [position=2023/06/02 18:26:07.590 +00:00] [tsLag=37m40.255337754s] ["collection ID"=439652203113153329]
[2023/06/02 19:03:47.845 +00:00] [INFO] [msgstream/mq_msgstream.go:880] ["MsgStream begin to seek start msg: "] [channel=by-dev-rootcoord-dml_12] [MessageID="CI7mBhCERxgAIAA="]
time="2023-06-02T19:03:47Z" level=info msg="Broker notification of Closed consumer: 6" local_addr="172.20.35.67:34516" remote_addr="pulsar://milvus-mining-pulsar-proxy:6650"
time="2023-06-02T19:03:47Z" level=info msg="[Reconnecting to broker in 112.017806ms]" consumerID=6 name=felne subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12"
time="2023-06-02T19:03:47Z" level=info msg="[Connected consumer]" consumerID=6 name=felne subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12"
time="2023-06-02T19:03:47Z" level=info msg="[Reconnected consumer to broker]" consumerID=6 name=felne subscription=by-dev-dataNode-2557-by-dev-rootcoord-dml_12_439652203113153329v0 topic="persistent://public/default/by-dev-rootcoord-dml_12"
[2023/06/02 19:03:47.976 +00:00] [INFO] [msgstream/mq_msgstream.go:890] ["MsgStream seek finished"] [channel=by-dev-rootcoord-dml_12]
[2023/06/02 19:03:48.018 +00:00] [INFO] [datanode/flow_graph_dmstream_input_node.go:67] ["datanode seek successfully"] ["seek msgID"="\u0008\ufffd\ufffd\u0006\u0010\ufffdG\u0018\u0000 \u0000"] [pchannel=by-dev-rootcoord-dml_12] [vchannel=by-dev-rootcoord-dml_12_439652203113153329v0] [position=2023/06/02 18:26:07.590 +00:00] [tsLag=37m40.428154485s] ["collection ID"=439652203113153329] [elapse=172.818703ms]
[2023/06/02 19:03:48.018 +00:00] [INFO] [datanode/flow_graph_dmstream_input_node.go:79] ["datanode AsConsumer"] ["physical channel"=by-dev-rootcoord-dml_12] [subName=by-dev-dataNode-2557-by-dev-rootcoord-dml_12_439652203113153329v0] ["collection ID"=439652203113153329]
[2023/06/02 19:03:48.018 +00:00] [INFO] [datanode/flow_graph_dd_node.go:376] ["ddNode convert vChannel to pChannel"] [vChannelName=by-dev-rootcoord-dml_12_439652203113153329v0] [pChannelName=by-dev-rootcoord-dml_12]
time="2023-06-02T19:03:48Z" level=info msg="[Created producer]" cnx="172.20.35.67:34516 -> 192.168.220.30:6650" producerID=7 producer_name=milvus-mining-pulsar-19-13668 topic="persistent://public/default/by-dev-rootcoord-delta_12"
[2023/06/02 19:03:48.075 +00:00] [INFO] [datanode/flow_graph_dd_node.go:388] ["datanode AsProducer"] [DeltaChannelName=by-dev-rootcoord-delta_12]
[2023/06/02 19:03:48.075 +00:00] [INFO] [datanode/flow_graph_dd_node.go:413] ["ddNode add sealed and growing segments"] [collectionID=439652203113153329] ["No. sealed segments"=660] ["No. growing segments"=0]
time="2023-06-02T19:03:48Z" level=info msg="[Created producer]" cnx="172.20.35.67:34516 -> 192.168.220.30:6650" producerID=8 producer_name=milvus-mining-pulsar-19-13669 topic="persistent://public/default/by-dev-datacoord-timetick-channel"
[2023/06/02 19:03:48.130 +00:00] [INFO] [datanode/flow_graph_insert_buffer_node.go:683] ["datanode AsProducer"] [TimeTickChannelName=by-dev-datacoord-timetick-channel]
[2023/06/02 19:03:48.130 +00:00] [INFO] [datanode/data_sync_service.go:141] ["dataSyncService starting flow graph"] [collectionID=439652203113153329] [vChanName=by-dev-rootcoord-dml_12_439652203113153329v0]
[2023/06/02 19:03:48.130 +00:00] [INFO] [datanode/data_node.go:410] ["handle put event: new data sync service success"] [vChanName=by-dev-rootcoord-dml_12_439652203113153329v0]
[2023/06/02 19:03:48.137 +00:00] [INFO] [datanode/data_node.go:442] ["handle put event success"] [key=channelwatch/2557/by-dev-rootcoord-dml_12_439652203113153329v0] [state=WatchSuccess] [vChanName=by-dev-rootcoord-dml_12_439652203113153329v0]
[2023/06/02 19:03:48.137 +00:00] [INFO] [datanode/data_node.go:345] ["DataNode received a PUT event with an end State"] [state=WatchSuccess]
[2023/06/02 19:03:48.221 +00:00] [INFO] [datanode/flow_graph_time_tick_node.go:115] ["UpdateChannelCheckpoint success"] [channel=by-dev-rootcoord-dml_12_439652203113153329v0] [cpTs=441904101533941761] [cpTime=2023/06/02 18:26:07.790 +00:00]
Anything else?
No response