Closed icodersky closed 1 year ago
milvus-log.zip milvus-log.z01.zip milvus-log.z02.zip milvus-log.z03.zip
These are zip separate volumes. "milvus-log.z01.zip, milvus-log.z02.zip, milvus-log.z03.zip" ,this files's suffix name '.zip' needs to be removed. Like this: milvus-log.z01, milvus-log.z02, milvus-log.z03
By the way, before load collection, I had Changed the milvus from the previous configuration:
helm upgrade --set indexNode.replicas=20 --set dataNode.replicas=20 --set proxy.replicas=9 --set queryNode.replicas=30 --set attu.enabled=true hx-release milvus/milvus
to the current configuration:
helm upgrade --set indexNode.replicas=5 --set dataNode.replicas=5 --set proxy.replicas=9 --set queryNode.replicas=20 --set attu.enabled=true hx-release milvus/milvus
/assign @jiaoew1991
129016 [2022/09/28 06:40:03.109 +00:00] [WARN] [querycoord/cluster.go:253] ["loadSegments: queryNode load segments error"] [nodeID=1262] ["error info"="[UnexpectedError] Assert \"row_countopt.value() == size\" at / go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:177\n => field (0) data has different row count (19299) than other column's row count (17003)"]
129016 [2022/09/28 06:40:03.109 +00:00] [WARN] [querycoord/cluster.go:253] ["loadSegments: queryNode load segments error"] [nodeID=1262] ["error info"="[UnexpectedError] Assert "row_countopt.value() == size" at / go/src/github.com/milvus-io/milvus/internal/core/src/segcore/SegmentSealedImpl.cpp:177\n => field (0) data has different row count (19299) than other column's row count (17003)"]
What does this mean? Is there a problem with the inserted data? or problem with the index data?
How to solve this problem?
Some querynode are now automatically restarted.
What does this mean? Is there a problem with the inserted data? or problem with the index data?
How to solve this problem?
Seems like a problem with the index data. The numRow of segment 436168328301772801 is 20152, and field 0 has numRow 20152, but the str indexes only contain 20139 numRows.
crash stack
18005 fatal error: unexpected signal during runtime execution
18006 [signal SIGSEGV: segmentation violation code=0x1 addr=0x2f0b pc=0x2f0b]
18007
18008 runtime stack:
18009 runtime.throw(0x4065a52, 0x2a)
18010 /usr/local/go/src/runtime/panic.go:1117 +0x72
18011 runtime.sigpanic()
18012 /usr/local/go/src/runtime/signal_unix.go:718 +0x2e5
18013
18014 goroutine 77787 [syscall]:
18015 runtime.cgocall(0x23483a2, 0xc00076f698, 0x3c19600)
18016 /usr/local/go/src/runtime/cgocall.go:154 +0x5b fp=0xc00076f640 sp=0xc00076f608 pc=0x85265b
18017 github.com/milvus-io/milvus/internal/querynode._Cfunc_LoadFieldData(0x7f15935c5260, 0x64, 0xc004e4c000, 0xb8d5e, 0x4dd3, 0x0, 0x0)
18018 _cgo_gotypes.go:550 +0x57 fp=0xc00076f698 sp=0xc00076f640 pc=0x227acf7
18019 github.com/milvus-io/milvus/internal/querynode.(*Segment).segmentLoadFieldData.func1.1(0xc0026e09c0, 0x64, 0xc004e4c000, 0xb8d5e, 0x4dd3, 0xc003ca92b8, 0xc001359200)
18020 /go/src/github.com/milvus-io/milvus/internal/querynode/segment.go:771 +0x85 fp=0xc00076f6e8 sp=0xc00076f698 pc=0x229cbe5
18021 github.com/milvus-io/milvus/internal/querynode.(*Segment).segmentLoadFieldData.func1(0xc0007b8e50, 0xca5681c8360fc, 0xc00076f778, 0x85540b)
18022 /go/src/github.com/milvus-io/milvus/internal/querynode/segment.go:771 +0x5b fp=0xc00076f738 sp=0xc00076f6e8 pc=0x229cc7b
18023 github.com/milvus-io/milvus/internal/util/concurrency.(*Pool).Submit.func1()
18024 /go/src/github.com/milvus-io/milvus/internal/util/concurrency/pool.go:46 +0x59 fp=0xc00076f788 sp=0xc00076f738 pc=0x1e209d9
18025 github.com/panjf2000/ants/v2.(*goWorker).run.func1(0xc000433e30)
18026 /go/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/worker.go:70 +0x94 fp=0xc00076f7d8 sp=0xc00076f788 pc=0x1e20014
18027 runtime.goexit()
18028 /usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc00076f7e0 sp=0xc00076f7d8 pc=0x8c2101
18029 created by github.com/panjf2000/ants/v2.(*goWorker).run
18030 /go/pkg/mod/github.com/panjf2000/ants/v2@v2.4.8/worker.go:48 +0x4c
18031
18032 goroutine 1 [chan receive, 2678 minutes]:
18033 github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc000148080, 0xc000012000, 0x0, 0x0)
18034 /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:523 +0x47c
18035 github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0xc0001442d0, 0xc000050090, 0x3, 0x3, 0xc0001421e0)
18036 /go/src/github.com/milvus-io/milvus/cmd/milvus/run.go:111 +0x496
18037 github.com/milvus-io/milvus/cmd/milvus.RunMilvus(0xc000050090, 0x3, 0x3)
18038 /go/src/github.com/milvus-io/milvus/cmd/milvus/milvus.go:60 +0x162
18039 main.main()
18040 /go/src/github.com/milvus-io/milvus/cmd/main.go:26 +0x45
Binlog meta is correct, looking into string index now
{\"segmentID\":436168328301772801,\"partitionID\":436166538015539201,\"collectionID\":436166537753657345,\"binlog_paths\":[
{\"fieldID\":105,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/105/436168330176364550\",\"log_size\":41541639},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/105/436168336625106950\",\"log_size\":41443335},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/105/436168341487616006\",\"log_size\":40443911},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/105/436168347005747206\",\"log_size\":20774919},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/105/436168347477606406\",\"log_size\":20774919},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/105/436168505394200582\",\"log_size\":106503}]},
{"fieldID: 0", \"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/0/436168330176364551\",\"log_size\":40576},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/0/436168336625106951\",\"log_size\":40480},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/0/436168341487616007\",\"log_size\":39504},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/0/436168347005747207\",\"log_size\":20296},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/0/436168347477606407\",\"log_size\":20296},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/0/436168505394200583\",\"log_size\":112}]},
{\"fieldID\":1,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/1/436168330176364552\",\"log_size\":40576},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/1/436168336625106952\",\"log_size\":40480},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/1/436168341487616008\",\"log_size\":39504},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/1/436168347005747208\",\"log_size\":20296},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/1/436168347477606408\",\"log_size\":20296},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/1/436168505394200584\",\"log_size\":112}]},
{\"fieldID\":100,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/100/436168330176364545\",\"log_size\":7},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/100/436168336625106945\",\"log_size\":7},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/100/436168341487616001\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/100/436168347005747201\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/100/436168347477606401\",\"log_size\":7},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/100/436168505394200577\",\"log_size\":7}]},
{\"fieldID\":101,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/101/436168330176364546\",\"log_size\":7},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/101/436168336625106946\",\"log_size\":7},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/101/436168341487616002\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/101/436168347005747202\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/101/436168347477606402\",\"log_size\":7},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/101/436168505394200578\",\"log_size\":7}]},
{\"fieldID\":102,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/102/436168330176364547\",\"log_size\":7},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/102/436168336625106947\",\"log_size\":7},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/102/436168341487616003\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/102/436168347005747203\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/102/436168347477606403\",\"log_size\":7},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/102/436168505394200579\",\"log_size\":7}]},
{\"fieldID\":103,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/103/436168330176364548\",\"log_size\":7},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/103/436168336625106948\",\"log_size\":7},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/103/436168341487616004\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/103/436168347005747204\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/103/436168347477606404\",\"log_size\":7},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/103/436168505394200580\",\"log_size\":7}]},
{\"fieldID\":104,\"binlogs\":[
{\"entries_num\":5071,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168328249606148,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/104/436168330176364549\",\"log_size\":7},
{\"entries_num\":5059,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168333885440005,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/104/436168336625106949\",\"log_size\":7},
{\"entries_num\":4937,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168339691929605,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/104/436168341487616005\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/104/436168347005747205\",\"log_size\":7},
{\"entries_num\":2536,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/104/436168347477606405\",\"log_size\":7},
{\"entries_num\":13,\"timestamp_from\":18446744073709551615,\"timestamp_to\":436168345432883201,\"log_path\":\"file/insert_log/436166537753657345/436166538015539201/436168328301772801/104/436168505394200581\",\"log_size\":7}]}]
\"num_of_rows\":20152,
\"statslogs\":[
{\"fieldID\":100,\"binlogs\":[
{\"log_path\":\"file/stats_log/436166537753657345/436166538015539201/436168328301772801/100/436168505394200585\",\"log_size\":117960}]}],
\"deltalogs\":[{}],
\"index_infos\":[
{\"fieldID\":101,\"enable_index\":true,\"index_name\":\"patid\",\"indexID\":436167134655545345,\"buildID\":436168347110080518,
\"index_params\":[{\"key\":\"index_type\",\"value\":\"Trie\"}],
\"index_file_paths\":[
\"file/index_files/436168347110080518/1/436166538015539201/436168328301772801/indexParams\",
\"file/index_files/436168347110080518/1/436166538015539201/436168328301772801/marisa_trie_index\",
\"file/index_files/436168347110080518/1/436166538015539201/436168328301772801/marisa_trie_str_ids\"],\"index_size\":213992},
{\"fieldID\":102,\"enable_index\":true,\"index_name\":\"group\",\"indexID\":436167136032587777,\"buildID\":436168347110080520,
\"index_params\":[{\"key\":\"index_type\",\"value\":\"Trie\"}],
\"index_file_paths\":[
\"file/index_files/436168347110080520/1/436166538015539201/436168328301772801/indexParams\",
\"file/index_files/436168347110080520/1/436166538015539201/436168328301772801/marisa_trie_index\",
\"file/index_files/436168347110080520/1/436166538015539201/436168328301772801/marisa_trie_str_ids\"],\"index_size\":165248},
{\"fieldID\":103,\"enable_index\":true,\"index_name\":\"agentno\",\"indexID\":436167137670201345,\"buildID\":436168347110080522,
\"index_params\":[{\"key\":\"index_type\",\"value\":\"Trie\"}],
\"index_file_paths\":[
\"file/index_files/436168347110080522/1/436166538015539201/436168328301772801/indexParams\",
\"file/index_files/436168347110080522/1/436166538015539201/436168328301772801/marisa_trie_index\",
\"file/index_files/436168347110080522/1/436166538015539201/436168328301772801/marisa_trie_str_ids\"],\"index_size\":164544},
{\"fieldID\":104,\"enable_index\":true,\"index_name\":\"isopen\",\"indexID\":436167139203743745,\"buildID\":436168347123187714,
\"index_params\":[{\"key\":\"index_type\",\"value\":\"Trie\"}],
\"index_file_paths\":[
\"file/index_files/436168347123187714/1/436166538015539201/436168328301772801/indexParams\",
\"file/index_files/436168347123187714/1/436166538015539201/436168328301772801/marisa_trie_index\",
\"file/index_files/436168347123187714/1/436166538015539201/436168328301772801/marisa_trie_str_ids\"],\"index_size\":164544},
{\"fieldID\":105,\"enable_index\":true,\"index_name\":\"img\",\"indexID\":436167142075006977,\"buildID\":436168347123187716,
\"index_params\":[{\"key\":\"index_type\",\"value\":\"IVF_SQ8\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"params\",\"value\":\"{\\\"nlist\\\":\\\"2048\\\"}\"}],
\"index_file_paths\":[
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/indexParams\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_0\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_1\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_10\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_2\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_3\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_4\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_5\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_6\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_7\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_8\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/IVF_9\",
\"file/index_files/436168347123187716/1/436166538015539201/436168328301772801/SLICE_META\"],\"index_size\":45653614}],
\"segment_size\":46802472,\
"insert_channel\":\"by-dev-rootcoord-dml_13_436166537753657345v1\"}
Binlog meta is correct, looking into string index now
Did you find the problem now? @XuanYang-cn
@icodersky Could you use birdwatcher to get a copy of etcd so that I can exam the meta? https://github.com/milvus-io/birdwatcher
Milvus(by-dev): backup --component all
found 37 keys, at revision 533816, starting backup...
Backing up ... 100%(37/37)
backup etcd for prefix by-dev/meta done, stored in file: bw_etcd_ALL.220707-152246.bak.gz
Can't build the birdwatcher.
go install github.com/milvus-io/birdwatcher
go install also the same timeout. @XuanYang-cn
@icodersky Could you use birdwatcher to get a copy of etcd so that I can exam the meta? https://github.com/milvus-io/birdwatcher
Milvus(by-dev): backup --component all found 37 keys, at revision 533816, starting backup... Backing up ... 100%(37/37) backup etcd for prefix by-dev/meta done, stored in file: bw_etcd_ALL.220707-152246.bak.gz
@icodersky Did you create index before the insert done or after?
@icodersky Did you create index before the insert done or after?
Create index before the insert start.
@icodersky Do you still have logs before this operation? BTW did datanode/datacoord restart during insert?
By the way, before load collection, I had Changed the milvus from the previous configuration:
helm upgrade --set indexNode.replicas=20 --set dataNode.replicas=20 --set proxy.replicas=9 --set queryNode.replicas=30 --set attu.enabled=true hx-release milvus/milvus
to the current configuration:
helm upgrade --set indexNode.replicas=5 --set dataNode.replicas=5 --set proxy.replicas=9 --set queryNode.replicas=20 --set attu.enabled=true hx-release milvus/milvus
Create Index Data Paths are less than real data paths
Create index data Paths
SegmentID: 436181007276703745 State: Flushed
Index name: patid, Index build ID: 436181033831104514, state: Finished, numRows: 19369,
data paths: [
file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181011064422402
file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181020030009346
file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181028353081346
file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181034618322946]
Actual Data Paths:
Field 101: [
entries_num:4966 timestamp_from:18446744073709551615 timestamp_to:436181007276703747
log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181011064422402" log_size:7
entries_num:4965 timestamp_from:18446744073709551615 timestamp_to:436181017093996545
log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181020030009346" log_size:7
entries_num:5000 timestamp_from:18446744073709551615 timestamp_to:436181024643481604
log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181028353081346" log_size:7
entries_num:1902 timestamp_from:18446744073709551615 timestamp_to:436181033110994946
log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181034618322946" log_size:7
entries_num:2536 timestamp_from:18446744073709551615 timestamp_to:436181033110994946
log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181035129503746" log_size:7
entries_num:554 timestamp_from:18446744073709551615 timestamp_to:436181033110994946
log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181192219295746" log_size:7 ]
@icodersky Do you still have logs before this operation? BTW did datanode/datacoord restart during insert?
By the way, before load collection, I had Changed the milvus from the previous configuration:
helm upgrade --set indexNode.replicas=20 --set dataNode.replicas=20 --set proxy.replicas=9 --set queryNode.replicas=30 --set attu.enabled=true hx-release milvus/milvus
to the current configuration:helm upgrade --set indexNode.replicas=5 --set dataNode.replicas=5 --set proxy.replicas=9 --set queryNode.replicas=20 --set attu.enabled=true hx-release milvus/milvus
We don't have that time's logs now. It's been reinstalled several times. The datanode/datacoord didn't restart during insert.
Create Index Data Paths are less than real data paths
Create index data Paths
SegmentID: 436181007276703745 State: Flushed Index name: patid, Index build ID: 436181033831104514, state: Finished, numRows: 19369, data paths: [ file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181011064422402 file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181020030009346 file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181028353081346 file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181034618322946]
Actual Data Paths:
Field 101: [ entries_num:4966 timestamp_from:18446744073709551615 timestamp_to:436181007276703747 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181011064422402" log_size:7 entries_num:4965 timestamp_from:18446744073709551615 timestamp_to:436181017093996545 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181020030009346" log_size:7 entries_num:5000 timestamp_from:18446744073709551615 timestamp_to:436181024643481604 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181028353081346" log_size:7 entries_num:1902 timestamp_from:18446744073709551615 timestamp_to:436181033110994946 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181034618322946" log_size:7 entries_num:2536 timestamp_from:18446744073709551615 timestamp_to:436181033110994946 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181035129503746" log_size:7 entries_num:554 timestamp_from:18446744073709551615 timestamp_to:436181033110994946 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181192219295746" log_size:7 ]
What causes it? Is it because the data we inserted is abnormal?
segmentID: 436181007276703745 create time 2022-09-23 10:01:35.776 +0800 CST IndexBuildID: 436181033831104514 create time 2022-09-23 10:03:17.073 +0800 CST
IndexRequest And Binlog both have logIDs: logID: 436181011064422402 create time 2022-09-23 10:01:50.225 +0800 CST ts_to 2022-09-23 10:01:35.776 +0800 CST logID: 436181020030009346 create time 2022-09-23 10:02:24.426 +0800 CST ts_to 2022-09-23 10:02:13.226 +0800 CST logID: 436181028353081346 create time 2022-09-23 10:02:56.176 +0800 CST ts_to 2022-09-23 10:02:42.025 +0800 CST logID: 436181034618322946 create time 2022-09-23 10:03:20.076 +0800 CST ts_to 2022-09-23 10:03:14.326 +0800 CST
Only Binlog have: logID: 436181035129503746 create time 2022-09-23 10:03:22.026 +0800 CST ts_to 2022-09-23 10:03:14.326 +0800 CST logID: 436181192219295746 create time 2022-09-23 10:13:21.276 +0800 CST ts_to 2022-09-23 10:03:14.326 +0800 CST
@XuanYang-cn DataCoord updated its binlog path 10 minutes after IndexCoord queried the binlog path.
segmentID: 436181007276703745 create time 2022-09-23 10:01:35.776 +0800 CST IndexBuildID: 436181033831104514 create time 2022-09-23 10:03:17.073 +0800 CST
IndexRequest And Binlog both have logIDs: logID: 436181011064422402 create time 2022-09-23 10:01:50.225 +0800 CST ts_to 2022-09-23 10:01:35.776 +0800 CST logID: 436181020030009346 create time 2022-09-23 10:02:24.426 +0800 CST ts_to 2022-09-23 10:02:13.226 +0800 CST logID: 436181028353081346 create time 2022-09-23 10:02:56.176 +0800 CST ts_to 2022-09-23 10:02:42.025 +0800 CST logID: 436181034618322946 create time 2022-09-23 10:03:20.076 +0800 CST ts_to 2022-09-23 10:03:14.326 +0800 CST
Only Binlog have: logID: 436181035129503746 create time 2022-09-23 10:03:22.026 +0800 CST ts_to 2022-09-23 10:03:14.326 +0800 CST logID: 436181192219295746 create time 2022-09-23 10:13:21.276 +0800 CST ts_to 2022-09-23 10:03:14.326 +0800 CST
@XuanYang-cn DataCoord updated its binlog path 10 minutes after IndexCoord queried the binlog path.
The interesting part is logID 436181035129503746 and 436181192219295746 have same ts_to as 436181034618322946
The scale-down triggered the recovery logic.
If the recovery happens right when a segment is about to flush but not yet, DataCoord will send this flushSegment
request to the New DataNode, which just started and didn't finish consuming the data.
So data for the create-index is less than the original data.
This is a known bug and has been fixed by #19985 at the master branch
/unassign /assign @yanliang567
To be more clarify:
entries_num:1902 timestamp_from:18446744073709551615 timestamp_to:436181033110994946 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181034618322946" log_size:7
DN is still consuming data from pulsar, consumes 2536 rows
entries_num:2536 timestamp_from:18446744073709551615 timestamp_to:436181033110994946 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181035129503746" log_size:7
Now the flush of 3 succeeds and DN changes this segment to Flushed
DN is still consuming data from pulsar, consumes 554 rows, but the rows are not updated into segment replica(channel meta)
entries_num:554 timestamp_from:18446744073709551615 timestamp_to:436181033110994946 log_path:"file/insert_log/436166537753657345/436166538238623745/436181007276703745/101/436181192219295746" log_size:7 ]
Meta in DataCoord counts 554 rows less because all the 5th numRows info from DN is wrong.
/unassign
@czs007 so was it fixed
Is there an existing issue for this?
Environment
Current Behavior
Operating in the attu, load collection, still loading after 5 hours. seem load collection get stuck.
Had 9 partitions.
dimension: 2048 index: IVFSQ8 collection Entity Count: 31,411,515
Expected Behavior
No response
Steps To Reproduce
Milvus Log
hx-release-milvus-proxy-5b95cfbcf5-q66xc.log hx-release-milvus-querycoord-78896f65f6-w4jvl.log hx-release-milvus-querynode-847968b458-4q9vs.log
Anything else?
No response