zilliztech / milvus-migration

[Bug]: new collection data rows mismatch with the source after the migration #107

Open SanhongWong opened 1 month ago

SanhongWong commented 1 month ago

Current Behavior

After I finished a migration using the milvus2x work mode, the row count of the new collection did not match the row count of the source collection. Is this a bug, or did I do something wrong?

Expected Behavior

No response

Steps To Reproduce

1. Use the milvus-migration binary compiled from the master branch.
2. Run it with this config:
dumper:
  worker:
    workMode: milvus2x      # work mode:milvus2x->milvus2x
    reader:
      bufferSize: 2000       # Read source data rows in each time read from Source Milvus.

meta:                       # meta part
  mode: config              # 'config' mode means will get meta config from this config file itself.
  version: 2.3            #  Source Milvus version
  #collection: audio # migrate data from this source collection
  #collection: video # migrate data from this source collection
  collection: image # migrate data from this source collection

source:                     # source milvus connection info
  milvus2x:
    endpoint: 10.218.xx.xxx:19530
    #database: default 
    #username: xxxx
    #password: xxxx

target:                    # target milvus collection info
  milvus2x:
    endpoint: 10.219.xx.xxx:19530
    #writeMode: upsert
    #username: xxxx
    #password: xxxxx
3. When the migration finishes, the console shows the logs below. In them, "LoadTotalSize" and "LoadFinishSize" are different. Here is the console output:
[2024/10/17 09:53:44.579 +08:00] [INFO] [dbclient/cus_field_milvus2x.go:171] ["[Loader] success to BatchInsert to Milvus"] [col=image] [partition=_default]
[2024/10/17 09:53:44.580 +08:00] [INFO] [migration/milvus2x_starter.go:79] ["=================>JobProcess!!"] [Percent=98]
[2024/10/17 09:53:44.580 +08:00] [INFO] [migration/milvus2x_starter.go:27] ["[Starter] migration Milvus2x to Milvus2x finish!!!"] [Cost=0.280845449]
[2024/10/17 09:53:44.580 +08:00] [INFO] [starter/starter.go:118] ["[Starter] Migration Success!"] [Cost=0.28098321]
[2024/10/17 09:53:44.580 +08:00] [INFO] [cleaner/none_cleaner.go:17] ["[None Cleaner] not need clean files"] [mode=]
[2024/10/17 09:53:44.580 +08:00] [INFO] [cmd/start.go:32] ["[Cleaner] clean file success!"]
Migration Success! Job starttMvIVzhoPIld9HLG62HeV cost=[0.291483]
Migration JobInfo: {"jobId":"starttMvIVzhoPIld9HLG62HeV","jobStatus":"success","jobProcess":0,"msg":"","totalTasks":1,"finishTasks":1}
Migration ProcessInfo: {"DumpFinish":true,"DumpTotalSize":1251,"DumpFinishSize":1206,"LoadFinish":true,"LoadTotalFiles":1,"LoadUnFinishFiles":0,"LoadTotalSize":1251,"LoadFinishSize":1206}, Process:100
Migration FileTaskInfo:  null

Environment

The source milvus version is 2.3.2, and the target milvus version is 2.3.21.

Anything else?

  1. The source connection image
  2. The target connection image

wenhuiZilliz commented 5 days ago

Your collection probably contains records that share the same primary key. We remove rows with duplicate PKs. In the log, "LoadTotalSize":1251 is the total row count, and "LoadFinishSize":1206 is the row count after rows with duplicate PKs are removed.
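
One way to check this explanation is to compare the gap reported in ProcessInfo with the number of duplicate primary keys in the source collection. Below is a minimal sketch in plain Python. It assumes the primary keys have already been exported from the source collection into a list. The `sample_pks` data and the helper names `dropped_rows` and `duplicate_pk_rows` are hypothetical and are not part of milvus-migration.

```python
import json
from collections import Counter

# ProcessInfo JSON copied from the console output in this issue.
process_info = json.loads(
    '{"DumpFinish":true,"DumpTotalSize":1251,"DumpFinishSize":1206,'
    '"LoadFinish":true,"LoadTotalFiles":1,"LoadUnFinishFiles":0,'
    '"LoadTotalSize":1251,"LoadFinishSize":1206}'
)


def dropped_rows(info):
    """Rows read from the source minus rows counted as loaded."""
    return info["LoadTotalSize"] - info["LoadFinishSize"]


def duplicate_pk_rows(primary_keys):
    """Count rows beyond the first occurrence of each primary key."""
    counts = Counter(primary_keys)
    return sum(n - 1 for n in counts.values())


# Hypothetical primary keys, for illustration only:
# 5 rows, 3 distinct keys, so 2 rows are duplicates.
sample_pks = [1, 2, 2, 3, 3]

print("rows not loaded:", dropped_rows(process_info))
print("duplicate-PK rows in sample:", duplicate_pk_rows(sample_pks))
```

If the duplicate count computed from the real source primary keys equals the gap between "LoadTotalSize" and "LoadFinishSize", the mismatch is consistent with duplicate-PK removal rather than data loss.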