zilliztech / milvus-backup

Backup and restore tool for Milvus
Apache License 2.0
111 stars 38 forks source link

[Bug]: No binlog on backup #183

Closed jackfischer closed 1 month ago

jackfischer commented 10 months ago

Current Behavior

Perform backup but only metadata persisted.

Expected Behavior

Expect binlogs to be persisted in the backup.

Steps To Reproduce

Run a normal backup with bucket name and rootPath: "" and receive only metadata persisted.

Files are present in S3, as recommended to check in https://github.com/milvus-io/milvus/issues/26088

$ aws s3 ls s3://credal-milvus --recursive
2023-08-18 00:18:02        772 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844024932899
2023-08-18 00:28:02       2106 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844024935992
2023-08-18 08:24:26        828 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025084219
2023-08-18 08:34:26        964 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025087326
2023-08-18 09:16:08       1071 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025100587
2023-08-18 09:26:09        808 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025103680
2023-08-18 09:36:08       1206 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025106785
2023-08-18 10:37:08        771 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025126663
2023-08-18 14:15:15        801 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025195465
2023-08-18 14:25:15        802 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025198550
2023-08-18 15:18:09        772 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025215965
2023-08-18 15:28:10        962 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025219077
2023-08-18 15:38:10        874 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025223440
2023-08-18 15:48:10       1008 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025226565
2023-08-18 17:16:06        827 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025254100
2023-08-18 18:16:05        802 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025272860
2023-08-18 19:15:14        851 delta_log/442270139975731376/442270139975731377/443242844024919624/443242844025291482
...
minio:
...
  bucketName: "credal-milvus" # Milvus Bucket name in MinIO/S3, make it the same as your milvus instance
  rootPath: "" # Milvus storage root path in MinIO/S3, make it the same as your milvus instance
$ ./milvus-backup create -n prod26aug23
[2023/08/26 15:27:25.383 -04:00] [INFO] [logutil/logutil.go:165] ["Log directory"] [configDir=]
[2023/08/26 15:27:25.383 -04:00] [INFO] [logutil/logutil.go:166] ["Set log file to "] [path=logs/backup.log]
[2023/08/26 15:27:25.383 -04:00] [INFO] [core/backup_impl_create_backup.go:25] ["receive CreateBackupRequest"] [requestId=9639db2a-4446-11ee-b448-be531324a3fb] [backupName=prod26aug23] [collections="[]"] [databaseCollections=] [async=false]
[2023/08/26 15:27:25.385 -04:00] [INFO] [storage/minio_chunk_manager.go:124] ["minio chunk manager init success."] [bucketname=credal-milvus] [root=]
[2023/08/26 15:27:25.945 -04:00] [INFO] [core/backup_impl_create_backup.go:248] ["collections to backup"] [collections="[activity_source_embeddings_v2]"]
[2023/08/26 15:27:26.111 -04:00] [INFO] [core/backup_impl_create_backup.go:285] ["try to get index"] [collection_name=activity_source_embeddings_v2]
[2023/08/26 15:27:26.396 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=organizationId] ["index info"="[{}]"]
[2023/08/26 15:27:26.720 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=chunkId] ["index info"="[{}]"]
[2023/08/26 15:27:27.135 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=chunkIndex] ["index info"="[{}]"]
[2023/08/26 15:27:27.508 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=totalChunks] ["index info"="[{}]"]
[2023/08/26 15:27:27.814 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=sourceId] ["index info"="[{}]"]
[2023/08/26 15:27:28.093 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=sourceName] ["index info"="[{}]"]
[2023/08/26 15:27:28.396 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=sourceUrl] ["index info"="[{}]"]
[2023/08/26 15:27:28.681 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=createdTimestamp] ["index info"="[{}]"]
[2023/08/26 15:27:28.957 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=lastUpdatedTimestamp] ["index info"="[{}]"]
[2023/08/26 15:27:29.250 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=sourceSystemId] ["index info"="[{}]"]
[2023/08/26 15:27:29.538 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=embedding] ["index info"="[{}]"]
[2023/08/26 15:27:29.841 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=content] ["index info"="[{}]"]
[2023/08/26 15:27:30.142 -04:00] [INFO] [core/backup_impl_create_backup.go:305] ["field index"] [collection_name=activity_source_embeddings_v2] [field_name=sourceType] ["index info"="[{}]"]
[2023/08/26 15:27:30.602 -04:00] [INFO] [core/backup_impl_create_backup.go:398] ["GetPersistentSegmentInfo before flush from milvus"] [collectionName=activity_source_embeddings_v2] [segmentNumBeforeFlush=51]
[2023/08/26 15:27:31.959 -04:00] [INFO] [core/backup_impl_create_backup.go:407] ["flush segments"] [collectionName=activity_source_embeddings_v2] [newSealedSegmentIDs="[443721547360593092]"] [flushedSegmentIDs="[443721547359283543,443242844025317613,443242844025989393,443242844026596494,443721547358423501,443721547359209701,443721547360254884,443721547360520437,443242844026478748,443721547358927799,443721547360246205,443721547360461117,443721547358690132,443242844026507367,443721547358807532,443242844026568991,443242844026599760,443721547359118824,443721547359345936,443721547359740322,443721547360483097,443242844026468714,443721547360588726,443242844024919624,443242844026509846,443242844026560823,443721547358727940,443721547359574242,443721547360018070,443242844026599761,443721547360588160,443242844026555931,443721547359680013,443721547360503196,443721547360541610,443721547360593118,443721547360188469,443721547360235255,443242844026012859,443721547359092784,443721547359801992,443242844026600302,443721547358597420,443721547358771404,443242844025783385,443242844025788774,443242844026528577,443242844026577634,443242844026620153,443721547358927800,443721547359763518]"] [timeOfSeal=1693078050]
[2023/08/26 15:27:32.104 -04:00] [INFO] [core/backup_impl_create_backup.go:420] ["GetPersistentSegmentInfo after flush from milvus"] [collectionName=activity_source_embeddings_v2] [segmentNumBeforeFlush=51] [segmentNumAfterFlush=52]
[2023/08/26 15:27:32.105 -04:00] [INFO] [core/backup_impl_create_backup.go:457] ["Finished fill segment"] [collectionName=activity_source_embeddings_v2]
[2023/08/26 15:27:32.109 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026507367]
[2023/08/26 15:27:32.111 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358807532]
[2023/08/26 15:27:32.113 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026568991]
[2023/08/26 15:27:32.115 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026599760]
[2023/08/26 15:27:32.117 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359118824]
[2023/08/26 15:27:32.119 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359345936]
[2023/08/26 15:27:32.121 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359740322]
[2023/08/26 15:27:32.123 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360483097]
[2023/08/26 15:27:32.125 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026468714]
[2023/08/26 15:27:32.126 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360588726]
[2023/08/26 15:27:32.127 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844024919624]
[2023/08/26 15:27:32.129 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026509846]
[2023/08/26 15:27:32.130 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026560823]
[2023/08/26 15:27:32.131 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358727940]
[2023/08/26 15:27:32.132 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359574242]
[2023/08/26 15:27:32.133 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360018070]
[2023/08/26 15:27:32.134 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026599761]
[2023/08/26 15:27:32.135 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360588160]
[2023/08/26 15:27:32.136 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026555931]
[2023/08/26 15:27:32.137 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359680013]
[2023/08/26 15:27:32.138 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360503196]
[2023/08/26 15:27:32.139 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360541610]
[2023/08/26 15:27:32.140 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360593118]
[2023/08/26 15:27:32.140 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360188469]
[2023/08/26 15:27:32.141 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360235255]
[2023/08/26 15:27:32.142 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026012859]
[2023/08/26 15:27:32.143 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359092784]
[2023/08/26 15:27:32.144 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359801992]
[2023/08/26 15:27:32.145 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026600302]
[2023/08/26 15:27:32.145 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358597420]
[2023/08/26 15:27:32.146 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358771404]
[2023/08/26 15:27:32.147 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844025783385]
[2023/08/26 15:27:32.148 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844025788774]
[2023/08/26 15:27:32.148 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026528577]
[2023/08/26 15:27:32.149 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026577634]
[2023/08/26 15:27:32.150 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026620153]
[2023/08/26 15:27:32.151 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358927800]
[2023/08/26 15:27:32.151 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359763518]
[2023/08/26 15:27:32.152 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359283543]
[2023/08/26 15:27:32.153 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844025317613]
[2023/08/26 15:27:32.154 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844025989393]
[2023/08/26 15:27:32.155 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026596494]
[2023/08/26 15:27:32.155 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358423501]
[2023/08/26 15:27:32.156 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547359209701]
[2023/08/26 15:27:32.157 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360593092]
[2023/08/26 15:27:32.157 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360254884]
[2023/08/26 15:27:32.158 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360520437]
[2023/08/26 15:27:32.159 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443242844026478748]
[2023/08/26 15:27:32.159 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358927799]
[2023/08/26 15:27:32.160 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360246205]
[2023/08/26 15:27:32.161 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547360461117]
[2023/08/26 15:27:32.161 -04:00] [WARN] [core/backup_impl_create_backup.go:468] ["this segment has no insert binlog"] [id=443721547358690132]
[2023/08/26 15:27:32.161 -04:00] [INFO] [core/backup_impl_create_backup.go:474] ["readSegmentInfo from storage"] [collectionName=activity_source_embeddings_v2] [segmentNum=52]
[2023/08/26 15:27:32.161 -04:00] [INFO] [core/backup_impl_create_backup.go:506] ["finish build partition info"] [collectionName=activity_source_embeddings_v2] [partitionNum=1]
[2023/08/26 15:27:32.161 -04:00] [INFO] [core/backup_impl_create_backup.go:510] ["Begin copy data"] [collectionName=activity_source_embeddings_v2] [segmentNum=52]
[2023/08/26 15:27:32.161 -04:00] [INFO] [core/backup_impl_create_backup.go:540] ["partition size is smaller than MaxSegmentGroupSize, won't separate segments into groups in backup files"] [collectionId=442270139975731376] [partitionId=442270139975731377] [partitionSize=0] [MaxSegmentGroupSize=2147483648]
[2023/08/26 15:27:32.166 -04:00] [INFO] [core/backup_impl_create_backup.go:579] ["finish executeCreateBackup"] [requestId=9639db2a-4446-11ee-b448-be531324a3fb] [backupName=prod26aug23] [collections="[]"] [async=false] ["backup meta"="{\"id\":\"9639db2a-4446-11ee-b448-be531324a3fb\",\"state_code\":2,\"start_time\":1693078045703,\"end_time\":1693078052161,\"name\":\"prod26aug23\",\"backup_timestamp\":1693078045703,\"size\":0,\"milvus_version\":\"v2.2.11\"}"]

Only the following files are written

./credal-milvus
./credal-milvus/prod26aug23
./credal-milvus/prod26aug23/meta
./credal-milvus/prod26aug23/meta/segment_meta.json
./credal-milvus/prod26aug23/meta/segment_meta.json/xl.meta
./credal-milvus/prod26aug23/meta/backup_meta.json
./credal-milvus/prod26aug23/meta/backup_meta.json/xl.meta
./credal-milvus/prod26aug23/meta/collection_meta.json
./credal-milvus/prod26aug23/meta/collection_meta.json/xl.meta
./credal-milvus/prod26aug23/meta/partition_meta.json
./credal-milvus/prod26aug23/meta/partition_meta.json/xl.meta
./credal-milvus/prod26aug23/meta/full_meta.json
./credal-milvus/prod26aug23/meta/full_meta.json/xl.meta

Environment

No response

Anything else?

This is mentioned as a possibility in two places, https://github.com/zilliztech/milvus-backup/issues/47 in which it's suggested this should be treated as a failure, with one comment saying they are experiencing this behavior, and https://github.com/milvus-io/milvus/issues/26088 , false positive due to typo in rootPath (milvus_files vs milvus-files) but this is not the case for us.

zhuwenxing commented 10 months ago

@jackfischer When you run aws s3 ls s3://credal-milvus --recursive, and grep with segment id 443242844026507367, can you see insert_log in the output?

https://github.com/zilliztech/milvus-backup/blob/9afbc4ea1890b0ab9770364aea894fbdfbee2676/core/backup_impl_create_backup.go#L595-L597 The backup works by finding the file path of the original segment and copying it to the backup path. The problem is that the file path of the segment is not found.

By the way, you can also attach the debug log here for more detailed info. You can get it by setting log level to debug https://github.com/zilliztech/milvus-backup/blob/9afbc4ea1890b0ab9770364aea894fbdfbee2676/configs/backup.yaml#L3

jackfischer commented 10 months ago

We do indeed get insert logs, here is a sample of the output,

2023-08-21 11:37:55   44662021 insert_log/442270139975731376/442270139975731377/443242844026507367/110/443242844026507728
2023-08-21 11:37:58   33396311 insert_log/442270139975731376/442270139975731377/443242844026507367/110/443242844026507762
2023-08-21 11:37:11     588426 insert_log/442270139975731376/442270139975731377/443242844026507367/111/443242844026507400
2023-08-21 11:37:19     655322 insert_log/442270139975731376/442270139975731377/443242844026507367/111/443242844026507456
2023-08-21 11:37:25     730703 insert_log/442270139975731376/442270139975731377/443242844026507367/111/443242844026507501

And attaching debug level log, debug.log

zhuwenxing commented 10 months ago

/assign @wayblink PTAL

zhuwenxing commented 10 months ago

@jackfischer Thanks for your feedback. Not sure which version of the backup tool you are using. Can you try with the latest milvus-backup? It fixes some bugs about empty rootPath.

jackfischer commented 10 months ago

Hey we're on 9afbc4ea1890b0ab9770364aea894fbdfbee2676, just beyond the latest, 0.3.2.

wayblink commented 10 months ago

@jackfischer We found a bug about config, your case may also caused by this. Please use the latest 0.3.3 version.

jackfischer commented 10 months ago

Unfortunately same effect on f2515c9e1 0.3.3. Were there any other steps I need to take?

git pull
go build

backup.log

wayblink commented 10 months ago

@jackfischer Hi, seems the log is not complete or the backup progress is not done. meta file generate in the end, after all data is copied. Did you terminate the progress manually? Please wait to end and see the result

jackfischer commented 10 months ago

Thanks you. That is the normal end of the log file. Waited for it to finish, did not kill the process or anything like that.

wayblink commented 10 months ago

Thanks you. That is the normal end of the log file. Waited for it to finish, did not kill the process or anything like that.

Well, that it quite strange. I have no idea what happened. The log shows backup working good. It should be some error or panic if it failed...

wayblink commented 10 months ago

Thanks you. That is the normal end of the log file. Waited for it to finish, did not kill the process or anything like that.

Well, that it quite strange. I have no idea what happened. The log shows backup working good. It should be some error or panic if it failed...

jackfischer commented 10 months ago

What other info can we provide? I can audit the JSON files that do land for proprietary information and send those over? Helpful to get on a screen share?

wayblink commented 10 months ago

@jackfischer hi, I think it is an issue about empty rootPath. please try revert this pr #177 and retry

jackfischer commented 10 months ago

Hello, have reverted #177 on top of #200 but have the same result. Logs attached. To confirm, this is correct to test?

go build && ./milvus-backup create -n prod11sep_4

backup.log

tanvlt commented 6 months ago

hello, is there anyone resolve this issue? it seems the same as on my side, after creating a backup the content looks like: meta/backup_meta.json/xl.meta i was expecting there should be only meta/backup_meta.json file.

wayblink commented 5 months ago

hello, is there anyone resolve this issue? it seems the same as on my side, after creating a backup the content looks like: meta/backup_meta.json/xl.meta i was expecting there should be only meta/backup_meta.json file.

I think it is related to the storage system you use. What is your storage system.

tanvlt commented 4 months ago

hello, is there anyone resolve this issue? it seems the same as on my side, after creating a backup the content looks like: meta/backup_meta.json/xl.meta i was expecting there should be only meta/backup_meta.json file.

I think it is related to the storage system you use. What is your storage system.

Hi, i resolved this issue by using minio command to sync the back to the local, instead of using "cp" command.