zilliztech / milvus-backup

Backup and restore tool for Milvus
Apache License 2.0
133 stars 48 forks

[Bug]: An error occurred during the backup process for milvus: ["Fail to fill segment backup info"] #387

Open 372278663 opened 3 months ago

372278663 commented 3 months ago

Current Behavior

I wrote a script for automatic daily backup of Milvus. The key command is: ./milvus-backup create -n milvus_backup_$TIMESTAMP
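A minimal sketch of such a daily-backup wrapper (hypothetical; the original script was not posted). It assumes the `milvus-backup` binary and its config are in the working directory, and that `create` exits non-zero on failure — which may need verification, since the thread shows the reporter still had to monitor logs. `MILVUS_BACKUP_BIN` is an illustrative variable, not part of the tool:

```shell
#!/usr/bin/env bash
# Hypothetical daily-backup wrapper sketch. MILVUS_BACKUP_BIN is an
# assumption for this example, letting deployments point at the binary.
set -euo pipefail

run_backup() {
    local timestamp backup_name
    timestamp=$(date +%Y%m%d_%H%M%S)      # e.g. 20240729_093000
    backup_name="milvus_backup_${timestamp}"
    # Assumption: `create` returns a non-zero exit code on errors such as
    # "Fail to fill segment backup info", so cron/alerting can key off it.
    if "${MILVUS_BACKUP_BIN:-./milvus-backup}" create -n "${backup_name}"; then
        echo "${backup_name}"
    else
        echo "backup ${backup_name} failed, check milvus-backup logs" >&2
        return 1
    fi
}
```

If the CLI's exit code turns out not to reflect failures, the wrapper would instead need to grep the backup log for `[ERROR]` entries.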

I tried it in two environments and encountered the same problem of errors after several days of normal operation. Here are the relevant logs:

[2024/07/29 09:30:12.091 +08:00] [INFO] [core/backup_impl_create_backup.go:494] [backupCollectionExecute] [collectionMeta="id:\"186d75b8-4d4a-11ef-abe3-e61792940d17\" start_time:1722216610 collection_id:451333863845088053 db_name:\"default\" collection_name:\"kb000043\" schema:<name:\"kb000043\" fields:<fieldID:100 name:\"kb_id\" data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:101 name:\"src_id\" data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:102 name:\"src_nm\" data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:103 name:\"chunk_type\" data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:104 name:\"chunk_id\" data_type:Int64 > fields:<fieldID:105 name:\"answer\" data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:106 name:\"text\" data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:107 name:\"pk\" is_primary_key:true data_type:VarChar type_params:<key:\"max_length\" value:\"65535\" > > fields:<fieldID:108 name:\"vector\" data_type:FloatVector type_params:<key:\"dim\" value:\"1024\" > > > shards_num:1 consistency_level:Session backup_timestamp:451468751011840 has_index:true index_infos:<field_name:\"vector\" index_name:\"vector\" index_type:\"HNSW\" params:<key:\"index_type\" value:\"HNSW\" > params:<key:\"metric_type\" value:\"L2\" > params:<key:\"params\" value:\"{\\"M\\":8,\\"efConstruction\\":64}\" > > load_state:\"NotLoad\" backup_physical_timestamp:1722216610 "]
[2024/07/29 09:30:12.091 +08:00] [INFO] [core/backup_impl_create_backup.go:501] ["Begin copy data"] [dbName=default] [collectionName=kb000043] [segmentNum=1]
[2024/07/29 09:30:12.091 +08:00] [INFO] [core/backup_impl_create_backup.go:509] ["copy segment"] [collection_id=451333863845088053] [partition_id=451333863845088054] [segment_id=451333863845088140] [group_id=0]
[2024/07/29 09:30:12.091 +08:00] [DEBUG] [core/backup_impl_create_backup.go:841] [insertPath] [bucket=a-bucket] [insertPath=files/insert_log/451333863845088053/451333863845088054/451333863845088140/]
[2024/07/29 09:30:12.092 +08:00] [ERROR] [core/backup_impl_create_backup.go:509] ["Fail to fill segment backup info"] [collection_id=451333863845077768] [partition_id=451333863845077769] [segment_id=451333863845077825] [group_id=0] [error="Get empty input path, but segment should not be empty, files/insert_log/451333863845077768/451333863845077769/451333863845077825/"] [stack="github.com/zilliztech/milvus-backup/core.(BackupContext).backupCollectionExecute\n\t/home/runner/work/milvus-backup/milvus-backup/core/backup_impl_create_backup.go:509\ngithub.com/zilliztech/milvus-backup/core.(BackupContext).executeCreateBackup.func2\n\t/home/runner/work/milvus-backup/milvus-backup/core/backup_impl_create_backup.go:638\ngithub.com/zilliztech/milvus-backup/internal/common.(WorkerPool).work.func1\n\t/home/runner/work/milvus-backup/milvus-backup/internal/common/workerpool.go:70\ngolang.org/x/sync/errgroup.(Group).Go.func1\n\t/home/runner/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75"]

Expected Behavior

No response

Steps To Reproduce

No response

Environment

milvus standalone : 2.3.3
milvus-backup: 0.4.12

Anything else?

No response

wayblink commented 3 months ago

@372278663 "Fail to fill segment backup info" means the data was not found, either because someone deleted it unexpectedly or because Milvus garbage-collected it. Is your data very large? This should only happen when the data is very large and the backup takes longer than the GC data tolerance. Recent versions of Milvus support pausing GC during backup, but this is not supported in 2.3.3.

372278663 commented 3 months ago

@wayblink The etcd folder occupies 653MB of disk space. Is this considered a large amount of data? What is generally a proper size to keep it under? After deleting all data in Milvus and re-creating the collections, backups work normally, but I can't be sure when the problem will occur again.

372278663 commented 3 months ago

@wayblink Hello~ The issue occurred again yesterday, and the etcd folder now occupies 700MB. Is there any solution to this problem? Dropping and rebuilding the collections is too costly each time, and I have to continuously monitor whether each backup succeeded.

wayblink commented 3 months ago

> @wayblink Hello~ The issue occurred again yesterday, and the etcd folder occupies 700MB now. Is there any solution to this problem? The consumption is too high every time I drop and rebuild collections, and I need to continuously monitor whether the backup is successful.

The etcd size is not an issue; what matters is your Milvus data size. How much data is in your Milvus cluster? You can check it with Attu.

wayblink commented 3 months ago

@372278663 If you can provide the backup log and milvus log. We can look into it

372278663 commented 3 months ago

@wayblink Thank you for the reminder. Attu shows approximately 21,616 entities. What is the best way to provide you with the log files? Would it be alright if I sent them to you via email?

wayblink commented 3 months ago

> @wayblink Thank you for your reminder. Attu shows the approximate number of entities is 21,616 rows. How can I best provide you with the log files? Would it be alright if I sent them to you via email?

That is quite a small amount of data. Just uploading the files here is fine.

372278663 commented 3 months ago

> @wayblink Thank you for your reminder. Attu shows the approximate number of entities is 21,616 rows. How can I best provide you with the log files? Would it be alright if I sent them to you via email?
>
> That is quite a small data amount. Just upload the files here is OK.

Due to my company's data security requirements, I cannot upload attachments directly on GitHub. Could you please provide me with your email address so that I can send you the files? Thank you so much.

wayblink commented 2 months ago

> @wayblink Thank you for your reminder. Attu shows the approximate number of entities is 21,616 rows. How can I best provide you with the log files? Would it be alright if I sent them to you via email?
>
> That is quite a small data amount. Just upload the files here is OK.
>
> Due to my company's data security requirements, I cannot upload attachments directly on github. Could you please provide me with your email address so that I can send you the files? Thank you so much.

Sorry for missing the reply. wayasxxx@gmail.com

372278663 commented 2 months ago

> @wayblink Thank you for your reminder. Attu shows the approximate number of entities is 21,616 rows. How can I best provide you with the log files? Would it be alright if I sent them to you via email?
>
> That is quite a small data amount. Just upload the files here is OK.
>
> Due to my company's data security requirements, I cannot upload attachments directly on github. Could you please provide me with your email address so that I can send you the files? Thank you so much.
>
> sorry for missing the reply. wayasxxx@gmail.com

It's okay, no worries. I just sent you an email; please check it.