zilliztech / milvus-backup

Backup and restore tool for Milvus
Apache License 2.0
111 stars 38 forks source link

Fix enabling both drop_exist_collection and skip_create_collection #348

Closed syang1997 closed 1 month ago

syang1997 commented 1 month ago

If skip_create_collection is not enabled, the collection will be validated before restoration. Therefore, when using drop_exist_collection, skip_create_collection must be enabled first. However, enabling both skip_create_collection and drop_exist_collection will result in the collection being deleted without being correctly recreated afterwards.

sre-ci-robot commented 1 month ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: syang1997

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/zilliztech/milvus-backup/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
mergify[bot] commented 1 month ago

@syang1997 Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/main/CONTRIBUTING.md#developer-certificate-of-origin-dco.

mergify[bot] commented 1 month ago

@syang1997 Please associate the related issue to the body of your Pull Request. (eg. “issue: #”)

wayblink commented 1 month ago

@syang1997 Hi, thanks for contribute. skip_create_collection and drop_exist_collection is designed not to use at the same time. In your pr, drop_exist_collection has a higher priority than skip_create_collection. This avoid the unexpected situation you mentioned. I prefer to add a parameter check, refuse the request if user set skip_create_collection and drop_exist_collection both true.

syang1997 commented 1 month ago

@syang1997 Hi, thanks for contribute. skip_create_collection and drop_exist_collection is designed not to use at the same time. In your pr, drop_exist_collection has a higher priority than skip_create_collection. This avoid the unexpected situation you mentioned. I prefer to add a parameter check, refuse the request if user set skip_create_collection and drop_exist_collection both true.

However, the current situation is that drop_exist_collection is not taking effect. In the scenario where a collection already exists in the cluster, setting drop_exist_collection to true and skip_create_collection to false, the restoration process fails due to the code snippet at (https://github.com/zilliztech/milvus-backup/blob/v0.4.13/core/backup_impl_restore_backup.go#L263) not passing the check. Therefore, in practice, drop_exist_collection has no actual effect.

syang1997 commented 1 month ago

@wayblink Therefore, I modified the code in this way to make both drop_exist_collection and skip_create_collection take effect simultaneously. This ensures that the collection is deleted while also preventing the issue of the collection not being recreated after deletion due to skip_create_collection.

syang1997 commented 1 month ago

@syang1997 Hi, thanks for contribute. skip_create_collection and drop_exist_collection is designed not to use at the same time. In your pr, drop_exist_collection has a higher priority than skip_create_collection. This avoid the unexpected situation you mentioned. I prefer to add a parameter check, refuse the request if user set skip_create_collection and drop_exist_collection both true.

@wayblink If it is required to enable drop_exist_collection and skip_create_collection independently, then the condition where drop_exist_collection is true should be assigned the capability of skip_create_collection (collection existence not being an error, and collection creation taking place after the deletion process). In this case, it becomes similar to setting skip_create_collection to true as well.

syang1997 commented 1 month ago

In the code flow, there is a check controlled by skip_create_collection before the deletion operation of drop_exist_collection, and a collection creation controlled by skip_create_collection after the deletion. I understand that the correct functionality of drop_exist_collection should be: if a collection exists in the cluster before the restoration, it will first be deleted, and then a new one will be created before proceeding with the restoration process. With the current design, it is impossible to achieve the envisioned scenario where drop_exist_collection and skip_create_collection are not enabled simultaneously.

wayblink commented 1 month ago

@syang1997 OK, make sense.