milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Bulkinsert may lose deletion data #29122

Open jaime0815 opened 7 months ago

jaime0815 commented 7 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.2 
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. Send insert and delete requests to generate test data.
  2. Stop writing to the collection.
  3. Create a backup with the Milvus backup tool.
  4. Restore the backup to a new cluster.

We saw data loss in both step 3 and step 4: the backup was missing some delta logs, and bulkinsert skipped some delta logs during restore.
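To make the symptom concrete, here is a minimal toy model of why a missing delta log shows up as lost deletions. It is only a sketch: `Segment`, `visible_pks`, and `restore` are hypothetical names invented for illustration and are not Milvus or milvus-backup APIs. It assumes that a delta log is simply a list of deleted primary keys that is applied on top of the inserted rows.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: inserted primary keys plus delta logs (lists of deleted keys)."""
    insert_pks: list[int]
    delta_logs: list[list[int]] = field(default_factory=list)

    def visible_pks(self) -> set[int]:
        # A row is visible only if no delta log marks it as deleted.
        deleted = {pk for log in self.delta_logs for pk in log}
        return set(self.insert_pks) - deleted


def restore(segment: Segment, copied_delta_logs: int) -> Segment:
    """Rebuild a segment from a backup that kept only the first N delta logs."""
    kept = [list(log) for log in segment.delta_logs[:copied_delta_logs]]
    return Segment(list(segment.insert_pks), kept)


source = Segment(insert_pks=list(range(10)), delta_logs=[[0, 1], [2, 3]])
complete = restore(source, copied_delta_logs=2)  # every delta log copied
lossy = restore(source, copied_delta_logs=1)     # second delta log dropped

# Rows that were deleted in the source but are visible again after restore.
resurrected = lossy.visible_pks() - source.visible_pks()
print(sorted(resurrected))  # [2, 3]
```

In this model the restored collection ends up with more rows than the source, so "losing deletion data" appears as deleted entities coming back rather than as missing entities.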

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

jaime0815 commented 7 months ago

Strictly speaking, we should disable auto compaction before starting the backup; otherwise compaction might also lead to inconsistency between the backup data and the original data.

wayblink commented 7 months ago

cc @zhuwenxing Let's add a test case to verify this.
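One possible shape for such a case, as a hedged sketch: record every insert and delete the test issues, replay that record to get the primary keys that should survive, and compare them with the primary keys read back from the restored collection. The helper names `expected_live_pks` and `check_restore` are hypothetical, and how the restored keys are fetched (for example by querying the new cluster) is deliberately left out.

```python
def expected_live_pks(ops):
    """Replay (kind, pks) operations and return the keys that should remain."""
    live = set()
    for kind, pks in ops:
        if kind == "insert":
            live.update(pks)
        elif kind == "delete":
            live.difference_update(pks)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return live


def check_restore(ops, restored_pks):
    """Compare the keys found after restore with the keys the op record implies."""
    expected = expected_live_pks(ops)
    restored = set(restored_pks)
    return {
        "missing": sorted(expected - restored),      # inserts lost by backup/restore
        "resurrected": sorted(restored - expected),  # deletes lost by backup/restore
    }


# Operations issued against the source collection in step 1.
ops = [
    ("insert", range(0, 5)),
    ("delete", [1, 2]),
    ("insert", [5, 6]),
    ("delete", [6]),
]

# Keys read back from the restored collection (hard-coded here for illustration).
report = check_restore(ops, restored_pks=[0, 1, 3, 4, 5])
print(report)  # {'missing': [], 'resurrected': [1]}
```

A non-empty `resurrected` list is the signature of the bug described in this issue; a non-empty `missing` list would instead point at lost insert data.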

wayblink commented 7 months ago

https://github.com/zilliztech/milvus-backup/pull/257 catches and fixes one cause.

jaime0815 commented 6 months ago

https://github.com/milvus-io/milvus/issues/29162

xiaofan-luan commented 6 months ago

So this might be due to the loss of delta logs during flush all, right?

yanliang567 commented 6 months ago

I think we have verified the fix. @jaime0815, please help double-check.

stale[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.