milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
[Bug]: The deleted collection reappears in the cluster and cannot be deleted. Therefore, the cluster collection cannot be used #33608

Open huangpeng0817 opened 5 months ago

huangpeng0817 commented 5 months ago

Is there an existing issue for this?

Environment

- Milvus version: 2.3.12
- attu version: 2.3.9
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): kafka
- OS (Ubuntu or CentOS): CentOS

Current Behavior

We were previously on version 2.2.8 and recently upgraded to 2.3.12. After a recent restart of Milvus, many previously deleted collection names reappeared; some of these collections had been re-created after deletion. As a result, two collections with the same name are now visible on the cluster. When I view the same-named collections in attu, one of them has no state and shows an error, although its partition data can still be displayed. If I try to delete this stateless collection, the stateful collection is deleted instead. The stateless collection with the same name then appears again and cannot be deleted. [image] [image] [image]

Below is the data for the same-named collection, viewed after connecting with Birdwatcher:

================================================================================

Milvus(by-dev) > show collection-history --id 448985828275718906

DBID: 1
Collection ID: 448985828275718906
Collection Name: zt_model_beta_1
Collection State: CollectionCreated
Create Time: 2024-04-19 18:29:46
Fields:

================================================================================
DBID: 0
Collection ID: 447397465447990697
Collection Name: zt_model_dev
Collection State: CollectionCreated
Create Time: 2024-01-31 15:29:26
Fields:
Enable Dynamic Schema: false
Consistency Level: Strong
Start position for channel by-dev-rootcoord-dml_3(by-dev-rootcoord-dml_3_447397465447990697v0): [83 116 36 5 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_4(by-dev-rootcoord-dml_4_447397465447990697v1): [50 146 224 4 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_5(by-dev-rootcoord-dml_5_447397465447990697v2): [98 247 230 4 0 0 0 0]
Collection properties(0):
================================================================================

The collection zt_model_dev was found to be deleted after this problem occurred. [image]

Operation to delete the collection zt_model_dev: [image] [image]
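Editorial sketch: records of the shape shown in the Birdwatcher output above can be checked mechanically. The Python below parses such text and lists entries whose DBID is 0. It assumes, without confirmation from the output itself, that DBID 0 marks a collection created before databases were introduced. The regular expression and the names `CollectionRecord`, `parse_records`, `legacy_candidates`, and `duplicate_names` are illustrative and not part of Birdwatcher.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Matches one record header; \s+ lets it work on flattened or line-broken text.
RECORD_RE = re.compile(
    r"DBID:\s*(?P<db_id>\d+)\s+"
    r"Collection ID:\s*(?P<coll_id>\d+)\s+"
    r"Collection Name:\s*(?P<name>\S+)\s+"
    r"Collection State:\s*(?P<state>\S+)\s+"
    r"Create Time:\s*(?P<created>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


@dataclass(frozen=True)
class CollectionRecord:
    db_id: int
    coll_id: int
    name: str
    state: str
    created: str


def parse_records(text: str) -> List[CollectionRecord]:
    """Extract every collection record header found in the text."""
    return [
        CollectionRecord(
            db_id=int(m.group("db_id")),
            coll_id=int(m.group("coll_id")),
            name=m.group("name"),
            state=m.group("state"),
            created=m.group("created"),
        )
        for m in RECORD_RE.finditer(text)
    ]


def legacy_candidates(records: List[CollectionRecord]) -> List[CollectionRecord]:
    """Records with DBID 0 (assumed here to predate the database feature)."""
    return [r for r in records if r.db_id == 0]


def duplicate_names(records: List[CollectionRecord]) -> Dict[str, List[CollectionRecord]]:
    """Group records by name and keep only names that occur more than once."""
    by_name: Dict[str, List[CollectionRecord]] = defaultdict(list)
    for r in records:
        by_name[r.name].append(r)
    return {name: rs for name, rs in by_name.items() if len(rs) > 1}


# Sample input copied from the output in this issue.
SAMPLE = """
DBID: 1 Collection ID: 448985828275718906 Collection Name: zt_model_beta_1
Collection State: CollectionCreated Create Time: 2024-04-19 18:29:46 Fields:
DBID: 0 Collection ID: 447397465447990697 Collection Name: zt_model_dev
Collection State: CollectionCreated Create Time: 2024-01-31 15:29:26 Fields:
"""

records = parse_records(SAMPLE)
for r in legacy_candidates(records):
    print(f"possible legacy collection: {r.name} (ID {r.coll_id})")
```

On this sample only `zt_model_dev` is listed. `duplicate_names` is the helper to use when the pasted output contains several records that share a name.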

Expected Behavior

Deleted collections should not appear in the cluster

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

yanliang567 commented 5 months ago

@huangpeng0817 Could you please attach an etcd backup for investigation? See https://github.com/milvus-io/birdwatcher for details on how to back up etcd with birdwatcher. /assign @huangpeng0817

/assign @chyezh This sounds similar to #31306. @chyezh, could you please help take a look?

huangpeng0817 commented 5 months ago

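@huangpeng0817 Could you attach an etcd backup for investigation? Please see https://github.com/milvus-io/birdwatcher for details on how to back up etcd with birdwatcher. /assign @huangpeng0817

/assign @chyezh This sounds like an issue similar to #31306. @chyezh, could you help take a look?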

Because the deployment uses an external etcd, I ran birdwatcher's backup from another machine. The backup eventually tries to connect to the Milvus nodes, but those are reachable only over the k8s internal network, so birdwatcher cannot connect to them. I don't know whether there is any problem with the collected data; please contact me if there is.

The following information was collected by running backup:

bw_etcd_ALL.240605-095247.bak.gz
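Editorial sketch: the connectivity situation described in this comment (etcd reachable from the outside machine, Milvus nodes reachable only inside the k8s network) can be confirmed with a plain TCP check before running a backup. The helper name `can_connect` and the host names in the usage note are hypothetical; nothing here is part of birdwatcher.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("etcd.example.internal", 2379)` returning True while `can_connect("milvus-node.example.internal", 19530)` returns False would match the situation reported here. Both addresses are placeholders; 2379 and 19530 are the usual default ports for etcd and Milvus, which a given deployment may override.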

congqixia commented 5 months ago

This should be a bug in handling legacy collections created in an older version (before databases were introduced). I'll provide a fix command in birdwatcher and fix this bug in the next release. Thanks for letting us know! @huangpeng0817

congqixia commented 5 months ago

/assign

huangpeng0817 commented 5 months ago

@congqixia Once the repair command is available in birdwatcher, please share it with me. I will run it on our cluster and have a look.

congqixia commented 5 months ago

sure, working on it

congqixia commented 4 months ago

@huangpeng0817 You can use the latest code in the v1.0.x branch: https://github.com/milvus-io/birdwatcher/tree/v1.0.x

huangpeng0817 commented 4 months ago

@congqixia At present I have no environment in which to compile this branch myself. May I ask when a compiled binary will be released? Also, how do I run the repair command? Will this bug be fixed in 2.3.x?

congqixia commented 4 months ago

@huangpeng0817

there is no environment to compile this branch by itself

OK, I'll release birdwatcher with an executable ASAP.

how to operate the repair command?

# connect to the etcd used by your instance
connect --etcd etcdip:port --rootPath
# always back up before applying a fix
backup
# dry run: this command scans the meta to find any collection having this problem
repair legacy-collection-remnant
# if you need to remove the collection remnant, run the following command
repair legacy-collection-remnant --run
# you need to restart rootcoord before this fix takes effect
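Editorial sketch: after the repair and the rootcoord restart, one way to confirm the result from the client side is to list collection names and look for repeats or for names that should no longer exist. The Python below is a minimal sketch under assumptions: it uses pymilvus `connections.connect` and `utility.list_collections`, the helper names are hypothetical, and it is not confirmed here that a remnant collection shows up as a repeated name in that listing.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def duplicate_names(names: Iterable[str]) -> Dict[str, int]:
    """Return each collection name that occurs more than once, with its count."""
    return {name: n for name, n in Counter(names).items() if n > 1}


def unexpected_names(names: Iterable[str], expected: Set[str]) -> List[str]:
    """Return the sorted names that are present but not in the expected set."""
    return sorted(set(names) - expected)


def list_collection_names(host: str, port: str) -> List[str]:
    """Fetch collection names from a running Milvus instance via pymilvus."""
    # Imported lazily so the pure helpers above work without pymilvus installed.
    from pymilvus import connections, utility

    connections.connect(alias="default", host=host, port=port)
    try:
        return utility.list_collections()
    finally:
        connections.disconnect("default")


# Hypothetical listing, for illustration only (not taken from the cluster).
names_before_repair = ["zt_model_beta_1", "zt_model_dev", "zt_model_dev"]
print(duplicate_names(names_before_repair))
```

Against a live cluster, `duplicate_names(list_collection_names("localhost", "19530"))` should return an empty dict once no repeated names remain; the host and port are placeholders for your own endpoint.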

Will this bug be fixed in 2.3.x

The patch submitted to the 2.3 branch should fix this bug.