Open huangpeng0817 opened 5 months ago
@huangpeng0817 Could you please attach an etcd backup for investigation? See https://github.com/milvus-io/birdwatcher for details on how to back up etcd with birdwatcher. /assign @huangpeng0817
/assign @chyezh It sounds like a similar issue to #31306. @chyezh, could you please help take a look?
The deployment uses an external etcd. When I run the birdwatcher backup from another machine, it eventually tries to connect to the Milvus nodes, but those are only reachable over the k8s internal network, so birdwatcher cannot connect to them. I am not sure whether the collected data has any problem; please contact me if it does.
The following information was collected by running backup
This looks like a bug in handling legacy collections created in an older version (before databases were introduced). I'll provide a fix command in birdwatcher and fix this bug in the next release. Thanks for letting us know! @huangpeng0817
/assign
@congqixia Once the repair command is available in birdwatcher, please let me know. I will run it on our cluster and take a look.
sure, working on it
@huangpeng0817 you could use latest code in v1.0.x branch https://github.com/milvus-io/birdwatcher/tree/v1.0.x
@congqixia At present, I have no environment to compile this branch myself. May I ask when a compiled binary will be released? Also, how do I run the repair command? Will this bug be fixed in 2.3.x?
@huangpeng0817
there is no environment to compile this branch by itself
OK, I'll release a birdwatcher executable ASAP.
how to operate the repair command?
# connect to your instance
connect --etcd etcdip:port --rootPath
# always back up before applying a fix
backup
# dry run: this command scans the meta to find any collection having this problem
repair legacy-collection-remnant
# if you need to remove the collection remnant, run the following command
repair legacy-collection-remnant --run
# you need to restart rootcoord before this fix takes effect
Will this bug be fixed in 2.3.x
The patch submitted to the 2.3 branch should fix this bug.
Is there an existing issue for this?
Environment
Current Behavior
The previous version was 2.2.8, and the cluster was recently upgraded to 2.3.12. After a recent restart of Milvus, many previously deleted collection names reappeared, including some that had been recreated after deletion. Currently, two collections with the same name can be seen in the cluster. When I view them in Attu, one of them has no state and shows an error, although its partition data can still be displayed. If I delete this stateless collection, the stateful collection is deleted instead. The stateless collection with the same name then appears again and cannot be deleted.
Below is the data for the collections with the same name, viewed after connecting with Birdwatcher:
================================================================================
Milvus(by-dev) > show collection-history --id 448985828275718906
DBID: 1
Collection ID: 448985828275718906
Collection Name: zt_model_beta_1
Collection State: CollectionCreated
Create Time: 2024-04-19 18:29:46
Fields:
Type Param max_length: 100
Enable Dynamic Schema: false
Consistency Level: Strong
Start position for channel by-dev-rootcoord-dml_13(by-dev-rootcoord-dml_13_448985828275718906v0): [203 99 168 6 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_12(by-dev-rootcoord-dml_12_448985828275718906v1): [103 14 190 6 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_8(by-dev-rootcoord-dml_8_448985828275718906v2): [151 184 151 6 0 0 0 0]
Collection properties(0):
Milvus(by-dev) > show collection-history --id 448801739682025168
DBID: 0
Collection ID: 448801739682025168
Collection Name: zt_model_beta_1
Collection State: CollectionCreated
Create Time: 2024-04-08 11:03:55
Fields:
Enable Dynamic Schema: false
Consistency Level: Strong
Start position for channel by-dev-rootcoord-dml_3(by-dev-rootcoord-dml_3_448801739682025168v0): [217 16 229 6 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_14(by-dev-rootcoord-dml_14_448801739682025168v1): [107 75 89 6 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_13(by-dev-rootcoord-dml_13_448801739682025168v2): [241 21 76 6 0 0 0 0]
Collection properties(0):
================================================================================
DBID: 0
Collection ID: 448801739682025168
Collection Name: zt_model_beta_1
Collection State: CollectionCreated
Create Time: 2024-04-08 11:03:55
Fields:
Enable Dynamic Schema: false
Consistency Level: Strong
Start position for channel by-dev-rootcoord-dml_3(by-dev-rootcoord-dml_3_448801739682025168v0): [217 16 229 6 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_14(by-dev-rootcoord-dml_14_448801739682025168v1): [107 75 89 6 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_13(by-dev-rootcoord-dml_13_448801739682025168v2): [241 21 76 6 0 0 0 0]
Collection properties(0):
================================================================================
DBID: 0
Collection ID: 447397465447990697
Collection Name: zt_model_dev
Collection State: CollectionCreated
Create Time: 2024-01-31 15:29:26
Fields:
Enable Dynamic Schema: false
Consistency Level: Strong
Start position for channel by-dev-rootcoord-dml_3(by-dev-rootcoord-dml_3_447397465447990697v0): [83 116 36 5 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_4(by-dev-rootcoord-dml_4_447397465447990697v1): [50 146 224 4 0 0 0 0]
Start position for channel by-dev-rootcoord-dml_5(by-dev-rootcoord-dml_5_447397465447990697v2): [98 247 230 4 0 0 0 0]
Collection properties(0):
================================================================================
The collection zt_model_dev was found to be deleted after this problem occurred.
Operation to delete the collection zt_model_dev
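For a larger cluster, it may help to scan saved Birdwatcher output for collection names that map to more than one collection ID, as zt_model_beta_1 does above. The following is only a minimal Python sketch, assuming the output has been saved as plain text and contains the `DBID`, `Collection ID`, and `Collection Name` fields in that order. The function name `find_duplicate_names` and the `SAMPLE` text are illustrative and not part of Birdwatcher.

```python
import re
from collections import defaultdict

# Matches the header fields of each record. \s+ tolerates output with one
# field per line as well as fields run together on a single line.
RECORD = re.compile(
    r"DBID:\s*(\d+)\s+Collection ID:\s*(\d+)\s+Collection Name:\s*(\S+)"
)


def find_duplicate_names(text):
    """Return {name: sorted [(dbid, collection_id), ...]} for every
    collection name that appears with more than one collection ID."""
    seen = defaultdict(set)
    for dbid, coll_id, name in RECORD.findall(text):
        # A set drops records that are printed more than once.
        seen[name].add((int(dbid), int(coll_id)))
    return {n: sorted(ids) for n, ids in seen.items() if len(ids) > 1}


# Hypothetical input, reduced to the three fields the sketch relies on.
SAMPLE = """
DBID: 1
Collection ID: 448985828275718906
Collection Name: zt_model_beta_1
DBID: 0
Collection ID: 448801739682025168
Collection Name: zt_model_beta_1
DBID: 0
Collection ID: 447397465447990697
Collection Name: zt_model_dev
"""

if __name__ == "__main__":
    for name, ids in find_duplicate_names(SAMPLE).items():
        print(name, ids)
```

This only reports name collisions in the text it is given. It does not read etcd and does not decide which of the two entries is the remnant.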
Expected Behavior
Deleted collections should not appear in the cluster
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response