milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Milvus2.3.1-standalone raise error: [2023/12/27 10:49:07.304 +00:00] [WARN] [observers/leader_observer.go:256] ["sync distribution failed, cannot get schema of collection"] [leaderID=1070] #29543

Closed lxl0928 closed 9 months ago

lxl0928 commented 11 months ago

Is there an existing issue for this?

Environment

- Milvus version: V2.3.1
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): restful api 2.2.x
- OS(Ubuntu or CentOS): Ubuntu VM
- CPU/Memory: core-milvus-milvus-standalone-58d6cbd646-zmswd      515m         1579Mi
- GPU: None
- Others:

Current Behavior

image

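When features are written to Milvus in real time, Milvus appears to purge the written features at intervals, even though no TTL is configured on the affected collections.

The collection configuration is as follows: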

+ 2023-12-27T18:59:23.419Z INFO <common> {-} | 157 Milvus 执行: func=describe_collection, uri=core-milvus-milvus.milvus-operator:19530, role_type=full_milvus, result={'status': {}, 'schema': {'name': 'prodFaceV202307', 'fields': [{'fieldID': 100, 'name': 'id', 'is_primary_key': True, 'data_type': 5}, {'fieldID': 101, 'name': 'vector', 'data_type': 101, 'type_params': [{'key': 'dim', 'value': '256'}]}]}, 'collectionID': 445671755849653025, 'virtual_channel_names': ['core-milvus-rootcoord-dml_1_445671755849653025v0'], 'physical_channel_names': ['core-milvus-rootcoord-dml_1'], 'created_timestamp': 446601214651269124, 'created_utc_timestamp': 1703648432355, 'shards_num': 1, 'collection_name': 'prodFaceV202307', 'num_partitions': 1180}, kwargs={'collection_name': 'prodFaceV202307'}

+ 2023-12-27T19:07:23.157Z INFO <common> {-} | 157 Milvus 执行: func=describe_collection, uri=core-milvus-milvus.milvus-operator:19530, role_type=full_milvus, result={'status': {}, 'schema': {'name': 'prodFaceV202307_20231227', 'fields': [{'fieldID': 100, 'name': 'id', 'is_primary_key': True, 'data_type': 5}, {'fieldID': 101, 'name': 'vector', 'data_type': 101, 'type_params': [{'key': 'dim', 'value': '256'}]}]}, 'collectionID': 445671755820898769, 'virtual_channel_names': ['core-milvus-rootcoord-dml_11_445671755820898769v0'], 'physical_channel_names': ['core-milvus-rootcoord-dml_11'], 'created_timestamp': 446261795801006103, 'created_utc_timestamp': 1702353652195, 'shards_num': 1, 'collection_name': 'prodFaceV202307_20231227', 'num_partitions': 1180}, kwargs={'collection_name': 'prodFaceV202307_20231227'}

+ 2023-12-27T19:07:56.780Z INFO <common> {-} | 157 Milvus 执行: func=describe_collection, uri=core-milvus-milvus.milvus-operator:19530, role_type=full_milvus, result={'status': {}, 'schema': {'name': 'prodFaceV202307_20231226', 'fields': [{'fieldID': 100, 'name': 'id', 'is_primary_key': True, 'data_type': 5}, {'fieldID': 101, 'name': 'vector', 'data_type': 101, 'type_params': [{'key': 'dim', 'value': '256'}]}]}, 'collectionID': 445671755820880703, 'virtual_channel_names': ['core-milvus-rootcoord-dml_8_445671755820880703v0'], 'physical_channel_names': ['core-milvus-rootcoord-dml_8'], 'created_timestamp': 446261787608743943, 'created_utc_timestamp': 1702353620944, 'shards_num': 1, 'collection_name': 'prodFaceV202307_20231226', 'num_partitions': 1180}, kwargs={'collection_name': 'prodFaceV202307_20231226'}
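For reading the describe_collection results above: created_utc_timestamp looks like Unix time in milliseconds, and created_timestamp appears to be a Milvus hybrid timestamp. As far as I understand the layout, it keeps the physical time in milliseconds in the upper bits and an 18-bit logical counter in the lower bits. A minimal Python sketch under that assumption (the function name and constants are illustrative, not a Milvus API):

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # assumed width of the logical counter in a hybrid timestamp
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_hybrid_ts(hybrid_ts: int):
    """Split a hybrid timestamp into (physical_ms, logical, UTC datetime)."""
    physical_ms = hybrid_ts >> LOGICAL_BITS
    logical = hybrid_ts & ((1 << LOGICAL_BITS) - 1)
    # timedelta arithmetic keeps the millisecond part exact (no float rounding)
    created_at = EPOCH + timedelta(milliseconds=physical_ms)
    return physical_ms, logical, created_at


# created_timestamp of prodFaceV202307, copied from the log line above
physical_ms, logical, created_at = decode_hybrid_ts(446601214651269124)
print(physical_ms, logical, created_at.isoformat())
```

For prodFaceV202307 the decoded physical part equals the reported created_utc_timestamp (1703648432355), which is 2023-12-27 03:40:32 UTC, so that collection was created on the day of the report.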

Expected Behavior

Once features have been written successfully, they should not be purged by any internal Milvus mechanism.

Steps To Reproduce

No response

Milvus Log

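11月16日 milvus启动时日志.log (log from Milvus startup on November 16)

Full logs from 17:21 to 18:40 on December 27, when 150,000+ vectors were found to have been deleted by an unknown Milvus mechanism <to be added>

Logs at 20:01 on December 27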

logs-from-standalone-in-core-milvus-milvus-standalone-58d6cbd646-zmswd.log

Anything else?

No response

yanliang567 commented 11 months ago

@lxl0928 there is no info from around 17:21 to 18:40 in the attached logs, and there are no logs about the collection "prodFaceV202307", so we cannot figure out what happened. Could you please reproduce the issue and collect the logs again? Also, please confirm how many vectors you inserted and whether you performed any delete operations.

/assign @lxl0928 /unassign

lxl0928 commented 11 months ago

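Because of the errors mentioned above (sync distribution failed, cannot get schema of collection; logs-from-standalone-in-core-milvus-milvus-standalone-58d6cbd646-zmswd.log), the logs grow very quickly and the relevant logs were not exported in time. I am reproducing the issue now.

A newly found problem: during vector insertion, 91,000+ vectors were inserted, but no corresponding segments have been generated. If segments are not flushed to disk and persisted in time, is there a risk of data loss?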

image image image

lxl0928 commented 11 months ago

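Between 11:00 and 12:00 on November 27, 120+ collections were deleted (not including "prodFaceV202307", which is a newly created collection). Each collection contained 1180 partitions.

Before the 120+ collections were deleted, the business side had already found (between 16:00 and 17:00 on November 26) that the relevant features could not be retrieved.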

yanliang567 commented 11 months ago

@lxl0928 do you have multiple Milvus instances running against the same MinIO/S3 bucket?

lxl0928 commented 11 months ago

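There are 5 Milvus deployments: cluster-milvus, core-milvus, qa-millvus, unqa-milvus, and backup-milvus, each with its own bucket (5 buckets in total). I checked the configuration and confirmed that no two Milvus deployments use the same bucket.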
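A check like this can also be scripted rather than done by eye. The following is a minimal Python sketch, assuming the object-storage settings of each deployment (the minio.address, minio.bucketName and minio.rootPath values from its Milvus configuration) have already been collected into a dict. The helper name and the sample values are hypothetical, not taken from the real configuration:

```python
from collections import defaultdict


def find_shared_buckets(deployments):
    """Return storage targets that are used by more than one deployment.

    deployments maps a deployment name to (address, bucket_name, root_path).
    """
    by_target = defaultdict(list)
    for name, (address, bucket_name, root_path) in deployments.items():
        # Normalize so that "files" and "/files/" count as the same prefix
        key = (address.strip().lower(), bucket_name, root_path.strip("/"))
        by_target[key].append(name)
    return {key: sorted(names) for key, names in by_target.items() if len(names) > 1}


# Hypothetical values, for illustration only
sample = {
    "cluster-milvus": ("minio.example:9000", "cluster-bucket", "files"),
    "core-milvus": ("minio.example:9000", "core-bucket", "files"),
    "backup-milvus": ("minio.example:9000", "core-bucket", "/files/"),
}
print(find_shared_buckets(sample))
```

An empty result means no two deployments point at the same address, bucket and root path. In the hypothetical sample, core-milvus and backup-milvus are reported because they collide after normalization.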

image

lxl0928 commented 11 months ago

@yanliang567 Today I successfully reproduced the issue: while vectors were being inserted, vectors were purged by an unknown Milvus mechanism.

However, because the errors described in "sync distribution failed, cannot get schema of collection" and in logs-from-standalone-in-core-milvus-milvus-standalone-58d6cbd646-zmswd.log are present,

the log volume during the reproduction was large: 10.2GB. Log download link: https://sensoro-backup-xining.oss-cn-beijing.aliyuncs.com/tmp/cluster-milvus-milvus-standalone-64bcc6f577-968s2.log.gz/cluster-milvus-milvus-standalone-64bcc6f577-968s2.log.gz

Please help take a look.

yanliang567 commented 11 months ago

/assign @congqixia please help to take a look

/unassign

congqixia commented 10 months ago

@lxl0928

The file behind the 10.2GB link is only 600MB in size and cannot be opened.

From the screenshot, the approximate row count does not exclude deleted rows. Just to make sure: do you have any delete operations in your system? Deleted rows are removed only after compaction.
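To tell "rows are really gone" apart from "the row statistics look odd", it may help to compare the approximate count with an exact count. A minimal sketch, assuming pymilvus 2.3.x against Milvus 2.3.x (this issue used the RESTful API, so this is an alternative client, not what was actually run). compare_row_counts is a hypothetical helper name:

```python
def compare_row_counts(collection):
    """Return (approximate, exact) row counts for a loaded pymilvus Collection."""
    # num_entities is an approximate figure; rows that were deleted but not
    # yet compacted away may still be included in it.
    approximate = collection.num_entities
    # count(*) runs as a query, so deletes are applied to the result.
    # The collection has to be loaded before it can be queried.
    result = collection.query(expr="", output_fields=["count(*)"])
    exact = result[0]["count(*)"]
    return approximate, exact


# Usage (needs a reachable Milvus; host and port are copied from the logs above):
#   from pymilvus import connections, Collection
#   connections.connect(host="core-milvus-milvus.milvus-operator", port="19530")
#   coll = Collection("prodFaceV202307")
#   coll.load()
#   print(compare_row_counts(coll))
```

If the exact count also drops while no delete requests are being sent, that would point away from the statistics explanation and toward real data loss.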

stale[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.