milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: 2.0.0 rc8 hang at creating index after idle for a while #12497

Closed rchangj closed 2 years ago

rchangj commented 2 years ago

Is there an existing issue for this?

Environment

- Milvus version: 2.0.0-rc8
- Deployment mode (standalone or cluster): standalone
- SDK version (e.g. pymilvus v2.0.0rc2): 2.0.0rc8
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 128G
- GPU: N/A
- Others:

Current Behavior

I ran Milvus and left it idle for about one week. After that, Milvus could no longer create an index or load a collection. I tried the following:

  1. Using both the Python API and the CLI
  2. Rebooting Milvus

Neither changed anything, and I didn't see any errors in the log. I eventually had to remove the entire /volumes folder. Can someone help take a look at this issue? Thanks.
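
A hang that produces no error is hard to tell apart from a slow index build. The following is a minimal sketch of one way to make it visible, assuming the blocking call can be wrapped in a Python callable: run it in a worker thread with a deadline and record whether it returned. The helper name `run_with_deadline` is hypothetical, and wrapping a pymilvus call this way is an assumption, not something that was tried in this thread.

```python
import threading
import time


def run_with_deadline(fn, deadline_s):
    """Run fn() in a daemon thread and report whether it returned within deadline_s seconds."""
    result = {}

    def target():
        try:
            result["value"] = fn()
        except Exception as exc:  # keep the error instead of losing it inside the thread
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    start = time.monotonic()
    worker.start()
    worker.join(deadline_s)  # returns after deadline_s even if fn() is still blocked
    elapsed = time.monotonic() - start
    finished = not worker.is_alive()
    return {"finished": finished, "elapsed_s": elapsed, **result}


# Stand-in for a blocking call. With pymilvus one might pass something like
# lambda: collection.create_index(...) instead (an assumption, untested here).
report = run_with_deadline(lambda: "index built", deadline_s=1.0)
print(report["finished"], report.get("value"))
```

A report with `finished` set to false after a generous deadline, combined with an empty log, would be concrete evidence of a hang rather than a slow build. The worker thread is not cancelled; the sketch only observes it.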

Expected Behavior

No response

Steps To Reproduce

No response

Anything else?

No response

xiaofan-luan commented 2 years ago

@yanliang567 we've seen many cases where an idle cluster hangs after several days. It is probably the same issue as #12423. Do we have the resources to deploy a standalone instance, create a collection, write some data, and leave it for a couple of days to see what happens?

yanliang567 commented 2 years ago

working on it...

yanliang567 commented 2 years ago

I deployed a Milvus instance and inserted 4 million entities 2 days ago. Now I am doing nothing and just waiting. I will try to build an index next Monday.

rchangj commented 2 years ago

@yanliang567 The last time I saw this issue was after about 7 days of idling.

yanliang567 commented 2 years ago

It does not reproduce for me on master-20211201-62e3f68:

12/09/2021 02:43:21 AM - INFO - index param: {'index_type': 'HNSW', 'params': {'M': 8, 'efConstruction': 200}, 'metric_type': 'L2'}
12/09/2021 02:43:21 AM - INFO - search_param: {'metric_type': 'L2', 'params': {'ef': 64}}
12/09/2021 02:48:18 AM - INFO - assert build index insert_nb40000_shards2_threads2_per50_f: 296.989
12/09/2021 02:52:02 AM - INFO - assert load insert_nb40000_shards2_threads2_per50_f: 223.977
12/09/2021 02:52:02 AM - INFO - Start search nq1_top10_1threads_per10
12/09/2021 02:52:02 AM - INFO - assert search thread0 round0: 0.225
12/09/2021 02:52:02 AM - INFO - assert search thread0 round1: 0.186

miluvs-he-62e3f68-etcd-0                                      1/1     Running     0          7d15h
miluvs-he-62e3f68-milvus-standalone-5c9dbf8577-lm789          1/1     Running     0          7d15h
miluvs-he-62e3f68-minio-66766765df-2v6c9                      1/1     Running     0          7d15h

yanliang567 commented 2 years ago

@rchangj could you retry against the latest master build? If it still reproduces for you, please collect the Milvus logs. Thanks.

/assign @rchangj /unassign

rchangj commented 2 years ago

@yanliang567 Can you give me your email address? I will send you the following: rdb_data_kv_log, rdb_data_log, rdb_data_meta_kv_log.

yanliang567 commented 2 years ago

@rchangj yanliang.qiao@zilliz.com

JackLCL commented 2 years ago

@rchangj Got it, we are working on it.

yanliang567 commented 2 years ago

/assign @czs007
@czs007 could you please help take a look at this issue? If you need more info, please ask @rchangj for help.

JackLCL commented 2 years ago

@rchangj We can't determine the cause from the RocksDB logs. Can you provide the Milvus log?

JackLCL commented 2 years ago

There is another option: you can try the latest Milvus image from here and see whether the bug still occurs: https://hub.docker.com/r/milvusdb/milvus-dev/tags

rchangj commented 2 years ago

@JackLCL The missing log is a separate issue I raised: I don't see any log output. Here is my log configuration:

log:
  level: debug # info, warn, error, panic, fatal
  file:
    rootPath: "/var/log/milvus" # default to stdout, stderr
    maxSize: 300 # MB
    maxAge: 10 # day
    maxBackups: 20
  format: text # text/json

I can try a recent image. Which tag would you recommend? Do I just need to change docker-compose.yaml to point to the new tag, or are other changes needed?

JackLCL commented 2 years ago

@rchangj You can use the "docker logs" command to get the Milvus log. I suggest you use the latest tag.
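
As a rough sketch of the "docker logs" suggestion, the command can be assembled and its output saved to a file for attaching to the issue. The function names are hypothetical, and the container name `milvus-standalone` is an assumption about the compose file in use; `docker ps` shows the actual name.

```python
import subprocess


def docker_logs_command(container, since=None, tail=None):
    """Build the argument list for `docker logs` with optional --since and --tail."""
    cmd = ["docker", "logs", "--timestamps"]
    if since is not None:
        cmd += ["--since", since]  # e.g. "24h" limits output to the last day
    if tail is not None:
        cmd += ["--tail", str(tail)]  # keep only the last N lines
    cmd.append(container)
    return cmd


def save_docker_logs(container, out_path, **options):
    """Write the container's stdout and stderr to out_path; return docker's exit code."""
    cmd = docker_logs_command(container, **options)
    with open(out_path, "wb") as out:
        # docker logs replays both streams; merge them into a single file
        return subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False).returncode


# Assumed container name; this only prints the command and does not run docker.
print(" ".join(docker_logs_command("milvus-standalone", since="24h", tail=1000)))
```

Calling `save_docker_logs("milvus-standalone", "milvus.log")` on the Docker host would then produce a file that can be shared.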

As for "Do I just need to make a change in docker-compose.yaml to point to the tag?": yes, that is the only change needed.
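
To illustrate that change, here is a small sketch that rewrites the `image:` line of a compose file held as text. The function name `set_service_image` and the sample compose fragment are hypothetical, the original tag `v2.0.0-rc8` is a placeholder, and the new tag is the dated dev build named elsewhere in this thread; any tag from the milvus-dev repository would be handled the same way.

```python
import re


def set_service_image(compose_text, old_repository, new_image):
    """Replace every `image: <old_repository>:<tag>` line with `image: <new_image>`.

    Returns (new_text, number_of_lines_changed).
    """
    pattern = re.compile(
        r"^([ \t]*image:[ \t]*)" + re.escape(old_repository) + r":\S+[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash or group-reference surprises in new_image.
    return pattern.subn(lambda m: m.group(1) + new_image, compose_text)


# Hypothetical fragment of a standalone docker-compose.yaml.
sample = (
    "services:\n"
    "  standalone:\n"
    "    container_name: milvus-standalone\n"
    "    image: milvusdb/milvus:v2.0.0-rc8\n"
)
updated, changed = set_service_image(
    sample, "milvusdb/milvus", "milvusdb/milvus-dev:master-20211214-dbaca0a"
)
print(changed)
print(updated)
```

After editing the file, the containers still have to be recreated (for example with `docker-compose up -d`) before the new image is used.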

rchangj commented 2 years ago

@JackLCL I haven't seen the hang with the image master-20211214-dbaca0a, which has been running for close to 3 weeks.

Meanwhile, "[Memory] memory leak on 2.0.0-rc8" (#12423) seems to be unchanged.

xiaofan-luan commented 2 years ago

Closing because this has been fixed.