milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Disk space builds up even if entries are inserted as well as deleted #36654

Open ramyagirish opened 1 day ago

ramyagirish commented 1 day ago

Is there an existing issue for this?

  • [x] I have searched the existing issues

Environment

- Milvus version: 2.2.16
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 32 GB
- GPU: Not applicable
- Others: Disk usage reaches 300 GB, although the 2.7 million records take close to 100 GB. The instance runs in an autoscaling group on AWS.

Current Behavior

Currently, the database has 2.7 million records. Every day, close to 4,000 entries are added and about 100 records are deleted. So the number of records does not grow, and this is the expected trend. Unfortunately, disk usage keeps growing unless the database instance (which runs in an auto-scaling group) is restarted. After a restart, usage drops back to 30% of 300 GB. This shouldn't happen, as it means we have to manually restart the DB instance after some time, say a week. Is there anything that can be done in terms of configuration in milvus.yaml?

Expected Behavior

Disk usage should not grow from 100 GB to 300 GB when the number of records is not growing or changing drastically.
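
As a rough sanity check on that expectation, the figures quoted in this report can be turned into an estimate of how much growth the inserts alone would explain. This is a back-of-the-envelope sketch, not a measurement: it assumes decimal gigabytes, and it assumes the 100 GB footprint is spread evenly across records, indexes included.

```python
# Figures quoted in the report. Decimal gigabytes and a uniform
# per-record footprint are assumptions made for this estimate.
GB = 10**9
records = 2_700_000
data_bytes = 100 * GB
added_per_day = 4_000
deleted_per_day = 100

# Average on-disk footprint of one record, indexes included.
bytes_per_record = data_bytes / records

# Net change in record count per day, and the disk growth it would explain.
net_records_per_day = added_per_day - deleted_per_day
growth_per_day_gb = net_records_per_day * bytes_per_record / GB
growth_per_week_gb = 7 * growth_per_day_gb

print(f"~{bytes_per_record / 1000:.0f} kB per record")
print(f"~{growth_per_day_gb:.2f} GB/day, ~{growth_per_week_gb:.2f} GB/week")
```

Under these assumptions the daily inserts account for roughly 0.14 GB per day, or about 1 GB per week, which is far short of the 200 GB of growth observed between restarts.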

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 1 day ago

So a reboot helps reduce the disk usage?

When you talk about disk, are you referring to MinIO or RocksDB? Could you try upgrading to 2.3.22 and see?

We've fixed many garbage collection issues, so hopefully it will help.
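
One way to answer the MinIO-versus-RocksDB question is to measure each data directory separately on the host before and after a restart. The sketch below uses only the Python standard library. The directory paths are hypothetical placeholders, since the actual locations depend on how the standalone deployment mounts its volumes.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Files can vanish while a live database is running; skip them.
                continue
    return total


def usage_report(paths):
    """Return {label: size in GB} for the directories that exist."""
    return {
        label: dir_size_bytes(path) / 10**9
        for label, path in paths.items()
        if os.path.isdir(path)
    }


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever your deployment mounts its volumes.
    paths = {
        "minio": "./volumes/minio",
        "rocksmq/rocksdb": "./volumes/milvus/rdb_data",
        "etcd": "./volumes/etcd",
    }
    report = usage_report(paths)
    for label, gb in sorted(report.items(), key=lambda kv: -kv[1]):
        print(f"{label:20s} {gb:8.2f} GB")
```

Running it once when the disk is nearly full and again right after a restart should show which directory shrinks, and therefore which component is holding the extra space.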

yanliang567 commented 15 hours ago

/assign @ramyagirish