milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Disk space builds up even as entries are inserted as well as deleted #36654

Open ramyagirish opened 1 month ago

ramyagirish commented 1 month ago

Is there an existing issue for this?

- [x] I have searched the existing issues

Environment

- Milvus version: 2.2.16
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): ubuntu
- CPU/Memory: 32 GB
- GPU: Not applicable
- Others: Disk space used is 300 GB, while the 2.7 million records take close to 100 GB. The instance is in an auto-scaling group on AWS.

Current Behavior

Currently, the database has 2.7 million records. Every day close to 4,000 entries get added and some 100 records get deleted, so the number of records does not grow much; this is the expected trend. Unfortunately, the disk space keeps growing unless the database instance (which runs in an auto-scaling group) is restarted; after a restart it comes back to about 30% of the 300 GB volume. This shouldn't happen, because it means we have to manually restart the DB instance every so often, say once a week. Is there anything that can be done, in terms of configuration, in milvus.yaml?

Expected Behavior

Disk space usage should not grow from 100 GB to 300 GB when the number of records is not growing or changing drastically.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 1 month ago

So a reboot helps to reduce the disk usage?

When you talk about disk, are you referring to MinIO or RocksDB? Could you try upgrading to 2.3.22 and see?

We've fixed many garbage collection issues, so hopefully it helps.
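
For reference, if the growth is on the local volume rather than in object storage, the rocksmq retention settings in milvus.yaml are one place to look for a standalone deployment. A minimal sketch, assuming the standalone 2.2/2.3 configuration layout; the values shown are the upstream defaults as far as I recall and are illustrative rather than tuning advice, so verify key names and defaults against the milvus.yaml shipped with your version:

rocksmq:
  # Local path where rocksmq stores its message data in standalone mode
  path: /var/lib/milvus/rdb_data
  # How long consumed messages are retained before they become eligible for cleanup (minutes)
  retentionTimeInMinutes: 4320
  # Upper bound on retained message data before cleanup kicks in (MB)
  retentionSizeInMB: 8192
  # How often rocksdb compaction runs to physically reclaim space from deleted messages (seconds)
  compactionInterval: 86400

If the local rocksmq data is what keeps growing, shortening the retention window is the usual lever; growth in object storage is governed instead by the GC settings discussed further down.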

yanliang567 commented 1 month ago

/assign @ramyagirish

ramyagirish commented 1 month ago

@xiaofan-luan We are running Milvus's standalone containerized version on an AWS auto-scaling instance, and we have connected this DB instance to an S3 bucket instead of MinIO. The DB gets updated on a daily cadence: new entries are added and some irrelevant or expired entries are deleted, so the number of entries should not change much in theory, yet the disk space on the AWS auto-scaling instance steadily increases. When I reboot or restart the AWS instance, the disk usage comes back to the same level. Why is this happening? Why do we need to reboot the instance again and again?

Also, I am going to upgrade to 2.3.22. I will monitor the DB size for a week and let you know.

ramyagirish commented 1 month ago

@xiaofan-luan I am trying to configure version 2.3.22 with an S3 bucket, and the standalone container keeps crashing. I also tried 2.5.12. I am modifying the following fields in milvus.yaml:

minio:
  address: "s3.us-east-1.amazonaws.com"  # S3 endpoint used in place of a local MinIO service
  port: 443
  accessKeyID: my_access_key
  secretAccessKey: my_access_secret
  useSSL: true                           # required when connecting to S3 over HTTPS on port 443
  bucketName: bucketname_in_s3
  rootPath: object_folder_name_s3        # prefix under which Milvus stores its objects

This works perfectly with 2.2.16. Please advise.
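
For what it's worth, newer milvus.yaml files carry a few extra S3-related keys under the same minio section, and a mismatch there is one plausible reason for a crash on startup. A minimal sketch of the additional keys, assuming a 2.3+/2.5 configuration layout; verify the exact key names against the milvus.yaml bundled with the image you are running:

minio:
  # ...the keys shown above stay as they are; newer versions also recognise:
  cloudProvider: aws        # which S3-compatible provider is in use (aws, gcp, aliyun, ...)
  region: us-east-1         # bucket region hint
  useIAM: false             # set to true to use IAM role credentials instead of static keys
  useVirtualHost: false     # virtual-hosted-style vs path-style bucket addressing

Checking the container logs for the exact startup error should narrow this down further.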

xiaofan-luan commented 1 month ago

2.3.22 should be fully compatible with 2.2.16. Could you share the logs of your cluster?

Also, be aware that data has a GC period; it usually takes several days for the GC to actually happen. Check garbage_collector.go for the detailed GC logic.
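
For reference, the GC behaviour mentioned above is driven by the dataCoord section of milvus.yaml. A minimal sketch of the relevant knobs, assuming a 2.3+ configuration layout; the values are the defaults as far as I recall them and may differ between versions, so treat them as illustrative rather than tuning advice:

dataCoord:
  enableGarbageCollection: true
  gc:
    # How often the garbage collector scans object storage (seconds)
    interval: 3600
    # How long a file without matching metadata is tolerated before it is removed (seconds)
    missingTolerance: 86400
    # How long files belonging to dropped segments/collections are kept before removal (seconds)
    dropTolerance: 10800

In other words, deleted or compacted data is only reclaimed from object storage after these tolerances elapse, which is why disk usage can lag behind the deletes.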

stale[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.