[Bug]: docker crash when inserts more data

milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications

https://milvus.io

Apache License 2.0

[Bug]: docker crash when inserts more data #32716

Open tadinhkien99 opened 3 weeks ago

tadinhkien99 commented 3 weeks ago

Is there an existing issue for this?

[X] I have searched the existing issues

Environment

- Milvus version: 2.4.0
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): docker   
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0
- OS(Ubuntu or CentOS): Windows
- CPU/Memory: 64 GB RAM
- GPU: 24 GB memory
- Others: Docker 50 GB / 60 GB RAM

Current Behavior

When I insert up to 10M entities, Docker crashes and Milvus disconnects. From what I can see, this happens because CPU usage hits 100% and there is no available RAM.

I use the IVF_SQ8 index, and each vector has 768 dimensions. I installed the GPU version of the Milvus Docker image. I insert in batches of 10,000 entities at a time.

I thought CPU and RAM usage would not increase when inserting data?

Expected Behavior

CPU and RAM should not run out (OOM) with only 10M entities.
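For a sense of scale, the raw size of the vectors in this report can be worked out with simple arithmetic. The sketch below assumes float32 components (4 bytes each) and counts only the vector payload; it says nothing about how Milvus itself buffers or indexes the data, and the helper name `raw_vector_bytes` is made up for illustration.

```python
GB = 10**9  # decimal gigabytes

def raw_vector_bytes(num_entities: int, dim: int, bytes_per_component: int = 4) -> int:
    """Size of the uncompressed vector payload, assuming float32 components."""
    return num_entities * dim * bytes_per_component

if __name__ == "__main__":
    # Entity counts mentioned in this thread, all with 768-dimensional vectors.
    for n in (4_300_000, 10_000_000, 50_000_000):
        print(f"{n:>11,} entities: {raw_vector_bytes(n, 768) / GB:.2f} GB raw")
```

Under these assumptions, 10M vectors of 768 dimensions come to roughly 30.7 GB before any index compression, which is a large share of a 50 GB container limit.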

Steps To Reproduce

...

Milvus Log

...

Anything else?

...

yanliang567 commented 3 weeks ago

@tadinhkien99

If you are running IVF_SQ8, you don't need the GPU image; try the Milvus CPU image.
In my experience, for 768-dimensional vectors, try inserting 1000 entities at a time.
If it still reproduces for you, please attach the Milvus logs for investigation. For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
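The suggestion to insert 1000 entities at a time could be sketched as a small batching helper. This is only an illustration: `batched` is a hypothetical helper, and the collection name and the commented-out `MilvusClient` usage are assumptions about the reporter's setup, not taken from the thread.

```python
from typing import Iterator, Sequence

def batched(rows: Sequence, batch_size: int = 1000) -> Iterator[Sequence]:
    """Yield consecutive slices of `rows`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Hypothetical usage with pymilvus (requires a running Milvus instance):
#
#   from pymilvus import MilvusClient
#   client = MilvusClient(uri="http://localhost:19530")
#   for chunk in batched(rows, batch_size=1000):   # rows: list of dicts
#       client.insert(collection_name="demo", data=chunk)
```

Smaller batches keep each request's payload small (about 3 MB for 1000 float32 vectors of 768 dimensions), but they do not change the total amount of data the server ends up holding.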

/assign @tadinhkien99 /unassign

tadinhkien99 commented 3 weeks ago

@yanliang567

Since I have a GPU, I prefer using it to enhance search performance. I chose IVF_SQ8 as it's the best quantization type for limited resources.
I tried again with a batch size of 1000 entities. With IVF_SQ8, I managed to insert a total of 4.3M entities, while IVF_PQ with m=8 allowed me to insert around 10M entities before Docker crashed.
I've attached the log file below for your review. milvus.log

yanliang567 commented 3 weeks ago

I did not see any critical errors at the time Milvus crashed, so I guess the container hit an OOM. Could you please double-check that? @tadinhkien99

/assign @congqixia could you please also take a look
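One way to double-check whether the container was killed for running out of memory is to read the `State.OOMKilled` field that `docker inspect` reports. The sketch below assumes the container is named `milvus-standalone` (adjust to whatever `docker ps -a` shows) and that the `docker` CLI is on the PATH; the function names are illustrative.

```python
import json
import subprocess

def oom_killed_from_inspect(inspect_output: str) -> bool:
    """Parse `docker inspect` JSON output and return the State.OOMKilled flag."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    return bool(containers[0]["State"]["OOMKilled"])

def container_was_oom_killed(name: str = "milvus-standalone") -> bool:
    """Run `docker inspect <name>` and report whether the kernel OOM-killed it.

    The default container name is an assumption, not taken from the thread.
    """
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return oom_killed_from_inspect(result.stdout)
```

If the flag is false but the whole Docker engine went down, the limit may have been hit at the level of the Docker VM rather than the individual container, which this check would not show.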

tadinhkien99 commented 3 weeks ago

@yanliang567

I'm aware that Docker can encounter Out of Memory errors, but in this instance, I was merely adding entities into the system without conducting any searches. What could be causing the OOM issue under these circumstances?

Additionally, I need advice on the most effective index type for handling larger datasets. I have approximately 50 million entities to insert. I initially used IVF_PQ, but encountered an OOM error after inserting only 10 million entities. What would you recommend?

xiaofan-luan commented 2 weeks ago

Are you using a GPU index or a CPU index?
If it's a CPU index, I believe 50 GB of memory is far more than enough for 10M entities.
When you are using IVF_SQ8, why specify m? m is only for HNSW index.
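For reference, here is a sketch of the build parameters that the index types named in this thread take, as far as the Milvus index documentation describes them: IVF_SQ8 takes `nlist`; IVF_PQ takes `nlist`, `m` (the number of sub-quantizers, which must divide the vector dimension) and optionally `nbits`; HNSW takes uppercase `M` and `efConstruction`. The concrete values and the `L2` metric below are illustrative assumptions, not recommendations, and `index_params` is a hypothetical helper.

```python
# Illustrative values only; the parameter names follow the Milvus index docs.
INDEX_BUILD_PARAMS = {
    "IVF_SQ8": {"nlist": 1024},
    "IVF_PQ": {"nlist": 1024, "m": 8, "nbits": 8},   # 768 % 8 == 0
    "HNSW": {"M": 16, "efConstruction": 200},
}

def index_params(index_type: str, metric_type: str = "L2") -> dict:
    """Build an index-params dict in the shape pymilvus expects for create_index."""
    return {
        "index_type": index_type,
        "metric_type": metric_type,
        "params": dict(INDEX_BUILD_PARAMS[index_type]),
    }

# Hypothetical usage (field name "embedding" is an assumption):
#   collection.create_index(field_name="embedding", index_params=index_params("IVF_PQ"))
```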

tadinhkien99 commented 2 weeks ago

I use a CPU index type, but it runs out of RAM (OOM).
I deleted the m param.

Now I use the GPU CAGRA index on an RTX 4090 with 24 GB of memory, and it's fine for 4M entities. Do you have any ideas for optimizing milvus.yaml (2.4)?

xiaofan-luan commented 2 weeks ago

Which part do you need to optimize in milvus.yaml?

tadinhkien99 commented 2 weeks ago

I want to use a GPU index to store more entities, more than 10M. Also, where can I set it up to run on multiple GPUs? Thanks.
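A rough upper bound on how many uncompressed vectors fit in a given amount of GPU memory can be estimated as below. This is a back-of-the-envelope sketch that assumes float32 components and that the raw vectors must reside on the device; it ignores graph structures, working buffers, and anything else a GPU index allocates, so real capacity would be lower. The helper name is hypothetical.

```python
def max_raw_vectors(memory_bytes: int, dim: int, bytes_per_component: int = 4) -> int:
    """Upper bound on raw vectors that fit in `memory_bytes`, ignoring all overhead."""
    return memory_bytes // (dim * bytes_per_component)

if __name__ == "__main__":
    gpu_memory = 24 * 1024**3  # 24 GiB, the card size mentioned in the thread
    print(f"{max_raw_vectors(gpu_memory, 768):,} raw 768-d float32 vectors at most")
```

Under these assumptions a single 24 GiB device tops out at about 8.4M raw 768-dimensional vectors, so going past 10M would need either more device memory or a more compact representation.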

xiaofan-luan commented 2 weeks ago

You can use more GPU devices on a single machine.

I think the documentation already covers the multi-device use case: https://milvus.io/docs/install_standalone-helm-gpu.md

@Presburger can help if you hit any issues.