milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Milvus Standalone not robust at batch insert operation #19327

Closed Leslie-Wong-H closed 1 year ago

Leslie-Wong-H commented 1 year ago

Is there an existing issue for this?

Environment

- Milvus version: 2.1.2
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): @zilliz/milvus2-sdk-node 2.1.2
- OS(Ubuntu or CentOS): Ubuntu 20.04
- CPU/Memory: DigitalOcean 2C2G
- GPU: 
- Others:

Current Behavior

I merged milvus-standalone-docker-compose.yml into my existing trace.moe compose file to test it on a DigitalOcean 2C2G VM. This is the resulting compose file, link.

The dataset holds around 3 million vectors, but the insert process does not perform well. I insert in batches of 10,000 and pause for 500 ms between batches. Occasionally a "Connection dropped" error occurs and milvus-standalone restarts, after which the insert sometimes manages to continue. Eventually, though, the insert fails for good, with the milvus-standalone error changing from "Connection dropped" to "proxy not healthy". The same thing happens when I reduce the batch size from 10,000 to 1,000.
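For readers who want to see the batching pattern spelled out, here is a minimal sketch in Python of a batch-and-pause loop with retries. It is not the loader's actual code (that is written for the Node SDK). The `insert_batch` callable is a hypothetical stand-in for whatever SDK call performs the real insert, and the retry policy (exponential backoff on `ConnectionError`) is an assumption added for illustration.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(rows: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_in_batches(
    rows: Sequence[T],
    insert_batch: Callable[[Sequence[T]], None],  # hypothetical wrapper around the real SDK insert
    batch_size: int = 10_000,
    pause_s: float = 0.5,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert rows batch by batch, pausing `pause_s` seconds after each batch.

    A batch that raises ConnectionError is retried with exponential backoff.
    The error is re-raised once the batch has failed more than `max_retries`
    times. Returns the number of rows inserted.
    """
    inserted = 0
    for batch in chunked(rows, batch_size):
        attempt = 0
        while True:
            try:
                insert_batch(batch)
                break
            except ConnectionError:
                attempt += 1
                if attempt > max_retries:
                    raise
                # Back off longer after each failure: 2x, 4x, 8x ... the base pause.
                sleep(pause_s * 2 ** attempt)
        inserted += len(batch)
        sleep(pause_s)
    return inserted
```

With 3 million rows and a batch size of 10,000 this makes 300 insert calls and spends at least 150 seconds in pauses alone. Two caveats: retrying a batch may create duplicates if the server stored part of it before the connection dropped, and retries only paper over a server that keeps restarting rather than addressing why it restarts.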

Here is the loader code, link

Expected Behavior

Milvus standalone should handle batch inserts reliably, without dropping connections or restarting.

Steps To Reproduce

  1. Provision a DigitalOcean 2C2G VM running Ubuntu 20.04.
  2. Prepare liresolr hash files at /home/soruly/trace.moe-hash.
  3. Use this docker compose file: [link](https://github.com/Leslie-Wong-H/trace.moe/blob/master/docker-compose.yml)
  4. Run `docker-compose up -d`.
  5. Run `docker logs -f -n 100 tracemoe_loader_1`.

Milvus Log

xshF040.tmp.txt

Anything else?

https://groups.google.com/g/grpc-io/c/xTJ8pUe9F_E

yanliang567 commented 1 year ago

@Leslie-Wong-H thank you for the issue. Looking into the log, I find that the rootcoord lost its connection to etcd, which caused Milvus to stop working. Could you please try the following:

  1. expand the VM to 4C8G or even bigger
  2. make sure etcd is using an SSD volume

Check more installation prerequisites here: https://milvus.io/docs/v2.1.x/prerequisite-docker.md
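A quick way to confirm a host's size before deploying is sketched below in Python. The default thresholds are simply the 4C8G suggested in this comment, not values taken from the prerequisites page. The function names and the `slack` parameter are hypothetical, and `host_resources` relies on `os.sysconf` names that are available on Linux.

```python
import os
from typing import Tuple


def meets_minimum(
    cpus: int,
    mem_gib: float,
    min_cpus: int = 4,         # assumption: the "4C" suggested in this thread
    min_mem_gib: float = 8.0,  # assumption: the "8G" suggested in this thread
    slack: float = 0.9,        # hypothetical tolerance, see note below
) -> bool:
    """Return True if the host has at least the requested CPUs and memory.

    A VM sold as "8G" usually reports slightly less usable RAM, so the
    memory check accepts anything at or above `min_mem_gib * slack`.
    """
    return cpus >= min_cpus and mem_gib >= min_mem_gib * slack


def host_resources() -> Tuple[int, float]:
    """Read the CPU count and physical memory (in GiB) of a Linux host."""
    cpus = os.cpu_count() or 0
    mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cpus, mem_bytes / 2 ** 30


if __name__ == "__main__":
    cpus, mem_gib = host_resources()
    verdict = "ok" if meets_minimum(cpus, mem_gib) else "below the suggested 4C8G"
    print(f"{cpus} CPUs, {mem_gib:.1f} GiB RAM: {verdict}")
```

This only checks CPU and memory. Whether the etcd volume sits on an SSD has to be verified separately, for example in the cloud provider's volume settings.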

/assign @Leslie-Wong-H /unassign

Leslie-Wong-H commented 1 year ago

It works on a 4C16G VM.