milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Bulk search with nq=500, nlist=1024, nprobe=32 is very slow; features are 256-dim ORB features indexed with BIN_IVF_FLAT #34998

Open kkpssr opened 1 month ago

kkpssr commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: 2.4.2
- Deployment mode (standalone or cluster):
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory: AMD 5950, 128 GB
- GPU: 
- Others:

Current Behavior

The performance is very unstable: on 2M vectors a search sometimes takes 800 ms or more and sometimes 300 ms, which is far from the FAQ example showing only 200 ms with nq=1000 on 50M vectors. My pipeline first searches 400-500 features and then bulk-inserts them into the collections.
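To turn "sometimes 800 ms, sometimes 300 ms" into numbers the maintainers can compare, a minimal timing sketch could report the latency distribution over repeated batch searches. `measure_latency` is a hypothetical helper, not part of pymilvus; `search_fn` is assumed to be a zero-argument callable that issues one nq=500 search.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    search_fn: Callable[[], object], runs: int = 20, warmup: int = 3
) -> Dict[str, float]:
    """Call search_fn repeatedly and summarize wall-clock latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up calls are not timed, so first-call effects do not dominate the summary.
    for _ in range(warmup):
        search_fn()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank percentile: the smallest sample with >= 99% of samples at or below it.
    p99 = samples[math.ceil(0.99 * len(samples)) - 1]
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "p99_ms": p99,
        "max_ms": samples[-1],
    }
```

For example, `measure_latency(lambda: collection.search(data=queries, anns_field="orb_vec", param={"metric_type": "HAMMING", "params": {"nprobe": 32}}, limit=10))`, where `collection`, `queries`, the field name `orb_vec`, and the HAMMING metric are placeholders for your own setup. Reporting median and p99 rather than single runs separates the steady-state cost from occasional spikes.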

Expected Behavior

Reach the performance shown in the FAQ (200 ms with nq=1000 on 50M vectors).

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

yanliang567 commented 1 month ago
  1. Search performance could be impacted by a) competition for underlying resources; b) the data volume growing through insert/bulkinsert; c) whether new segments have been compacted into larger ones; etc.
  2. The performance FAQ is based on the SIFT dataset, while you are running with binary vectors, so the numbers are not comparable.
  3. To address your performance issue, could you please provide
    1. the full Milvus logs and metric screenshots
    2. how many CPUs you are using for query nodes, and whether they run exclusively
    3. which SDK you are using, and its version

/assign @kkpssr /unassign

kkpssr commented 1 month ago
Screenshot 2024-07-25 172623

yanliang567 commented 1 month ago

@kkpssr The log you attached indicates that there are pending index-building tasks, which means some segments have not been indexed yet. For those segments Milvus resorts to brute-force search on the raw data, drastically increasing query time.
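A back-of-the-envelope sketch of why unindexed segments hurt so much: assuming IVF clusters of equal size (real inverted lists are uneven), a query on an indexed segment scans about nprobe/nlist of its rows, while an unindexed segment is scanned in full. With nlist=1024 and nprobe=32 from this issue, that is roughly 1/32 of the rows. `expected_candidates` and `batch_comparisons` are illustrative helpers, not Milvus APIs.

```python
from typing import Iterable, Tuple


def expected_candidates(num_rows: int, nlist: int, nprobe: int, indexed: bool = True) -> float:
    """Rough count of vectors compared per query in one segment.

    Assumes all nlist IVF clusters hold the same number of rows, which real
    data rarely does, so treat the result as an order-of-magnitude estimate.
    """
    if not indexed:
        # No index yet: every row in the segment is compared (brute force).
        return float(num_rows)
    # IVF: only nprobe of the nlist inverted lists are scanned.
    return num_rows * min(nprobe, nlist) / nlist


def batch_comparisons(
    segments: Iterable[Tuple[int, bool]], nq: int, nlist: int, nprobe: int
) -> float:
    """Total comparisons for nq queries over (row_count, is_indexed) segments."""
    return nq * sum(expected_candidates(rows, nlist, nprobe, idx) for rows, idx in segments)
```

With 2M rows all indexed, the estimate is 62,500 comparisons per query. If 100,000 of those rows sit in an unindexed segment, `batch_comparisons([(1_900_000, True), (100_000, False)], nq=1, nlist=1024, nprobe=32)` gives 159,375, so 5% unindexed data more than doubles the work per query.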

kkpssr commented 1 month ago


So the bulk-insert operation returns success immediately, even though index building has not finished?

yanliang567 commented 1 month ago

@kkpssr You can check the bulk insert task state. In Milvus 2.4, if the task state is completed, the data imported by that task has been indexed. Please check https://milvus.io/api-reference/pymilvus/v2.4.x/ORM/utility/get_bulk_insert_state.md
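Based on the comment above, a client could wait for the task to report completion before searching. This is a minimal polling sketch; `wait_for_bulk_insert` is a hypothetical helper, and the state lookup is passed in as a callable so the loop can be exercised without a running server.

```python
import time
from typing import Any, Callable, Collection


def wait_for_bulk_insert(
    task_id: int,
    get_state: Callable[[int], Any],
    completed: Any,
    failed_states: Collection[Any],
    poll_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll get_state(task_id) until its .state equals `completed`.

    Raises RuntimeError on a failed state and TimeoutError after timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        info = get_state(task_id)
        if info.state == completed:
            return info
        if info.state in failed_states:
            reason = getattr(info, "failed_reason", "unknown")
            raise RuntimeError(f"bulk insert task {task_id} failed: {reason}")
        if clock() >= deadline:
            raise TimeoutError(f"bulk insert task {task_id} not done after {timeout_s}s")
        # Still pending or running: wait before asking again.
        sleep(poll_s)
```

With pymilvus 2.4 this might be wired up as `wait_for_bulk_insert(task_id, lambda t: utility.get_bulk_insert_state(task_id=t), BulkInsertState.ImportCompleted, (BulkInsertState.ImportFailed, BulkInsertState.ImportFailedAndCleaned))`; the `BulkInsertState` constant names are taken from pymilvus 2.4 and should be checked against your installed version.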

kkpssr commented 1 month ago


And how long should it normally take to build a BIN_IVF_FLAT index on 50M 256-dim (32-byte) vectors?

xiaofan-luan commented 1 month ago

It depends on how many index-building resources you have.

Each index build should take no more than 10 minutes.

You need to check

  1. how many segments you have
  2. whether all segments already have an index on them

Birdwatcher can help you get this information.
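As a client-side complement to Birdwatcher, a sketch that checks row-level index coverage. It assumes a dict shaped like the one pymilvus `utility.index_building_progress()` returns, with `total_rows` and `indexed_rows` keys; it does not give per-segment detail, for which Birdwatcher is still needed. The helper names are hypothetical.

```python
from typing import Mapping


def unindexed_rows(progress: Mapping[str, int]) -> int:
    """Rows not yet covered by the index, from a progress dict with
    'total_rows' and 'indexed_rows' keys (assumed shape)."""
    total = int(progress["total_rows"])
    indexed = int(progress["indexed_rows"])
    return max(0, total - indexed)


def index_coverage(progress: Mapping[str, int]) -> float:
    """Fraction of rows indexed; an empty collection counts as fully indexed."""
    total = int(progress["total_rows"])
    if total == 0:
        return 1.0
    return min(1.0, int(progress["indexed_rows"]) / total)
```

For example, a nonzero `unindexed_rows(utility.index_building_progress("orb_collection"))` would be consistent with the brute-force fallback described earlier in the thread; the collection name is a placeholder.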

alexanderguzhva commented 1 month ago

todo: add SIMD speedup for binary indices

xiaofan-luan commented 1 month ago


I thought Faiss already supported SIMD for binary indices?
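For context on what such a speedup targets: with a Hamming metric, the core operation on binary vectors is an XOR followed by a popcount, and a SIMD kernel processes many bytes of that per instruction. The pure-Python sketch below only illustrates the arithmetic on 32-byte (256-bit) ORB descriptors and the exhaustive scan an unindexed segment implies; it is not how Faiss or Milvus implement it, and the function names are illustrative.

```python
import heapq
from typing import List, Sequence, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two packed binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR sets a bit wherever the descriptors differ; counting set bits is the popcount.
    diff = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return bin(diff).count("1")


def knn_hamming(query: bytes, base: Sequence[bytes], k: int) -> List[Tuple[int, int]]:
    """Exhaustive k-NN: returns the k smallest (distance, index) pairs."""
    return heapq.nsmallest(k, ((hamming(query, v), i) for i, v in enumerate(base)))
```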

stale[bot] commented 19 hours ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.