milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Scalar indexes cannot search out data #34548

Open syang1997 opened 1 month ago

syang1997 commented 1 month ago

Is there an existing issue for this?

Environment

- Milvus version: v2.3.15
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): Java SDK 2.3.4, Attu
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Hybrid search returns no data, but a separate query does return data. img_3 img_4 The scalar query on its own returns data. img_5

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

milvus-log (1).tar.gz

Anything else?

No response

syang1997 commented 1 month ago

I just backed up this collection to another cluster, and the problem could still be reproduced there. After I deleted the collection from the backup cluster and then backed up and restored it again, the problem could no longer be reproduced.

yanliang567 commented 1 month ago

/assign @zhagnlu
please help to take a look
/unassign

syang1997 commented 1 month ago

I don't think this is caused by scalar filtering preventing HNSW from traversing its layers, because for another set of data the same scalar-filtered search does return data after several attempts. No data operations were performed during this period.

syang1997 commented 1 month ago

企业微信截图_0d95f1b9-6c93-432b-a642-65ab25f5e055 企业微信截图_62c52532-acb0-4f40-8fca-acf446495788

syang1997 commented 1 month ago

This collection configures replicas. Is it caused by index differences between replicas?

zhagnlu commented 1 month ago

> 企业微信截图_0d95f1b9-6c93-432b-a642-65ab25f5e055 企业微信截图_62c52532-acb0-4f40-8fca-acf446495788

What is the difference between the upper and lower searches?

syang1997 commented 1 month ago

> What is the difference between the upper and lower searches?

@zhagnlu There is no difference; multiple requests return completely different results.

zhagnlu commented 1 month ago

> > What is the difference between the upper and lower searches?
>
> @zhagnlu There is no difference; multiple requests return completely different results.

If you don't use hybrid search and just use query, do multiple requests still return completely different results?

syang1997 commented 1 month ago

> > > What is the difference between the upper and lower searches?
> >
> > @zhagnlu There is no difference; multiple requests return completely different results.
>
> If you don't use hybrid search and just use query, do multiple requests still return completely different results?

No. Plain vector search and plain query both return normally.

syang1997 commented 1 month ago

@zhagnlu Another symptom: for some search conditions, nothing is returned at all once scalar filtering is added, even though the vector search alone returns results and the scalar filter alone matches data. The hybrid search comes back empty.

yanliang567 commented 1 month ago

Okay, so the issue here is that when using a query with an expr filter on scalar fields, the results are not correct or not consistent (not expected). But when querying without filtering, the results are always consistent (expected). Am I right?
/assign @cydrain
@liliu-z could you please also help to take a look

syang1997 commented 1 month ago

@yanliang567 Yes. Sometimes the returned results are inconsistent, and sometimes they are incorrect. This only happens with the HNSW index plus scalar filtering.

syang1997 commented 1 month ago

Regenerated debug log milvus-log (2).tar.gz

syang1997 commented 1 month ago

> Regenerated debug log milvus-log (2).tar.gz

@yanliang567 @cydrain @liliu-z Can you help us check together?

alwayslove2013 commented 1 month ago

@syang Could you please tell us the filter_rate and index building parameters? The current open source Milvus may have “less than top-k search results” problems with high filter_rates (70-90%) and low M.
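For readers unsure what number is being asked for, here is a minimal sketch of how the filter rate could be estimated. It assumes `filter_rate` means the fraction of rows excluded by the scalar filter, which matches the 70-90% figures above; the helper name and the row counts are hypothetical, and in practice the two counts would come from counting rows with and without the filter expression.

```python
def filter_rate(total_rows: int, matching_rows: int) -> float:
    """Fraction of rows excluded by the scalar filter (0.0 means nothing is filtered out)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= matching_rows <= total_rows:
        raise ValueError("matching_rows must be between 0 and total_rows")
    return 1.0 - matching_rows / total_rows


# Hypothetical numbers: 20,000 rows in the collection, 3,000 pass the filter.
rate = filter_rate(20_000, 3_000)
print(f"filter_rate = {rate:.0%}")  # prints "filter_rate = 85%"
```

A value in the 70-90% range, combined with a small `M`, is the situation described above as risky.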

syang1997 commented 1 month ago

> @syang Could you please tell us the filter_rate and index building parameters? The current open source Milvus may have “less than top-k search results” problems with high filter_rates (70-90%) and low M.

image

Most searches are normal, and the current M value is not small.

syang1997 commented 1 month ago

@alwayslove2013 This collection has fewer than 20,000 rows, and the M and efConstruction values are large enough (I think). I know about the data island problem that occurs when scalar filtering and HNSW are combined, and I have previously investigated it and adjusted the index build parameters.

cydrain commented 1 month ago

Hi @syang1997,

Can you share your script to reproduce this issue?

syang1997 commented 1 month ago

> Hi @syang1997,
>
> Can you share your script to reproduce this issue?

I'm coding a demo to replicate this issue

cydrain commented 1 month ago

Hi @syang1997,

One more question: I see you're using Milvus v2.3.15. Have you tried Milvus v2.4.x?

xiaofan-luan commented 1 month ago

@syang1997 I think we need data to reproduce this issue. @cydrain please set up a meeting with syang to see if we can get some data to reproduce it.

syang1997 commented 1 month ago

> @syang1997 I think we need data to reproduce this issue. @cydrain please set up a meeting with syang to see if we can get some data to reproduce it.

We have already talked with the community once, and the preliminary conclusion is that the cause is still the data island problem mentioned earlier.

syang1997 commented 1 month ago

> @syang1997 I think we need data to reproduce this issue. @cydrain please set up a meeting with syang to see if we can get some data to reproduce it.

@xiaofan-luan The symptom is that nothing is returned at all, rather than fewer than topK results, so we suspect that the nodes in the first layer of the HNSW graph are all filtered out.

xiaofan-luan commented 1 month ago

After discussion, it seems the reason might be that the filter removes 70-80% of the data, which breaks HNSW graph connectivity.
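To make the connectivity argument concrete, here is a toy sketch in plain Python. It is not Milvus or knowhere code: the chain-shaped graph, the distances, and the naive "only step onto rows that pass the filter" traversal are all assumptions chosen for illustration. It shows how a graph search can return nothing when every node near the entry point is filtered out, while a brute-force scan over the same filtered rows still finds matches.

```python
import heapq


def naive_filtered_search(graph, dist, entry, allowed, k):
    """Best-first graph search that only ever steps onto nodes passing the filter.

    Illustrative assumption only; real HNSW implementations handle filtering differently.
    """
    # Seed with the entry point and its direct neighbours, keeping only allowed nodes.
    seeds = [n for n in [entry, *graph[entry]] if n in allowed]
    visited = set(seeds)
    heap = [(dist[n], n) for n in seeds]
    heapq.heapify(heap)
    found = []
    while heap:
        d, node = heapq.heappop(heap)
        found.append((d, node))
        for nb in graph[node]:
            if nb in allowed and nb not in visited:
                visited.add(nb)
                heapq.heappush(heap, (dist[nb], nb))
    return [n for _, n in sorted(found)[:k]]


# Ten nodes in a chain; each node links to at most two neighbours (a very low "M").
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
# Hypothetical distances from the query vector: node 8 is the closest.
dist = {i: abs(i - 8) for i in range(10)}

everything = set(range(10))
allowed = {7, 8, 9}  # the scalar filter removes 70% of the rows

print(naive_filtered_search(graph, dist, 0, everything, k=2))  # [8, 7]
print(naive_filtered_search(graph, dist, 0, allowed, k=2))     # []
print(sorted(allowed, key=lambda n: (dist[n], n))[:2])         # brute force: [8, 7]
```

In this toy setup the matching rows exist but form an island that the traversal cannot reach from the entry point, which is consistent with the "empty result instead of fewer than topK" symptom reported above.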

liliu-z commented 1 month ago

This fix will be released with 2.4.7

liliu-z commented 1 month ago

/assign @yanliang567

syang1997 commented 1 month ago

> This fix will be released with 2.4.7

Can it be merged to version 2.3.x? @yanliang567 @liliu-z

yanliang567 commented 3 weeks ago

> > This fix will be released with 2.4.7
>
> Can it be merged to version 2.3.x? @yanliang567 @liliu-z

I don't think so, as 2.3.20 could be the last release of 2.3.x

syang1997 commented 3 weeks ago

> > > This fix will be released with 2.4.7
> >
> > Can it be merged to version 2.3.x? @yanliang567 @liliu-z
>
> I don't think so, as 2.3.20 could be the last release of 2.3.x

Okay, we will choose to upgrade to 2.4.x later