milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Bug]: Milvus does not support reading scalar indexes created for special characters like % or other characters, which causes the querynode to crash. #37912

Open xiaojunxiang2023 opened 1 day ago

xiaojunxiang2023 commented 1 day ago

Is there an existing issue for this?

Environment

Milvus Server: 2.4.11
java-sdk: 2.4.3

Current Behavior

When my client uses a scalar field as the query condition, it causes the querynode to crash. The querynode logs show an error indicating the presence of illegal characters.

The content of my scalar field is: ABC!DEFGHI-JK_@#$%^&*LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

I think Milvus:

  1. Supports writing special characters.
  2. Also supports building indexes for special characters.
  3. However, it does not support reading such indexes, which causes the querynode to crash.

Expected Behavior

I think this logic is very unreasonable. Validation should be done during data insertion or at the time of building the index, and any issues should be raised then, rather than waiting until the data is actually used to throw an error.

Steps To Reproduce

The reproduction steps are as follows:

  1. Create a scalar field and build an index for it.
  2. Insert the content ABC!DEFGHI-JK_@#$%^&*LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 into the scalar field.
  3. Filter on the scalar field with a like expression that uses a prefix pattern such as "A%".
  4. Specify a vector field and execute the search.

This reproduces the issue and causes the querynode to crash.

Notes:
You can implement this with multithreading and loops for stress testing.
When using like, make sure % is not at the beginning of the query string. Starting with % disables prefix matching, which prevents the index from being used and avoids triggering the issue.
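
To make the note above concrete, here is a minimal Python sketch for picking test patterns. The helper name literal_prefix is hypothetical and is not part of Milvus or any SDK. It only illustrates the reporter's observation that a pattern needs literal text before the first % to take the prefix-matching path.

```python
def literal_prefix(like_pattern: str) -> str:
    """Return the literal text that precedes the first % wildcard."""
    return like_pattern.split("%", 1)[0]


# Patterns with a non-empty literal prefix are the ones the note says to use.
for pattern in ("A%", "%A%", "^%"):
    prefix = literal_prefix(pattern)
    kind = "has a literal prefix" if prefix else "starts with %, no literal prefix"
    print(f"{pattern!r}: {kind}")
```

Whether a given pattern actually reaches the index depends on the Milvus version and index type, so treat this only as a way to pick inputs for the stress test.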

Milvus Log

No response

Anything else?

No response

xiaofan-luan commented 15 hours ago

@xiaojunxiang2023 you are using 2.4.x, but to my knowledge Milvus 2.4 doesn't have any Rust-related logic; that only arrived in 2.5, when we introduced tantivy. Can you share your test code if possible? Are you using prefix filtering, such as A like "AA%", or an arbitrary pattern such as like "%AA%"?

xiaofan-luan commented 15 hours ago

/assign @sunby please help on it as well

sunby commented 12 hours ago

It looks like the query expression is "^%", which is translated to "^(.|\n)*". That is invalid for the regex parser, because the literal "^" is passed through unescaped. We should escape the original pattern before passing it to tantivy.
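
The escaping idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Milvus or tantivy implementation: the function name like_to_regex is hypothetical, only the % wildcard is handled (other wildcards and escape sequences are out of scope), and Python's re module stands in for the real regex engine.

```python
import re


def like_to_regex(like_pattern: str) -> str:
    """Translate a LIKE pattern to a regex, escaping every literal character."""
    parts = []
    for ch in like_pattern:
        if ch == "%":
            # % matches any run of characters, newlines included.
            parts.append(r"[\s\S]*")
        else:
            # Literal character: neutralise regex metacharacters such as ^ $ * &.
            parts.append(re.escape(ch))
    return "".join(parts)


value = "ABC!DEFGHI-JK_@#$%^&*LMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
for like in ("A%", "^%"):
    regex = like_to_regex(like)
    matched = re.fullmatch(regex, value) is not None
    print(f"{like!r} -> {regex!r}, matches stored value: {matched}")
```

With escaping applied first, "^" in the pattern is treated as an ordinary character to match rather than as regex syntax, so the translated expression stays well formed for any input string.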