milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0
29.28k stars, 2.81k forks

[Feature]: Case insensitive varchar querying #34802

Open nahu02 opened 1 month ago

nahu02 commented 1 month ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

I want to enhance my vector searches with additional keyword filters. For example, the query "learning python" would be vectorised, and the search would then include the expression content like '%programming%' || content like '%computer' to make sure I don't get documents about snakes (content is a varchar field in the collection). The problem is that this discards documents that contain only "Programming" or "Computer".
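To make the problem concrete, here is a minimal client-side sketch that emulates a case-sensitive LIKE filter in plain Python. It does not call Milvus; the helper `like_match` and the sample documents are made up for illustration, and only the `%` wildcard is handled.

```python
import re


def like_match(value: str, pattern: str) -> bool:
    """Emulate a case-sensitive LIKE where '%' matches any run of characters.

    Hypothetical helper for illustration only; it is not part of any Milvus API.
    """
    # Escape every literal piece and join the pieces with '.*' in place of '%'.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Made-up sample documents.
docs = [
    "learning python for programming beginners",
    "Programming in Python",
    "python care: feeding your snake",
]

# Same shape as the expression above: two LIKE clauses joined with OR.
hits = [
    d for d in docs
    if like_match(d, "%programming%") or like_match(d, "%computer")
]
```

In this sketch the snake document is filtered out as intended, but "Programming in Python" is dropped as well, only because of the capital "P".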

Describe the solution you'd like.

In boolean expressions, case-insensitive strings could have a special marking, e.g. `computer`. This would then be == to comPUter, Computer, COMPUTER, and so on.

Describe an alternate solution.

Queries/searches/functions that use boolean expressions could take an extra bool parameter that dictates whether case should be disregarded. It could default to false for backwards compatibility.
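The semantics of such a flag can be approximated on the client side today. This is a rough sketch that assumes the collection also stores a lowercased copy of the text in a shadow field (here the hypothetical `content_lower`); the function `build_filter` and its `ignore_case` parameter are invented for this example and are not part of Milvus or pymilvus.

```python
from typing import Iterable


def build_filter(field: str, keywords: Iterable[str], ignore_case: bool = False) -> str:
    """Build an OR-joined LIKE expression for the given keywords.

    With ignore_case=True, the keywords are lowercased and the expression
    targets a lowercased shadow field named '<field>_lower'. That field is an
    assumption of this sketch and must be populated at insert time.
    """
    target = f"{field}_lower" if ignore_case else field
    clauses = []
    for kw in keywords:
        # Keep the sketch simple: refuse characters that would need escaping.
        if '"' in kw or "%" in kw:
            raise ValueError(f"unsupported character in keyword: {kw!r}")
        term = kw.lower() if ignore_case else kw
        clauses.append(f'{target} like "%{term}%"')
    return " || ".join(clauses)


# Default keeps the current case-sensitive behaviour (backwards compatible).
strict_expr = build_filter("content", ["Programming", "Computer"])
# Opting in rewrites the expression against the lowercased shadow field.
relaxed_expr = build_filter("content", ["Programming", "Computer"], ignore_case=True)
```

The resulting string would then be passed wherever a boolean filter expression is accepted. The cost of this workaround is the duplicated field, which is what a built-in parameter would avoid.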

Anything else? (Additional Context)

No response

nahu02 commented 1 month ago

Thinking more about it, this could be generalized into a new regex filtering rule. This is of course a whole new, bigger can of worms, but it may be useful for more use cases.
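As an illustration of what a regex rule could express, the following sketch applies a case-insensitive regular expression as a client-side post-filter using Python's standard `re` module. It only shows the intended matching semantics; the sample documents are made up, and nothing here implies that Milvus filter expressions accept regexes.

```python
import re

# One case-insensitive pattern replaces the two LIKE clauses.
keyword_pattern = re.compile(r"programming|computer", re.IGNORECASE)

# Made-up sample documents, e.g. the content field of returned search hits.
docs = [
    "Programming in Python",
    "COMPUTER science basics",
    "python care: feeding your snake",
]

# Keep only documents in which the pattern occurs anywhere.
matches = [d for d in docs if keyword_pattern.search(d)]
```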

xiaofan-luan commented 1 month ago

Do you think case-sensitivity control is needed in filter, in search, or in both?

nahu02 commented 1 month ago

I think it should behave the same way wherever boolean filters can be used.

yiwen92 commented 1 month ago

We are developing a new "match" function that tokenizes the text and lets you optionally choose whether matching is case sensitive. This seems to meet your requirement. We are targeting the upcoming Milvus 2.5 release.