milvus-io / milvus

A cloud-native vector database, storage for next generation AI applications
https://milvus.io
Apache License 2.0

[Feature]: Support "&" and "|" operator in boolean expression #18805

Open yhmo opened 2 years ago

yhmo commented 2 years ago

Is there an existing issue for this?

Is your feature request related to a problem? Please describe.

Users may have many bool attributes to filter on. They can create a bool field for each attribute: bool_1, bool_2, bool_3, ... To find the entities where bool_1, bool_2, and bool_3 are all true, they can filter the search with this expression: search(expr="bool_1 && bool_2 && bool_3")

This approach is not efficient: it creates too many fields in a collection, and it is not easy to extend. When a new bool attribute appears, a new collection with the new field has to be created.

We can combine the bool attributes into a single int64 value, so the collection needs only one int64 field. Map bool_1 to the first (lowest) bit of the int64 value, bool_2 to the second bit, bool_3 to the third bit, and so on. Each int64 field can then hold up to 64 bool attributes.
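A minimal Go sketch of the packing described above, assuming bool_1 maps to the lowest bit (bit 0). `packFlags` and `hasFlag` are hypothetical helpers for illustration, not part of any Milvus SDK.

```go
package main

import "fmt"

// packFlags maps attrs[i] (bool_{i+1} in the issue) to bit i of an int64.
// At most 64 attributes fit; a longer slice is rejected.
func packFlags(attrs []bool) (int64, error) {
	if len(attrs) > 64 {
		return 0, fmt.Errorf("too many attributes: %d > 64", len(attrs))
	}
	var flag int64
	for i, set := range attrs {
		if set {
			flag |= int64(1) << uint(i)
		}
	}
	return flag, nil
}

// hasFlag reports whether bit i (0-based) of flag is set.
func hasFlag(flag int64, i int) bool {
	return flag&(int64(1)<<uint(i)) != 0
}

func main() {
	flag, err := packFlags([]bool{true, true, true, false})
	if err != nil {
		panic(err)
	}
	fmt.Println(flag)             // 7
	fmt.Println(hasFlag(flag, 2)) // true
	fmt.Println(hasFlag(flag, 3)) // false
}
```

Note that using bit 63 makes the stored int64 negative; the bit pattern is still correct, but any filter written against such values has to treat them as bit masks rather than ordinary numbers.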

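Then filter the search this way (assuming the int64 field is named "flag"): search(expr="(flag & 8) == 0"), which matches the entities whose fourth bit (value 8, i.e. bool_4) is false. The parentheses make the grouping explicit, since in C-family grammars `==` binds tighter than `&`. The earlier bool_1 && bool_2 && bool_3 filter would become (flag & 7) == 7.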

Describe the solution you'd like.

No response

Describe an alternate solution.

No response

Anything else? (Additional Context)

No response

yhmo commented 2 years ago

One concern: if the two operands have different widths, for example one is int64 and the other is int16, can this operator still be used?
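One way to reason about the width question is to widen the narrower operand to int64 before the AND. The catch is that widening a negative int16 can either sign-extend or zero-extend, and the results differ. A Go sketch of both choices follows; which one Milvus would adopt is an open design decision the issue does not settle, and both functions are hypothetical. (Go itself refuses `int64 & int16` without an explicit conversion.)

```go
package main

import "fmt"

// bitAndSignExtend widens b with a plain conversion (sign extension),
// so a negative int16 fills the upper 48 bits with ones before the AND.
func bitAndSignExtend(a int64, b int16) int64 {
	return a & int64(b)
}

// bitAndZeroExtend reinterprets b as an unsigned 16-bit value first,
// so the upper 48 bits of the widened operand are always zero.
func bitAndZeroExtend(a int64, b int16) int64 {
	return a & int64(uint16(b))
}

func main() {
	a := int64(-1)     // all 64 bits set
	b := int16(-32768) // 0x8000: only the top bit of the int16 is set
	fmt.Printf("%#x\n", uint64(bitAndSignExtend(a, b))) // 0xffffffffffff8000
	fmt.Printf("%#x\n", uint64(bitAndZeroExtend(a, b))) // 0x8000
}
```

For non-negative operands the two strategies agree, so the question only matters when the narrower value has its top bit set.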

Hirdey-1999 commented 2 years ago

I can work on this, but I need to know where I have to make changes.

brandonidas commented 1 year ago

Hi, is this available to be assigned? I am a 4th-year CS student at UBC in Vancouver.

xiaofan-luan commented 1 year ago

/assign @Hirdey-1999 /assign @brandonidas

xiaofan-luan commented 1 year ago

Hi all, I've assigned this to you two. Code to study before committing:

  1. Learn the Milvus .g4 grammar file and understand how expressions are converted into plans.
  2. Add new plan types for bitwise AND and bitwise OR.
  3. Check the C++ code in segcore to see how + and - are implemented; & and | should work in a similar way.
Hirdey-1999 commented 1 year ago

I am not able to find the Milvus .g4 file. Can you help me navigate to it so I can learn about the Milvus grammar?

xiaofan-luan commented 1 year ago

I am not able to find the Milvus .g4 file. Can you help me navigate to it so I can learn about the Milvus grammar?

Under internal/parser/planparserv2

charleskakumanu commented 4 months ago

Hi, is this issue still open? Which files do we need to change for this issue? Do we need to make changes to the schema? Could you help elaborate on the issue, @xiaofan-luan? Thank you!

xiaofan-luan commented 4 months ago

This seems to be a simple issue, but there are no volunteers for it yet. Putting all tags in an array and using a bitset index might also be a very efficient approach.

charleskakumanu commented 4 months ago

Which file do I need to change? I would like to work on this, please. @xiaofan-luan

xiaofan-luan commented 4 months ago

You need to start with the plan.g4 file and understand the grammar of Milvus filtering expressions.

After that, you need to implement the & and | operators in the Milvus core.

Are you by chance familiar with C++ and parsers?
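To get a feel for why the grammar step matters, here is a toy precedence-climbing parser in Go. It is not Milvus's ANTLR-generated parser, and the two precedence tables are hypothetical; they only show that the same text groups differently depending on whether `==` binds tighter than `&`, which is exactly what a grammar change for these operators has to decide.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a leaf (identifier or literal) when op is empty, else a binary op.
type node struct {
	op, leaf    string
	left, right *node
}

// String prints the tree fully parenthesized so the grouping is visible.
func (n *node) String() string {
	if n.op == "" {
		return n.leaf
	}
	return "(" + n.left.String() + " " + n.op + " " + n.right.String() + ")"
}

type parser struct {
	toks []string
	pos  int
	prec map[string]int // binary operator -> binding power (higher binds tighter)
}

// peek returns the current token, or "" at end of input.
func (p *parser) peek() string {
	if p.pos >= len(p.toks) {
		return ""
	}
	return p.toks[p.pos]
}

// primary parses a parenthesized expression or a single operand token;
// any token that is not an operator or parenthesis counts as an operand.
func (p *parser) primary() (*node, error) {
	t := p.peek()
	p.pos++
	_, isOp := p.prec[t]
	switch {
	case t == "(":
		n, err := p.expr(1)
		if err == nil && p.peek() != ")" {
			err = fmt.Errorf("missing )")
		}
		p.pos++
		return n, err
	case t == "" || t == ")" || isOp:
		return nil, fmt.Errorf("unexpected token %q", t)
	}
	return &node{leaf: t}, nil
}

// expr implements precedence climbing: it keeps absorbing operators whose
// binding power is at least minPrec; every operator is left-associative.
func (p *parser) expr(minPrec int) (*node, error) {
	left, err := p.primary()
	if err != nil {
		return nil, err
	}
	for {
		op := p.peek()
		pr, ok := p.prec[op]
		if !ok || pr < minPrec {
			return left, nil
		}
		p.pos++
		right, err := p.expr(pr + 1)
		if err != nil {
			return nil, err
		}
		left = &node{op: op, left: left, right: right}
	}
}

// parse pads parentheses, splits on whitespace (so operators must be
// space-separated), and parses src with the given precedence table.
func parse(src string, prec map[string]int) (*node, error) {
	toks := strings.Fields(strings.NewReplacer("(", " ( ", ")", " ) ").Replace(src))
	p := &parser{toks: toks, prec: prec}
	n, err := p.expr(1)
	if err == nil && p.pos != len(toks) {
		return nil, fmt.Errorf("unexpected trailing token %q", p.peek())
	}
	return n, err
}

// cLike ranks == above &, as in C; goLike ranks & above ==, as in Go.
var (
	cLike  = map[string]int{"|": 1, "&": 2, "==": 3}
	goLike = map[string]int{"==": 1, "|": 2, "&": 3}
)

func main() {
	for _, prec := range []map[string]int{cLike, goLike} {
		n, err := parse("flag & 8 == 0", prec)
		if err != nil {
			panic(err)
		}
		fmt.Println(n) // (flag & (8 == 0)), then ((flag & 8) == 0)
	}
}
```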

charleskakumanu commented 4 months ago

No, I work with Java, but I am currently learning Go so that I can contribute to open source. I just know that the parser uses a lexer to build trees for executing an expression.

xiaofan-luan commented 4 months ago

This might also involve some execution logic. The parser is written in Go, so you can start there.