weaviate / weaviate

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
https://weaviate.io/developers/weaviate/
BSD 3-Clause "New" or "Revised" License
11k stars, 759 forks

(feature request) Multiple filters on cross references should match on a single object #2477

Open hsm207 opened 1 year ago

hsm207 commented 1 year ago

Given this dataset:

{
  "name" : "Zach",
  "car" : [
    {
      "make" : "Saturn",
      "model" : "SL"
    },
    {
      "make" : "Suba",
      "model" : "Imprezza"
    }
  ]
}
{
  "name" : "Bob",
  "car" : [
    {
      "make" : "Saturn",
      "model" : "Imprezza"
    }
  ]
}

and the following query:

{
  Get {
    CarsNameMapping(
      where: {
      operator:And
      operands: [
        {path: ["hasCar", "Car", "make"], operator: Equal, valueString: "Saturn"}
        {path: ["hasCar", "Car", "model"], operator: Equal, valueString: "Imprezza"}
      ]}
    ) {
      name
    }
  }
}

the expected result should be:

{
  "data": {
    "Get": {
      "CarsNameMapping": [
        {
          "name": "Bob"
        }
      ]
    }
  }
}

but the current result (as of version 1.17.1) is:

{
  "data": {
    "Get": {
      "CarsNameMapping": [
        {
          "name": "Zach"
        },
        {
          "name": "Bob"
        }
      ]
    }
  }
}

which I think is incorrect: I expect only one object to be returned, because I interpret the above query as:

Give me the name of the CarsNameMapping object that has an instance of a car whose make is Saturn and whose model is Imprezza.

byronvoorbach commented 1 year ago

I've seen this issue mentioned before, and I agree that only one document should match in this case. +1

etiennedi commented 1 year ago

I understand there is a use case for the desired behavior, but the given syntax shouldn't express it. Since we use AND and OR operators, they should behave as they would in a programming language. Look at where the AND is placed in that query:

      operands: [
        {path: ["hasCar", "Car", "make"], operator: Equal, valueString: "Saturn"}
        {path: ["hasCar", "Car", "model"], operator: Equal, valueString: "Imprezza"}
      ]}

The AND is clearly on the outside, so if both operand1 and operand2 are true for a CarsNameMapping, it should be included.

Going through this one by one:

  1. First, we need all the CarsNameMappings that have a ref to a car with make Saturn. Both Zach and Bob match this criterion, so the result of operand1 is Zach, Bob.
  2. Then we evaluate the other operand: we need all the CarsNameMappings that have a ref to a car with model Imprezza. Both Zach and Bob fulfill this criterion, so the result of operand2 is Zach, Bob.
  3. Finally, the AND takes the intersection of both lists. The intersection of Bob, Zach and Bob, Zach is Bob, Zach.

Therefore the result is correct given the syntax.
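The step-by-step evaluation above can be sketched in a few lines of Python. This is an illustrative model, not Weaviate's implementation: cross-references are assumed to be plain nested dicts under a hasCar key, and matching_names is a hypothetical helper. Each operand is resolved to a set of parent objects on its own, and the outer And intersects those sets, which is why Zach is returned.

```python
# Dataset from the issue. Assumption: cross-references are modeled as nested dicts.
people = [
    {"name": "Zach", "hasCar": [
        {"make": "Saturn", "model": "SL"},
        {"make": "Suba", "model": "Imprezza"},
    ]},
    {"name": "Bob", "hasCar": [
        {"make": "Saturn", "model": "Imprezza"},
    ]},
]


def matching_names(field: str, value: str) -> set[str]:
    """Hypothetical helper: names of objects with at least one referenced Car whose field equals value."""
    return {p["name"] for p in people if any(car[field] == value for car in p["hasCar"])}


# Each operand is evaluated independently against the parent objects...
operand1 = matching_names("make", "Saturn")     # {"Zach", "Bob"}
operand2 = matching_names("model", "Imprezza")  # {"Zach", "Bob"}

# ...and the outer And intersects the two result sets.
result = operand1 & operand2
print(sorted(result))  # ['Bob', 'Zach']
```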

Now what you're saying is that you expect:

Give me the name of the CarsNameMapping object that has an instance of a car whose make is Saturn and whose model is Imprezza.

This is a reasonable interpretation, but it is not what the query above represents. Look at where you placed the AND: you placed it inside the car. But the above query has no AND at the car level; it only has an AND at the CarsNameMapping level.

So to represent your query we need to introduce a new syntax. For example, something like the following:

{
  Get {
    CarsNameMapping(
      where: {
      operator:And
      path: ["hasCar", "Car"],
      operands: [
        { path:[ "make"], operator: Equal, valueString: "Saturn"}
        { path: ["model"], operator: Equal, valueString: "Imprezza"}
      ]}
    ) {
      name
    }
  }
}

I don't know if that is feasible, but this query would represent what you outlined in human language: the AND should be applied to a Car, not to a CarsNameMapping.
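To make the difference between the two syntaxes concrete, here is a minimal sketch of a filter evaluator that supports both the current unscoped And and the proposed And with a path scope. Everything here is hypothetical: the filter dicts only mimic the GraphQL where syntax (Equal with valueString), the class name in each path is skipped, and the targets and matches helpers are illustrative, not part of any Weaviate API.

```python
from typing import Any

# Hypothetical sketch, not Weaviate internals. Filters are plain dicts that
# mimic the GraphQL where syntax; references are nested dicts.
Obj = dict[str, Any]

people: list[Obj] = [
    {"name": "Zach", "hasCar": [
        {"make": "Saturn", "model": "SL"},
        {"make": "Suba", "model": "Imprezza"},
    ]},
    {"name": "Bob", "hasCar": [
        {"make": "Saturn", "model": "Imprezza"},
    ]},
]


def targets(obj: Obj, ref_path: list[str]) -> list[Obj]:
    """Follow [refProp, ClassName, ...] pairs and return every object reached."""
    if not ref_path:
        return [obj]
    prop, rest = ref_path[0], ref_path[2:]  # ref_path[1] is the class name; unused here
    return [t for ref in obj.get(prop, []) for t in targets(ref, rest)]


def matches(obj: Obj, f: Obj) -> bool:
    op = f["operator"]
    if op == "Equal":
        *ref_path, field = f["path"]
        # True if ANY object reachable via the path has field == valueString.
        return any(t.get(field) == f["valueString"] for t in targets(obj, ref_path))
    if op == "And":
        scope = f.get("path")
        if scope is None:
            # Current semantics: each operand is checked against the parent on its own.
            return all(matches(obj, o) for o in f["operands"])
        # Proposed scoped semantics: one referenced object must satisfy every operand.
        return any(all(matches(t, o) for o in f["operands"]) for t in targets(obj, scope))
    raise ValueError(f"unsupported operator: {op}")


unscoped = {"operator": "And", "operands": [
    {"path": ["hasCar", "Car", "make"], "operator": "Equal", "valueString": "Saturn"},
    {"path": ["hasCar", "Car", "model"], "operator": "Equal", "valueString": "Imprezza"},
]}
scoped = {"operator": "And", "path": ["hasCar", "Car"], "operands": [
    {"path": ["make"], "operator": "Equal", "valueString": "Saturn"},
    {"path": ["model"], "operator": "Equal", "valueString": "Imprezza"},
]}

print([p["name"] for p in people if matches(p, unscoped)])  # ['Zach', 'Bob']
print([p["name"] for p in people if matches(p, scoped)])    # ['Bob']
```

The whole difference is where the any() sits. The unscoped And asks whether some car matches operand 1 and some (possibly different) car matches operand 2; the scoped And asks whether a single car matches both.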

rcbevans commented 1 year ago

Is there any update on this? I'm just starting to investigate Weaviate and immediately ran into this. It's not necessarily a deal breaker, as I can probably post-filter the results, but it would definitely be nice to be able to express this natively.
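As a workaround, the post-filtering mentioned above could look roughly like this. It assumes the query also selects the reference (for example hasCar { ... on Car { make model } }) and that the response nests each referenced Car's fields in a list under hasCar. The response dict below is hand-written to match the issue's dataset, not captured from a real server, and post_filter is a hypothetical helper. The existing where filter still narrows candidates server-side; the post-filter then keeps only objects where a single Car satisfies every criterion.

```python
# Hand-written response in the assumed shape; not real server output.
response = {"data": {"Get": {"CarsNameMapping": [
    {"name": "Zach", "hasCar": [
        {"make": "Saturn", "model": "SL"},
        {"make": "Suba", "model": "Imprezza"},
    ]},
    {"name": "Bob", "hasCar": [
        {"make": "Saturn", "model": "Imprezza"},
    ]},
]}}}


def post_filter(objects: list[dict], **criteria: str) -> list[dict]:
    """Keep objects where at least one referenced Car matches ALL criteria at once."""
    return [
        obj for obj in objects
        # "or []" guards against a missing or null reference list.
        if any(all(car.get(k) == v for k, v in criteria.items())
               for car in obj.get("hasCar") or [])
    ]


hits = post_filter(response["data"]["Get"]["CarsNameMapping"], make="Saturn", model="Imprezza")
print([h["name"] for h in hits])  # ['Bob']
```

Because this filtering runs after any server-side limit has been applied, a page of results can come back with fewer hits than the requested limit.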

77akella commented 1 year ago

Would also be happy to have this feature.