krisk / Fuse

Lightweight fuzzy-search, in JavaScript
https://fusejs.io/
Apache License 2.0
18.05k stars 766 forks source link

Logical $and does not support multiple $or expressions #449

Closed belvederef closed 4 years ago

belvederef commented 4 years ago

Describe the bug

A clear and concise description of what the bug is. In an `$and` logical expression, according to the docs, multiple expressions can be contained and they must all evaluate to true to satisfy the condition. However, if more than one `$or` expression is contained in `$and`, all apart from the first will be ignored. ### Version The Fuse.js version where this bug is happening. 6.2.0 ### Is this a regression?

Don't know

🔬Minimal Reproduction

Data:

[
  {
    "title": "Old Man's War",
    "author": {
      "firstName": "John",
      "lastName": "Scalzi",
      "age": "61",
    }
  }
]

Main:

const options = {
  keys: [
    "title",
    "author.firstName",
    "author.lastName",
    "author.age",
  ]
};

const fuse = new Fuse(list, options);

return fuse.search({
  $and: [
    { title: 'old' }, // Fuzzy "old war",
    { 
      $or: [
        { 'author.firstName': "j" },
        { 'author.lastName': "Sa" },
      ] 
    },
    {
      // This should return no result, since nobody has age === 62
      $or: [
        { age: "'62" },
        // ...
      ]
    },
  ]
})

Result:

[
  {
    "item": {
      "title": "Old Man's War",
      "author": {
        "firstName": "John",
        "lastName": "Scalzi",
        "age": "61"
      }
    },
    "refIndex": 0
  }
]

Additional context

If you move the expression:

    {
      $or: [
        { age: "'62" },
        // ...
      ]
    }

before the first $or, you'll see it gets correctly picked up and no result is returned.

Related to #411.

danielfdickinson commented 4 years ago

TITLE == old AND (OR AGE == 62) -- could be understood as TITLE == old AND (true OR AGE == 62) which would evaluate to true. It looks to me like a single $or is understood as having an implicit true.

Can you replicate with a $or which two parts, one of which always evaluates to false?

belvederef commented 4 years ago

@cshoredaniel I do not think it works as you describe, since if you move the second $or expression up in the $and array it gets correctly picked up.

Also, mine was only a simple example but indeed using more statements in the $or condition holds the same results.

If you test the code provided on the project live demo page, you will get this behaviour.

danielfdickinson commented 4 years ago

Thank you for the info. That kind of eliminates my hypothesis.

krisk commented 4 years ago

Thanks for flagging. Will take a look.

krisk commented 4 years ago

Ok, there is a bug indeed, but also, your setup is incorrect. You need to enable extended search, and the age path should be author.age.

Thus it should be the following:

const options = {
  useExtendedSearch: true,
  keys: ['title', 'author.firstName', 'author.lastName', 'author.age']
}

const fuse = new Fuse(list, options)

const result = fuse.search({
  $and: [
    { title: 'old' },
    {
      $or: [{ 'author.firstName': 'j' }, { 'author.lastName': 'Sa' }]
    },
    {
      $or: [{ 'author.age': "'62" }]
    }
  ]
})

Regardless, even with the correct setup, the bug still occurs. Just about to publish a fix for this 😄

belvederef commented 4 years ago

@krisk thanks for looking into this. I am testing your new solution but it does work properly since the score is not calculated (it is null) and, therefore, there is no sorting. If you include the option flag includeScore: true in the example you posted above you will see this. I believe unit tests missed this particular case.

belvederef commented 4 years ago

I tested the build 6.3.1 and unfortunately it has other issues. Now in the search includes some results that have no overlap whatsoever with the searched word and have a score of e.g. 1.3376100250904978e-8. These appear at the top and even setting a threshold value of 0.5 does not get rid of them.

Moreover, when it finds a perfect match the result holds a very high value too (e.g. "score":1.4901161193847673e-8). I am happy to provide more information if needed. Thanks for your hard work.

krisk commented 4 years ago

Thanks @belvederef for looking into this!

I tested the build 6.3.1 and unfortunately it has other issues. Now in the search includes some results that have no overlap whatsoever with the searched word and have a score of e.g. 1.3376100250904978e-8. These appear at the top and even setting a threshold value of 0.5 does not get rid of them.

Could you send me an example list, fuse setup, and search query that demonstrate the issue? Note that the threshold is against the fuzziness score, and not the relevance score (which is used for overall sorting computation, and what is actually exposed).

Moreover, when it finds a perfect match the result holds a very high value too (e.g. "score":1.4901161193847673e-8). I am happy to provide more information if needed. Thanks for your hard work.

That evaluates to 0.0000000149, so, a pretty low value to me 😄 . Score won't always necessary be 0 (more info on scoring theory).

belvederef commented 4 years ago

You're right! I overlooked the e-8 and got confused because the searched keyword did not appear in that result at all. Here is a repo that reproduces the issue https://github.com/belvederef/fuse-449.

If you start the project and navigate to http://localhost:5050/api/predicate?search=child to search the word child, you will see the results are all messed up.

Preview of the results ``` child http://dbpedia.org/ontology/child displacement (μ³) http://dbpedia.org/ontology/displacement acceleration (s) http://dbpedia.org/ontology/acceleration top speed (kmh) http://dbpedia.org/ontology/topSpeed id http://www.w3.org/ns/activitystreams#id weight (g) http://dbpedia.org/ontology/weight width (μ) http://dbpedia.org/ontology/width organization http://www.w3.org/ns/org#organization Indica l'Organization in cui l'Agent è un membro. member http://www.w3.org/ns/org#member Indica la Person (o un altro Agent) coinvolto in una relazione di Membership. È l'inverso di `org:hasMembership`. transitive sub-organization http://www.w3.org/ns/org#transitiveSubOrganizationOf La version transitive de la propriété subOrganizationOf, renvoie une représentation de toutes les organisations qui contiennent celle-ci. Notez que ceci est une super-propriété de la relation transitive donc elle pourrait contenir des assertions additionnelles mais cet usage n'est pas recommandé. equal (0..*) http://purl.org/goodrelations/v1#equal This ordering relation for qualitative values indicates that the subject is equal to the object. architect http://dbpedia.org/ontology/architect ```

However, if you change the fuse.js version in the package.json to 6.2.0, the results are correct.

Preview of the results ``` child http://dbpedia.org/ontology/child childMaxAge http://schema.org/childMaxAge Maximal age of the child. childMinAge http://schema.org/childMinAge Minimal age of the child. children http://schema.org/children A child of the person. architect http://dbpedia.org/ontology/architect numChildren http://schema.org/numChildren The number of children staying in the unit. child organisation http://dbpedia.org/ontology/childOrganisation children features http://www.geonames.org/ontology#childrenFeatures Links to an RDF document containing the descriptions of children features left child http://dbpedia.org/ontology/leftChild right child http://dbpedia.org/ontology/rightChild parent-child property http://purl.org/linked-data/cube#parentChildProperty Specifies a property which relates a parent concept in the hierarchy to a child concept. ```

Apologies if this is not the simplest setup. let me know if you have any questions.

krisk commented 4 years ago

It seems to be working as expected. The search query is also looking for 'Property$', which is matching all those items.

The reason why it appeared as if it was working in the previous version is because of the original bug.

One way you could fine-tune the fuzzy searching is by increasing minMatchCharLength.

belvederef commented 4 years ago

@krisk thanks for looking into this! You are absolutely right, the seemingly messed up results were caused by my attempt at filtering the results. E.g. in my case I want to find matches but only among all the entries that have a property type ending with the string Property. One way I could make it work for this use case was using the extended search with a low value (0.00001) so that the logical operators would do their job at filtering the data but would not affect the score of the results.

I would see this as a pretty common use case and it ties in with the work described in #411. This could be specified in a new property in options that gets merged with the global fuse.js options, so that you could do at query time:

fuse.search({
  // logical operators for matching
},
{
  filters: {
    // logical operators for filtering not affecting the score
    $and: {
      { type: 'Property$' },
      { $or: [...] },
      // etc.
    }
  }
})