manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Search fails when two NEAR operators are used, and second proximity is less than the first #892

Open regstuff opened 2 years ago

regstuff commented 2 years ago

Describe the bug
Consider the sentence: word2 something something word1 something word2 something word3
A search such as word1 NEAR/2 word2 NEAR/2 word3 matches this sentence.
Similarly, word1 NEAR/3 word2 NEAR/3 word3 also matches.
word1 NEAR/3 word2 NEAR/2 word3 should also match this sentence, but it does not.

A sentence without the first word2, such as: something something word1 something word2 something word3
matches all of the above proximity searches.

The search fails when two conditions are met:

  1. The query has two (and possibly more) NEAR operators, and the proximity distance of the second operator is less than the distance of the first.
  2. The document contains two instances of one of the words (word2) that satisfy the first proximity rule, but only one instance of the other word (word3).
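To make the expected behaviour concrete, here is a minimal brute-force sketch in plain Python. It is not Manticore's matching algorithm. It assumes that NEAR/N means two keyword positions differ by at most N, and that a chain matches if some single occurrence of each word satisfies every consecutive pair. The helper names `positions` and `near_chain_matches` are hypothetical.

```python
from itertools import product


def positions(tokens, word):
    """Return the 0-based positions at which `word` occurs in `tokens`."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def near_chain_matches(text, words, distances):
    """Brute-force reference check (an assumption, not the engine's logic).

    True if one occurrence of each word can be picked so that consecutive
    words in the chain are at most `distances[i]` positions apart.
    """
    tokens = text.split()
    occurrences = [positions(tokens, w) for w in words]
    # Try every combination of occurrences, one per word in the chain.
    for combo in product(*occurrences):
        if all(abs(combo[i + 1] - combo[i]) <= distances[i]
               for i in range(len(distances))):
            return True
    return False


sentence = "word2 something something word1 something word2 something word3"
words = ["word1", "word2", "word3"]

# Both occurrences of word2 (positions 0 and 5) are within 3 of word1 (position 3),
# but only the one at position 5 is within 2 of word3 (position 7).
for dists in [(2, 2), (3, 3), (3, 2)]:
    print(dists, near_chain_matches(sentence, words, dists))
```

Under these assumptions all three distance pairs match, because the word2 at position 5 satisfies both rules. This is why the empty result for NEAR/3 followed by NEAR/2 looks like a bug rather than intended behaviour.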

MRE
The Python code below inserts the sentence and runs three searches. The first search fails, whereas the rest work.

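import manticoresearch

# Assumes a local Manticore instance on the default HTTP port 9308 and an existing "products" table
config = manticoresearch.Configuration(host="http://127.0.0.1:9308")
client = manticoresearch.ApiClient(config)
indexApi = manticoresearch.IndexApi(client)
utilsApi = manticoresearch.UtilsApi(client)

row = {'title': '<p>word2 something something word1 something word2 something word3</p>', 'contentid': '1'}
resp = indexApi.insert({"index" : "products", "doc" : row})
print(resp)

# Fails: the second distance is smaller than the first
resp = utilsApi.sql("""SELECT * FROM products WHERE MATCH('word1 NEAR/3 word2 NEAR/2 word3')""")
print(resp)

# Works
resp = utilsApi.sql("""SELECT * FROM products WHERE MATCH('word1 NEAR/2 word2 NEAR/2 word3')""")
print(resp)

# Works
resp = utilsApi.sql("""SELECT * FROM products WHERE MATCH('word1 NEAR/2 word2 NEAR/3 word3')""")
print(resp)

sanikolaev commented 2 years ago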

MRE in SQL form

mysql> drop table if exists t; create table t (f text); insert into t(f) values('word2 smth smth word1 smth word2 smth word3'); select highlight() from t where match('word1 NEAR/2 word2 NEAR/2 word3'); select highlight() from t where match('word1 NEAR/3 word2 NEAR/2 word3');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.02 sec)

--------------
create table t (f text)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t(f) values('word2 smth smth word1 smth word2 smth word3')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select highlight() from t where match('word1 NEAR/2 word2 NEAR/2 word3')
--------------

+------------------------------------------------------------------+
| highlight()                                                      |
+------------------------------------------------------------------+
| word2 smth smth <b>word1</b> smth <b>word2</b> smth <b>word3</b> |
+------------------------------------------------------------------+
1 row in set (0.00 sec)

--------------
select highlight() from t where match('word1 NEAR/3 word2 NEAR/2 word3')
--------------

Empty set (0.00 sec)