manticoresoftware / manticoresearch


Highlighting inconsistent with more than 2 NEAR terms #888

Open regstuff opened 2 years ago

regstuff commented 2 years ago

Describe the bug

When searching for 3 terms using the NEAR operator, word1 is not highlighted if there are two instances of word2 that satisfy the proximity rule but only one instance of word3. For example:

word1 randomword word2 randomword word3 randomword word2

Highlighting of word1 breaks when both instances of word2 satisfy the proximity condition.

[MRE] The Python code below creates two docs. In the first doc, 'word' is present twice. In the second, the last instance of 'word' is replaced by something else ('punctuation'). Two searches are run. The second search (NEAR/7) brings the second instance of 'word' within proximity in doc1, which causes highlighting to break for 'sentence'.

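import manticoresearch

# Client setup; the host is an assumption (Manticore's default HTTP port on localhost)
config = manticoresearch.Configuration(host='http://127.0.0.1:9308')
client = manticoresearch.ApiClient(config)
indexApi = manticoresearch.IndexApi(client)
utilsApi = manticoresearch.UtilsApi(client)

ixname = 'products'
# Doc 1: 'word' appears twice
row = {'title': '<p>sentence and word and letter and word</p>', 'contentid': '1'}
resp = indexApi.insert({"index" : ixname, "doc" : row})

# Doc 2: the last 'word' is replaced by 'punctuation'
row = {'title': '<p>sentence and word and letter and punctuation</p>', 'contentid': '2'}
resp = indexApi.insert({"index" : ixname, "doc" : row})

# Search 1: NEAR/2 between 'sentence' and 'word' (highlighting works)
resp = utilsApi.sql('SELECT *, HIGHLIGHT({before_match=\'<span class="match">\', after_match=\'</span>\', limit=0, html_strip_mode=\'retain\'}, \'\') FROM products WHERE MATCH(\'sentence NEAR/2 word NEAR/2 letter\')')
print(resp[0]['data'])
print('+{10}')  # separator between the two result sets
# Search 2: NEAR/7 between 'sentence' and 'word' ('sentence' loses its highlight in doc 1)
resp = utilsApi.sql('SELECT *, HIGHLIGHT({before_match=\'<span class="match">\', after_match=\'</span>\', limit=0, html_strip_mode=\'retain\'}, \'\') FROM products WHERE MATCH(\'sentence NEAR/7 word NEAR/2 letter\')')
print(resp[0]['data'])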
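
To make it easier to see why the second query behaves differently, here is a rough sketch of which keyword pairs fall within each NEAR distance in doc1. It is only a simplified model and not how Manticore evaluates NEAR internally: it assumes plain whitespace tokenization, 1-based word positions, and that "within N words" means a position difference of at most N. The helper names `positions` and `near_pairs` are made up for this illustration.

```python
def positions(text, term):
    """Return the 1-based word positions of `term` in whitespace-tokenized text."""
    return [i for i, tok in enumerate(text.split(), start=1) if tok == term]


def near_pairs(text, left, right, n):
    """Return (left_pos, right_pos) pairs whose positions differ by at most n.

    Assumption: this approximates NEAR/n; the real engine may count differently.
    """
    return [
        (a, b)
        for a in positions(text, left)
        for b in positions(text, right)
        if abs(a - b) <= n
    ]


doc = 'sentence and word and letter and word'

print(near_pairs(doc, 'sentence', 'word', 2))  # [(1, 3)]
print(near_pairs(doc, 'sentence', 'word', 7))  # [(1, 3), (1, 7)]
print(near_pairs(doc, 'word', 'letter', 2))    # [(3, 5), (7, 5)]
```

Under these assumptions, NEAR/2 pairs 'sentence' with only the first 'word', while NEAR/7 pairs it with both instances, and both instances are also within 2 words of 'letter'. That is the situation in which 'sentence' loses its highlight.
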
sanikolaev commented 2 years ago

MRE in SQL form

mysql> drop table if exists t; create table t (f text); insert into t(f) values('sentence and word and letter and word'); select highlight() from t where match('sentence NEAR/7 word NEAR/2 letter');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t (f text)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t(f) values('sentence and word and letter and word')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select highlight() from t where match('sentence NEAR/7 word NEAR/2 letter')
--------------

+-----------------------------------------------------+
| highlight()                                         |
+-----------------------------------------------------+
| sentence and <b>word</b> and <b>letter</b> and word |
+-----------------------------------------------------+
1 row in set (0.00 sec)

'sentence' is expected to be highlighted, but it is not. If NEAR/7 is replaced with NEAR/2, it does get highlighted:

--------------
select highlight() from t where match('sentence NEAR/2 word NEAR/2 letter')
--------------

+------------------------------------------------------------+
| highlight()                                                |
+------------------------------------------------------------+
| <b>sentence</b> and <b>word</b> and <b>letter</b> and word |
+------------------------------------------------------------+
1 row in set (0.00 sec)
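
For anyone who wants to check this automatically, a small helper along these lines could compare the highlight() output against the query terms. This is only a sketch: `highlighted_terms` and `missing_highlights` are hypothetical names, the default `<b>`/`</b>` markers are taken from the output above, and the two snippets are copied verbatim from the results shown in this comment.

```python
import re


def highlighted_terms(snippet, before='<b>', after='</b>'):
    """Return the set of strings wrapped in the before/after match markers."""
    pattern = re.escape(before) + r'(.*?)' + re.escape(after)
    return set(re.findall(pattern, snippet))


def missing_highlights(snippet, query_terms):
    """Return the query terms that are not highlighted anywhere in the snippet."""
    return sorted(set(query_terms) - highlighted_terms(snippet))


terms = ['sentence', 'word', 'letter']
near7 = 'sentence and <b>word</b> and <b>letter</b> and word'
near2 = '<b>sentence</b> and <b>word</b> and <b>letter</b> and word'

print(missing_highlights(near7, terms))  # ['sentence']
print(missing_highlights(near2, terms))  # []
```

The same check works with the `<span class="match">` markers from the Python MRE by passing them as `before` and `after`.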