manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
8.94k stars 497 forks

Word in proximity search gets highlighted twice, even when not matched #894

Open regstuff opened 2 years ago

regstuff commented 2 years ago

Describe the bug
Consider a sentence such as:

something word11 something something something word10 something something something something word10 something


Here, word10 appears twice, and the two instances are 5 words apart.
Only the first word10 satisfies the condition word11 NEAR/5 word10. However, both instances of word10 are highlighted.

With a search such as word11 NEAR/4 word10, only the first word10 is highlighted.
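
To make the positions concrete, here is a minimal sketch that lists which occurrences of word10 fall within N positions of word11. It assumes NEAR/N means "positions differ by at most N", which is consistent with the behaviour described above but is a simplification, not Manticore's actual implementation. The helper name `near_hits` is hypothetical.

```python
def near_hits(text, left, right, n):
    """Return 1-based positions of `right` lying within n positions of some `left`.

    Assumption: NEAR/n is modelled as |pos(left) - pos(right)| <= n.
    """
    tokens = text.split()
    left_pos = [i for i, t in enumerate(tokens, 1) if t == left]
    right_pos = [i for i, t in enumerate(tokens, 1) if t == right]
    return [r for r in right_pos if any(abs(r - l) <= n for l in left_pos)]


sentence = (
    "something word11 something something something word10 "
    "something something something something word10 something"
)

# word11 is at position 2; word10 is at positions 6 and 11.
print(near_hits(sentence, "word11", "word10", 4))  # [6]
print(near_hits(sentence, "word11", "word10", 5))  # [6]
```

Under this model the second word10 (position 11) is 9 positions from word11, so it should not be highlighted for either query. It is, however, exactly 5 positions from the first word10.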

[MRE]
The first search highlights correctly. The second one highlights both instances of word10.

# indexApi and utilsApi are presumably manticoresearch.IndexApi / manticoresearch.UtilsApi instances; client setup was not included in the report.
row = {'title': '<p>something word11 something something something word10 something something something something word10 something</p>', 'contentid': '5'}
resp = indexApi.insert({"index": "products", "doc": row})

# NEAR/4: only the first word10 is highlighted (correct).
resp = utilsApi.sql("""SELECT *, HIGHLIGHT({before_match='<span class="match">', after_match='</span>', limit=0, html_strip_mode='retain'}, '') FROM products WHERE MATCH('word11 NEAR/4 word10')""")

# NEAR/5: both instances of word10 are highlighted (the bug).
resp = utilsApi.sql("""SELECT *, HIGHLIGHT({before_match='<span class="match">', after_match='</span>', limit=0, html_strip_mode='retain'}, '') FROM products WHERE MATCH('word11 NEAR/5 word10')""")

popalot2 commented 10 months ago

This seems to be a related bug: HIGHLIGHT with a quoted phrase and NEAR highlights the wrong words and fails to highlight the correct ones. See example:

query:
SELECT HIGHLIGHT({},TO_STRING('red fox big fox big red bear red fox'), '"red fox"  NEAR/50 bear')
from idx_fake_oneline;

result:
<b>red fox</b> big <b>fox</b> big <b>red bear</b> red fox

expected result:
<b>red fox</b> big fox big red <b>bear</b> <b>red fox</b>
or 
<b>red fox</b> big fox big red <b>bear red fox</b>
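
For reference, a rough sketch of how the first expected result could be derived. This is not how Manticore highlights; it is a plain-Python model that assumes whitespace tokenization, exact phrase matching, and a NEAR distance measured from the nearest edge of the phrase. The function name `highlight_phrase_near` is hypothetical.

```python
def highlight_phrase_near(text, phrase, word, n, before="<b>", after="</b>"):
    """Wrap each phrase occurrence and each `word` that lie within n positions of each other."""
    tokens = text.split()
    p = phrase.split()
    # Start index of every exact occurrence of the quoted phrase.
    phrase_starts = [i for i in range(len(tokens) - len(p) + 1)
                     if tokens[i:i + len(p)] == p]
    word_pos = [i for i, t in enumerate(tokens) if t == word]

    spans = set()
    for s in phrase_starts:
        last = s + len(p) - 1
        for w in word_pos:
            # Assumption: distance is counted from the nearest edge of the phrase.
            gap = w - last if w > last else s - w
            if 0 < gap <= n:
                spans.add((s, s + len(p)))
                spans.add((w, w + 1))

    out, i = [], 0
    for start, end in sorted(spans):
        if start < i:  # skip spans overlapping one already emitted
            continue
        out.extend(tokens[i:start])
        out.append(before + " ".join(tokens[start:end]) + after)
        i = end
    out.extend(tokens[i:])
    return " ".join(out)


print(highlight_phrase_near("red fox big fox big red bear red fox", "red fox", "bear", 50))
# <b>red fox</b> big fox big red <b>bear</b> <b>red fox</b>
```

In this model the standalone "fox" and the "red" before "bear" stay unhighlighted, because neither is part of a complete "red fox" phrase.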