manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

Highlighting expanded words #1121

Open ogolosovskiy opened 1 year ago

ogolosovskiy commented 1 year ago

Describe the bug
There is a problem with the highlighting feature:
https://manual.manticoresearch.com/Searching/Highlighting#Highlighting-via-SQL

Highlighting does not work for expanded words (word*) when expand_keywords=star is set.

How to Reproduce

> SELECT  id, txt, weight(), HIGHLIGHT({before_match='[match]',after_match='[/match]',limit=4096},txt) FROM archiveogo_2 WHERE  username IN ('bbe54972-2981-42c2-b16b-e6963aaf0334@conference.ogo.devintermedia.net') and match('text') ORDER BY timestamp DESC LIMIT 10;    
+---------+------------------------------+----------+---------------------------------------------------------------------------+
| id      | txt                          | weight() | highlight({before_match='[match]',after_match='[/match]',limit=4096},txt) |
+---------+------------------------------+----------+---------------------------------------------------------------------------+
| 4606902 | text texttext                |     1622 | [match]text[/match] texttext                                              |
| 4606888 | text one text two text three |     1597 | [match]text[/match] one [match]text[/match] two [match]text[/match] three |
| 4606887 | texttext text                |     1622 | texttext [match]text[/match]                                              |
| 4606886 | texttext                     |     1560 | texttext                                                                  |
| 4606885 | text texttext                |     1622 | [match]text[/match] texttext                                              |
+---------+------------------------------+----------+---------------------------------------------------------------------------+
5 rows in set (0.00 sec)

Expected behaviour

+---------+------------------------------+----------+---------------------------------------------------------------------------+
| id      | txt                          | weight() | highlight({before_match='[match]',after_match='[/match]',limit=4096},txt) |
+---------+------------------------------+----------+---------------------------------------------------------------------------+
| 4606902 | text texttext                |     1622 | [match]text[/match] [match]texttext[/match]                               |
...    
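
For reference, the expected result can be modelled with a short Python sketch. This is only a toy model of what the *text* expansion would be expected to highlight, assuming whitespace tokenization and a plain substring test; it is not Manticore's tokenizer or snippet code, and the function name highlight_star and its parameters are made up for illustration.

```python
import re


def highlight_star(text, keyword, before="[match]", after="[/match]", min_infix_len=3):
    """Toy model: wrap every whitespace-separated token that contains `keyword`.

    Hypothetical helper, not part of Manticore. It only illustrates which
    tokens a `*keyword*` expansion would be expected to mark.
    """
    needle = keyword.lower()
    # Assumption: infix expansion applies only to keywords that are at
    # least min_infix_len characters long.
    infix_ok = len(needle) >= min_infix_len

    def wrap(m):
        token = m.group(0)
        low = token.lower()
        hit = low == needle or (infix_ok and needle in low)
        return f"{before}{token}{after}" if hit else token

    # \S+ keeps the original whitespace between tokens untouched.
    return re.sub(r"\S+", wrap, text)


for row in ["text texttext", "texttext text", "texttext"]:
    print(highlight_star(row, "text"))
```

For the first row this prints [match]text[/match] [match]texttext[/match], which is the expected value shown in the table above.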

Describe the environment:

Manticore 6.0.2 89c7a5139@230210 (columnar 2.0.0 a7c703d@230130) (secondary 2.0.0 a7c703d@230130)

OS version:
Debian GNU/Linux 11 (bullseye)

Messages from log files:
No errors in the logs.

My config
index archiveo ...
{
type = plain
charset_table = non_cjk, U+20BD, U+20AC, U+24
ngram_len=1
ngram_chars = cjk
blendchars = :, -, , .,
source = archiveogom_1
path = /var/lib/manticore/dataogo/archiveogo_1
min_prefix_len = 1
min_infix_len = 3
dict=keywords
expand_keywords=star
min_word_len=3
}

sanikolaev commented 1 year ago

MRE

mysql> drop table if exists t; create table t(f text) expand_keywords='star' min_infix_len='2'; insert into t values(1, 'abc defghi jkl'); select highlight() from t where match('efg');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(f text) expand_keywords='star' min_infix_len='2'
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into t values(1, 'abc defghi jkl')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select highlight() from t where match('efg')
--------------

+----------------+
| highlight()    |
+----------------+
| abc defghi jkl |
+----------------+
1 row in set (0.00 sec)