wikimedia / search-highlighter

GitHub mirror of "search/highlighter" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing)

Is there any way to highlight a phrase query spanning an array #10

Open jeacott opened 9 years ago

jeacott commented 9 years ago

This is probably a feature request :). Say I have an indexed array, [{text:"one a",o:"1",d:"2"},{text:"two b",o:"2",d:"2"},{text:"three c",o:"3",d:"2"}], and a phrase query "one a two", highlighted using fetch_fields: ["o","d"].

Is there any way I can get highlighted results that keep the phrase intact across array boundaries? Usually this query produces 2 fragment results, and not necessarily in document order, i.e. something like: ["two",2,2, "one a",1,2]

What I would like is some way to determine result groups that match the original query, perhaps as a sub-array: [["one a",1,2, "two",2,2]]

Suggestions for approaches are also gratefully accepted.

Cheers

nik9000 commented 9 years ago

Is there any way I can get highlighted results that keep the phrase intact across array boundaries?

It's certainly possible. Right now we force a fragment boundary at each string boundary. It's possible not to.

What I would like is some way to determine result groups that match the original query, perhaps as a sub-array: [["one a",1,2, "two",2,2]]

Elasticsearch limits us to returning strings. I suppose you could return: [ "one a", 1, 2, "two", 2, 2 ]

Usually you don't want to highlight phrases across those gaps: you set position_offset_gap to something large, like 1000, so that phrases don't match across array values. But that's not what you want here.
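
To illustrate why a large gap stops a phrase from matching across array values, here is a simplified Python model. It is only a sketch: the helper names (assign_positions, phrase_matches) are made up for this example, tokens are split on whitespace, and the gap is assumed to be added to the token position at the start of every value after the first. It is not the highlighter's or Elasticsearch's actual code.

```python
def assign_positions(values, gap):
    """Give every whitespace-separated token a position.

    Assumption: each value after the first is shifted by `gap`
    extra positions, which is how a position gap is modeled here.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def phrase_matches(values, phrase, gap):
    """Return True if the phrase words occur at consecutive positions."""
    tokens = assign_positions(values, gap)
    words = phrase.split()
    for start in range(len(tokens) - len(words) + 1):
        window = tokens[start:start + len(words)]
        same_words = [t for t, _ in window] == words
        consecutive = all(
            window[k + 1][1] - window[k][1] == 1
            for k in range(len(window) - 1)
        )
        if same_words and consecutive:
            return True
    return False


values = ["one a", "two b", "three c"]
print(phrase_matches(values, "one a two", gap=0))     # True
print(phrase_matches(values, "one a two", gap=1000))  # False
```

With a gap of 0 the tokens "a" and "two" sit at adjacent positions, so the phrase matches across the boundary; with a gap of 1000 they are far apart and the phrase no longer matches.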

jeacott commented 9 years ago

Elasticsearch limits us to returning strings. I suppose you could return: [ "one a", 1, 2, "two", 2, 2 ]

String results are just fine. It's the boundary crossing I'm interested in. [ "one a", "1", "2", "two", "2", "2" ] would be great! If it would respect the highlighter fragmentSize in determining which elements to consider, that would be ideal.
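
Until something like that exists, one possible client-side workaround is to regroup the flat string result yourself. The Python sketch below is hypothetical and rests on assumptions not confirmed in this thread: each fragment is followed by its fetch_fields values ("o", then "d"), and "o" is an integer-like index giving the element's order in the array. The function name group_fragments is made up for the example, and fragmentSize is not modeled.

```python
def group_fragments(flat, fields_per_fragment=2):
    """Regroup a flat highlight result into runs of adjacent array elements.

    Assumptions: `flat` is [fragment, o, d, fragment, o, d, ...] and
    `o` is the element's order in the source array.
    """
    step = 1 + fields_per_fragment
    # Split the flat list into (fragment, o, d) tuples.
    triples = [tuple(flat[i:i + step]) for i in range(0, len(flat), step)]
    # Restore document order using the assumed order field `o`.
    triples.sort(key=lambda t: int(t[1]))

    groups = []
    for t in triples:
        # Extend the current group only if this element directly follows
        # the previous one in the array.
        if groups and int(t[1]) == int(groups[-1][-1][1]) + 1:
            groups[-1].append(t)
        else:
            groups.append([t])

    # Flatten each group back into a list of strings.
    return [[item for t in g for item in t] for g in groups]


flat = ["two", "2", "2", "one a", "1", "2"]
print(group_fragments(flat))
# [['one a', '1', '2', 'two', '2', '2']]
```

This only restores order and adjacency after the fact; it cannot tell whether the adjacent fragments actually came from the same phrase match, which is why support inside the highlighter would still be preferable.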