Closed GoogleCodeExporter closed 9 years ago
I left out the idea of the issue, actually ;)
I suggest adding a helper method to ResultSet (.highlight() ?) that makes it easier to implement a highlighter for the results of Djapian searches, and that at the same time hides the use of internal attributes like _query_parser and _query_str.
A generic way of "highlighting" in terms of HTML/CSS should be worked out as well.
Original comment by esizi...@gmail.com
on 4 Aug 2009 at 4:37
There is already a related ticket:
http://code.google.com/p/djapian/issues/detail?id=54
Original comment by daevaorn
on 4 Aug 2009 at 6:34
Issue 54 has been merged into this issue.
Original comment by daevaorn
on 4 Aug 2009 at 6:34
I don't think that they are the same. I see 2 different tasks there, even if they could be used together (like highlighting in a snippet):
1. provide a snippet from the result hit, where the snippet is a part of the indexed text
2. be able to highlight search terms in a text; there is no difference between highlighting search terms in a snippet and in a whole text (which could be a single field or the whole document)
Original comment by esizi...@gmail.com
on 5 Aug 2009 at 8:03
Could anybody post a more readable snippet showing how to highlight the search terms?
Original comment by and...@polyakov.name
on 19 Aug 2009 at 11:17
Original comment by daevaorn
on 19 Sep 2009 at 9:17
I don't have a more readable snippet, but the following FAQ entry in Xapian's wiki is related: http://trac.xapian.org/wiki/FAQ/Snippets
In particular, the http://code.google.com/p/xappy/source/browse/trunk/xappy/highlight.py code linked to from there ought to be very helpful in implementing this.
Original comment by boulton.rj@gmail.com
on 23 May 2010 at 8:45
I've committed a simple highlight implementation to trunk. See the HighlightTest test case in tests/search.py for a usage example.
Original comment by esizi...@gmail.com
on 9 Jun 2010 at 7:18
To review: r361, r362 and r364 introduce initial support for highlighting search results.
Original comment by esizi...@gmail.com
on 21 Jun 2010 at 4:09
OK. I think it is good enough as a base highlighting capability. Closed!
Original comment by daevaorn
on 21 Jun 2010 at 9:04
I'm sorry, but the patch is obviously broken, because the match for applying the tag in the highlighting function is made against the stemmed query text, which will almost always differ from the word taken from the input text. It works only in rare, simple cases.
Original comment by ortegajo...@gmail.com
on 9 Nov 2010 at 2:30
The patch should work just fine. I used the same approach in my project.
The idea is that in highlight(self, text, tag="strong") we are going to check
if the __stemmed form__ of each word from the incoming "text" (HTML page or
other source of text) matches the __stemmed form__ of any term from the search
query (see get_parsed_query_terms(self)), then replace all occurrences of the
__original__ word with "<tag>word</tag>" in the "text".
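A minimal sketch of the idea described above, not the code committed in r364. The toy_stem function is a hypothetical stand-in for whatever stemmer the Indexer is configured with, and query_terms stands in for the output of get_parsed_query_terms(). Unlike the description, the sketch wraps each matching word in place during a single pass instead of replacing all occurrences afterwards.

```python
import re


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer; strips a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def highlight(text, query_terms, tag="strong", stem=toy_stem):
    """Wrap each word whose stemmed form equals a stemmed query term."""
    # Stem the query terms once, so both sides are compared in stemmed form.
    stemmed_terms = {stem(term.lower()) for term in query_terms}

    def wrap(match):
        word = match.group(0)
        if stem(word.lower()) in stemmed_terms:
            # The original (unstemmed) word is what ends up inside the tag.
            return "<%s>%s</%s>" % (tag, word, tag)
        return word

    return re.sub(r"\w+", wrap, text)


if __name__ == "__main__":
    print(highlight("Searching indexes is fun", ["search", "index"]))
```

The comparison only works when the text words and the query terms go through the same stemmer; if the two sides are stemmed differently, nothing matches.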
Original comment by esizi...@gmail.com
on 9 Nov 2010 at 3:27
You are right, that's the idea, but that's not what the code does.
results.py, line 113:
113 if stem(word.lower()) in terms:
does not produce the same results as get_parsed_query_terms(word.lower()), and that's where everything goes wrong.
One more thing, and I'll give you an example:
Sometimes the text "24)Artículo" (badly formatted text, which should have a space or something between the 24 and 'Artículo', but that's not our problem) does match a search for the word 'articulo' in the index, but can't be highlighted by the code in r364. A stemmed form of that text would yield 2 separate terms, 24 and 'Artículo', which should be checked against the stemmed query text so that it can be highlighted properly. In my project, I highlight the whole text, like <strong>24)Artículo</strong>... (I know, not the best solution at all).
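One possible way to handle the "24)Artículo" case, sketched under assumptions rather than taken from Djapian: split the text on word boundaries instead of whitespace, and fold accents before comparing. The fold and highlight_tokens names are hypothetical, and whether the index really folds accents this way is an assumption; the stem parameter defaults to the identity function so the example stays independent of any particular stemmer.

```python
import re
import unicodedata


def fold(word):
    """Lowercase and drop combining accents, so 'Artículo' becomes 'articulo'."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def highlight_tokens(text, query_terms, tag="strong", stem=lambda w: w):
    """Highlight per word token, so '24' and 'Artículo' are checked separately."""
    terms = {stem(fold(term)) for term in query_terms}

    def wrap(match):
        word = match.group(0)
        if stem(fold(word)) in terms:
            return "<%s>%s</%s>" % (tag, word, tag)
        return word

    # \w+ splits on punctuation as well as whitespace, so ")" separates tokens.
    return re.sub(r"\w+", wrap, text)


if __name__ == "__main__":
    print(highlight_tokens("24)Art\u00edculo", ["articulo"]))
```

With this split, only the matching word is wrapped and the "24)" prefix stays outside the tag.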
Original comment by ortegajo...@gmail.com
on 9 Nov 2010 at 5:08
Well, if the stemmer has been defined for the Indexer then stem(term) should be
equivalent to the results of get_parsed_query_terms(...). The latter is also
expected to drop stop-words from the search query if the stopper has been
defined, but that's another story.
I would actually agree that this is just very basic support, which end users are expected to extend until we find a better approach that supports more search/highlight use cases. Feel free to suggest ideas and contribute your code ;)
Original comment by esizi...@gmail.com
on 9 Nov 2010 at 5:43
Original issue reported on code.google.com by
esizi...@gmail.com
on 4 Aug 2009 at 4:26