Closed GoogleCodeExporter closed 9 years ago
Sorry, in step (1) above, it should read:
1) pass HTML to boilerpipe's HTML highlighter
Original comment by sujitatg...@gmail.com
on 19 Jun 2011 at 9:28
Matching at term-level is out of scope for boilerpipe.
See Lucene's Highlighter class for a starting point:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/highlight/Highlighter.html
Original comment by ckkohl79
on 21 Mar 2012 at 9:30
Hi, in case this is not a complete rejection... :-)
I am not asking boilerpipe to do term-level matching. I am asking for
character offsets (with respect to the input text) of the non-boilerplate output returned
by boilerpipe (i.e., step 2 in the original post).
So assuming an input text:
    0         1         2         3         4         5
    012345678901234567890123456789012345678901234567890123456789
    BOILERPLATEsome good textMORE BOILERPLATE....
the output of boilerpipe is:
    some good text
I am asking for a way to say that "some good text" starts at position 11 and
ends at position 24 (inclusive) in the original text. The rest I can do in my application.
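A minimal sketch of that application-side step, assuming the extracted text occurs verbatim in the input. That holds for the plain-text example above but generally not for HTML, where tags, entities and whitespace normalization get in the way. The class and method names (`OffsetFinder`, `findSpans`) are hypothetical and not part of boilerpipe's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical application-side helper; not part of boilerpipe's API.
final class OffsetFinder {
    // Returns every verbatim occurrence of `extracted` in `input` as
    // {start (inclusive), end (exclusive)} character offsets.
    static List<int[]> findSpans(String input, String extracted) {
        List<int[]> spans = new ArrayList<>();
        if (extracted.isEmpty()) {
            return spans; // an empty string would "match" at every position
        }
        int idx = input.indexOf(extracted);
        while (idx >= 0) {
            spans.add(new int[] {idx, idx + extracted.length()});
            idx = input.indexOf(extracted, idx + 1); // resume just after this match's start
        }
        return spans;
    }

    public static void main(String[] args) {
        String input = "BOILERPLATEsome good textMORE BOILERPLATE....";
        for (int[] span : findSpans(input, "some good text")) {
            // Print start and inclusive end, matching the positions in the example.
            System.out.println(span[0] + " " + (span[1] - 1)); // prints "11 24"
        }
    }
}
```

If the same text occurs more than once, a verbatim search alone cannot tell which occurrence boilerpipe actually kept; the application would need extra context to choose.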
I ask because I think this is already being done by the boilerpipe highlighter,
so the information exists, but I couldn't figure out a way to get to it.
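When the input is HTML (step 1 of the original post), the extracted text will usually not appear verbatim, because tags sit between the words. One way an application might still recover offsets is to strip tags while recording, for each kept character, its position in the original HTML, and then search the stripped text. This is a sketch under loud assumptions (no comments, scripts, entities, or `<`/`>` inside attribute values), and it is an illustration only, not how boilerpipe's highlighter works internally; `HtmlOffsetMapper`, `strip` and `locate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a naive illustration, not boilerpipe's internals.
final class HtmlOffsetMapper {
    // Text with tags removed, plus the original offset of each kept character.
    static final class Stripped {
        final String text;
        final int[] origPos; // origPos[i] = offset in the HTML of text.charAt(i)

        Stripped(String text, int[] origPos) {
            this.text = text;
            this.origPos = origPos;
        }
    }

    // Naive tag stripper: drops everything between '<' and '>'.
    // Assumes no comments, scripts, entities, or '<'/'>' inside attribute values.
    static Stripped strip(String html) {
        StringBuilder sb = new StringBuilder();
        List<Integer> pos = new ArrayList<>();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                sb.append(c);
                pos.add(i);
            }
        }
        int[] arr = new int[pos.size()];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = pos.get(i);
        }
        return new Stripped(sb.toString(), arr);
    }

    // Returns {start (inclusive), end (exclusive)} in the original HTML, or null if not found.
    static int[] locate(String html, String extracted) {
        if (extracted.isEmpty()) {
            return null;
        }
        Stripped s = strip(html);
        int idx = s.text.indexOf(extracted);
        if (idx < 0) {
            return null;
        }
        int start = s.origPos[idx];
        int end = s.origPos[idx + extracted.length() - 1] + 1; // one past the last kept char
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String html = "<div>MENU</div><p>some <b>good</b> text</p>";
        int[] span = locate(html, "some good text");
        System.out.println(span[0] + " " + span[1]);             // prints "18 39"
        System.out.println(html.substring(span[0], span[1]));    // prints "some <b>good</b> text"
    }
}
```

Real-world pages would need a proper HTML parser, and since extracted text may have its whitespace normalized, a literal `indexOf` may need to become a whitespace-insensitive match.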
Thanks again,
Sujit
Original comment by sujitatg...@gmail.com
on 21 Mar 2012 at 11:09
Original issue reported on code.google.com by
sujitatg...@gmail.com
on 19 Jun 2011 at 9:25