Closed by GoogleCodeExporter 8 years ago
See the de.l3s.boilerpipe.sax.HTMLHighlighter class. It highlights the blocks
that are being kept.
If you modify it to only call "html.append" when "highlight == true" you should
get what you want.
Would be a nice contribution!
Marking as Type-Enhancement.
Christian
Original comment by ckkohl79
on 24 Jan 2010 at 4:51
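For illustration, here is a minimal, self-contained sketch of the idea suggested above: append to the output only when the current piece is marked as highlighted. It does not use boilerpipe itself. The `Fragment` class, the `render` method, and the sample markup are hypothetical stand-ins, and the real HTMLHighlighter makes this decision inside its SAX callbacks rather than over a list.

```java
import java.util.Arrays;
import java.util.List;

class KeepOnlyHighlighted {
    // Hypothetical stand-in: a piece of HTML plus the keep/drop decision
    // that the highlighter would have made for it.
    static final class Fragment {
        final String html;
        final boolean highlight;

        Fragment(String html, boolean highlight) {
            this.html = html;
            this.highlight = highlight;
        }
    }

    // Builds the output by appending only the fragments that are highlighted,
    // mirroring "only call html.append when highlight == true".
    static String render(List<Fragment> fragments) {
        StringBuilder html = new StringBuilder();
        for (Fragment f : fragments) {
            if (f.highlight) {
                html.append(f.html);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Fragment> page = Arrays.asList(
                new Fragment("<div>navigation</div>", false),
                new Fragment("<p>Main article text.</p>", true),
                new Fragment("<div>footer</div>", false));
        System.out.println(render(page));
    }
}
```

In the real class, markup is emitted in several callbacks (start tags, end tags, character data), so the same guard would presumably be needed in each of them for the output to stay well-formed.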
Super. My Java is pretty bad, but I'll give this a go in the next few days.
Original comment by tom%tomt...@gtempaccount.com
on 2 Feb 2010 at 10:43
Hi, I have tried to call html.append only when highlight == true, but it doesn't
seem to work as expected. Still, HTMLHighlighter is a nice starting point.
If I manage to write something that does the job, I will send it.
Original comment by vek...@gmail.com
on 12 Mar 2010 at 2:42
See the new HTMLHighlightDemo in 1.1.
Just comment out the existing HTMLHighlighter (newHighlightingInstance) and
replace it with the other one below, i.e.
final HTMLHighlighter hh = HTMLHighlighter.newExtractingInstance();
This should give you a clean HTML representation of the extracted text. No
images, though.
Original comment by ckkohl79
on 2 Nov 2010 at 3:36
It would be useful if HTMLHighlighter could also take a raw HTML string,
i.e.,
public String process(final String html, final BoilerpipeExtractor extractor)
Original comment by neto.sur...@gmail.com
on 16 Dec 2011 at 10:29
Original issue reported on code.google.com by
tom%tomt...@gtempaccount.com
on 24 Jan 2010 at 4:36